LYK-love closed this issue 4 months ago.
Hi @LYK-love, thanks for reporting. Can you please try this command?
python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] env.max_episode_steps=1000 +env.wrapper.terminate_when_unhealthy=False
I am trying to understand if your problem is related to the video recording or if it is a short-lived episode that is recorded. Indeed, differently from dmc, the gymnasium's Walker Walk environment has different termination conditions (see https://gymnasium.farama.org/environments/mujoco/walker2d/#episode-end) and, at the beginning of the training, the recorded episode might last less than 1 second.
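For readers skimming the thread, the termination condition in question roughly amounts to the following (a sketch based on the Gymnasium docs linked above; the default ranges are assumptions and may differ across gymnasium versions, and this is not sheeprl code):

```python
# Sketch of Gymnasium's Walker2d termination logic (illustration only).
# The healthy ranges below are taken from the Gymnasium documentation
# defaults; treat them as assumptions if your version differs.

def is_healthy(z: float, angle: float,
               healthy_z_range=(0.8, 2.0),
               healthy_angle_range=(-1.0, 1.0)) -> bool:
    """The walker is 'healthy' while torso height and angle stay in range."""
    min_z, max_z = healthy_z_range
    min_a, max_a = healthy_angle_range
    return (min_z < z < max_z) and (min_a < angle < max_a)

def terminated(z: float, angle: float,
               terminate_when_unhealthy: bool = True) -> bool:
    # With terminate_when_unhealthy=False the episode only ends at
    # max_episode_steps (truncation), never by falling over.
    return terminate_when_unhealthy and not is_healthy(z, angle)

# A fallen walker (torso too low) ends the episode under the default flag:
print(terminated(z=0.5, angle=0.0))                                  # True
print(terminated(z=0.5, angle=0.0, terminate_when_unhealthy=False))  # False
```

This is why a random policy early in training can produce episodes of only a few frames under the default setting.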
How long did you train the agent? And what is the reward you obtained?
Hi, I tried your command and got a bunch of videos with durations > 20s, which seems fine. The training had only been running for about 30 seconds at that point.
However, I ran the problematic command, which I reported on GitHub, for 10+ hours and still got the same result (videos with duration = 0). Can you reproduce my problem?
Yeah, I will try to reproduce your error.
Have any configuration files been modified? That way I can reproduce the experiment with exactly the same configuration.
If any configuration file has been modified, can you please share the config.yaml file from the logging directory?
In the meantime, can you share the loss and reward charts?
No, I didn't modify any config. The content of sheeprl/configs/config.yaml is:
# @package _global_
# Specify here the default training configuration
defaults:
- _self_
- algo: default.yaml
- buffer: default.yaml
- checkpoint: default.yaml
- distribution: default.yaml
- env: default.yaml
- fabric: default.yaml
- metric: default.yaml
- model_manager: default.yaml
- hydra: default.yaml
- exp: ???
num_threads: 1
# Set it to True to run a single optimization step
dry_run: False
# Reproducibility
seed: 42
torch_deterministic: False
# Output folders
exp_name: ${algo.name}_${env.id}
run_name: ${now:%Y-%m-%d_%H-%M-%S}_${exp_name}_${seed}
root_dir: ${algo.name}/${env.id}
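For reference, the ${...} interpolations at the bottom of that config resolve along these lines (a plain-Python sketch of Hydra/OmegaConf behaviour; the concrete values are hypothetical stand-ins):

```python
from datetime import datetime

# Hypothetical values standing in for algo.name, env.id and seed:
algo_name, env_id, seed = "dreamer_v3", "Walker2d-v4", 42

# Mirrors exp_name, run_name and root_dir from the config above:
exp_name = f"{algo_name}_{env_id}"
run_name = f"{datetime.now():%Y-%m-%d_%H-%M-%S}_{exp_name}_{seed}"
root_dir = f"{algo_name}/{env_id}"

print(run_name)  # e.g. 2024-01-18_21-15-29_dreamer_v3_Walker2d-v4_42
```

This matches the shape of the logging directory names that appear later in the thread.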
I tried the command:
python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]
again today, and it still gave me videos with duration = 0. This time I only ran it for 10 minutes, so there are no losses or checkpoints; I believe that's because the training time is too short.
I'll give you more results when I train for more hours, but feel free to try reproducing it on your side.
Thanks, I will try to replicate it and let you know as soon as possible.
At this time I only executed it for 10 minutes. There's no loss and checkpoint. I believe that's because the training time is too short.
Yeah, the losses and checkpoints are missing because by default the training starts after ~65k steps.
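For context, that "~65k steps" refers to an initial prefill phase in which the replay buffer is filled by a random policy before any optimization step. A toy sketch of the idea (StubEnv and prefill are made up for illustration; this is not sheeprl's actual loop):

```python
import random

class StubEnv:
    """Tiny stand-in environment: episodes end randomly after a few steps."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= random.randint(1, 5)
        return 0.0, 0.0, done  # (next_obs, reward, done)

def prefill(env, buffer, num_random_steps):
    """Collect num_random_steps transitions with random actions before
    the first optimization step (cf. the ~65k default mentioned above)."""
    obs = env.reset()
    for _ in range(num_random_steps):
        action = random.choice([0, 1])
        next_obs, reward, done = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
    return buffer

buffer = prefill(StubEnv(), [], num_random_steps=100)
print(len(buffer))  # 100
```

Until that prefill budget is exhausted, no gradient steps run, so no losses are logged and no checkpoints are written.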
Hi @LYK-love, I have done some experiments.
As I thought, the Walker2d-v4 Gymnasium environment with terminate_when_unhealthy=True produces episodes that are too short, and Dreamer cannot learn: the random episodes are so short that the agent cannot figure out which actions to take to achieve optimal behaviour.
I recommend using the environment with terminate_when_unhealthy=False (see the attached screenshot), or using the walker_walk environment provided by DMC, which is the one used as a benchmark in various papers. You can start a DreamerV3 training in the DMC walker_walk environment by following this guide.
In this picture, you can see the reward obtained by DreamerV3 in the Walker2d-v4 environment provided by Gymnasium. The red line is with terminate_when_unhealthy=True, and the light blue one is with terminate_when_unhealthy=False.
My env:
- MUJOCO_GL=egl is set, and egl is installed.
- Python 3.9
- NVIDIA driver available (checked with nvidia-smi)
- sheeprl 0.5.2, and I've run git pull before trying.
I train my agent in the walker walk environment via:
This starts training the model and keeps printing logs to the CLI. However, at any given moment, the duration of the video files in the directory
logs/runs/dreamer_v3/Walker2d-v4/2024-01-18_21-15-29_dreamer_v3_Walker2d-v4_42/version_0/train_videos
is always 0s. When I evaluate my checkpoint, the generated test videos under
logs/runs/dreamer_v3/Walker2d-v4/2024-01-18_21-15-29_dreamer_v3_Walker2d-v4_42/version_0/evaluation/version_0/train_videos
also have duration = 0s.
This only happens with the MuJoCo Gymnasium env Walker2d-v4; I believe it is unrelated to the Dreamer algorithm. When I run other variants of the command, the problem still holds.
But if I switch to other envs, the video duration seems normal. For instance:
- video duration = 33s with algorithm dreamer_v3 and env walker_walk
- video duration = 33s with algorithm dreamer_v1 and env walker_walk
- video duration = 10s with algorithm ppo and env CartPole-v1 (the generated training and testing videos always have duration = 10s)
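A quick way to sanity-check whether duration = 0 comes from very short episodes rather than a broken recorder is to relate frame count to clip length (a hedged sketch; fps=30 is an assumption, not necessarily sheeprl's recording rate):

```python
def expected_duration_seconds(num_frames: int, fps: int = 30) -> float:
    """Length of a recorded clip, given the number of frames it contains."""
    return num_frames / fps

# An episode terminated after a handful of frames yields a clip that most
# players display as 0 seconds, while a full 990-frame episode at 30 fps
# lasts 33 seconds, matching the walker_walk videos above:
print(expected_duration_seconds(5))
print(expected_duration_seconds(990))  # 33.0
```

If the recorded frame counts are genuinely tiny, the 0s durations are consistent with the short-episode explanation rather than a recording bug.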