Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

[Error] The training and testing videos for MuJoCo Gymnasium always have duration = 0s #191

Closed LYK-love closed 4 months ago

LYK-love commented 5 months ago

My env:

I train my agent in the walker walk environment via:

python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]

This starts training the model and keeps printing logs to the CLI. However, at any given moment, the duration of the video files in the directory logs/runs/dreamer_v3/Walker2d-v4/2024-01-18_21-15-29_dreamer_v3_Walker2d-v4_42/version_0/train_videos is always 0s. When I evaluate my checkpoint with the command:

python sheeprl_eval.py checkpoint_path=/path/to/checkpoint.ckpt fabric.accelerator=gpu env.capture_video=True

The generated test videos under logs/runs/dreamer_v3/Walker2d-v4/2024-01-18_21-15-29_dreamer_v3_Walker2d-v4_42/version_0/evaluation/version_0/train_videos also have duration=0s.
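
For reference, one way to check whether those mp4 files really contain almost no frames (rather than the player just mis-reporting the duration) is to read them back with OpenCV. A throwaway diagnostic sketch, assuming opencv-python is installed; the glob pattern is only illustrative:

import glob

import cv2

# Count the frames stored in each recorded mp4 and derive an approximate duration.
for path in sorted(glob.glob("logs/runs/dreamer_v3/Walker2d-v4/**/train_videos/*.mp4", recursive=True)):
    cap = cv2.VideoCapture(path)
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    print(f"{path}: {int(frames)} frames, ~{frames / fps:.2f}s")
    cap.release()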

This only happens with the MuJoCo Gymnasium env Walker2d-v4. I believe it is unrelated to the Dreamer algorithm. When I run:

python sheeprl.py exp=dreamer_v1 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]

or

python sheeprl.py exp=dreamer_v2 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]

The problem persists.

But if I switch to other envs, the video durations seem normal. For instance, the video duration = 33s with algorithm=dreamer_v3 and env=walker_walk:

python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk algo.cnn_keys.encoder=[rgb]

The video duration = 33s with algorithm=dreamer_v1 and env=walker_walk:

python sheeprl.py exp=dreamer_v1 env=dmc env.id=walker_walk algo.cnn_keys.encoder=[rgb]

The video duration = 10s with algorithm=ppo and env=CartPole-v1:

python sheeprl.py fabric.accelerator=cpu fabric.strategy=ddp fabric.devices=2 exp=ppo env=gym env.id=CartPole-v1

The generated training and testing videos always have duration = 10s.

michele-milesi commented 5 months ago

Hi @LYK-love, thanks for reporting. Can you please try this command?

python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] env.max_episode_steps=1000 +env.wrapper.terminate_when_unhealthy=False

I am trying to understand whether your problem is related to the video recording itself or whether the recorded episode is simply very short. Indeed, unlike DMC, Gymnasium's Walker2d environment has additional termination conditions (see https://gymnasium.farama.org/environments/mujoco/walker2d/#episode-end), and at the beginning of training the recorded episode might last less than 1 second.
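
For reference, the short-episode effect can be reproduced outside sheeprl with plain Gymnasium. A minimal sketch (assuming gymnasium[mujoco] and moviepy are installed; this is not sheeprl's own recording code): with the default terminate_when_unhealthy=True, a random policy usually falls and terminates after only a few dozen steps, so the recorded clip lasts a fraction of a second.

import gymnasium as gym
from gymnasium.wrappers import RecordVideo

# Record one random-policy episode of Walker2d-v4 with the default
# termination behaviour (terminate_when_unhealthy=True).
env = gym.make("Walker2d-v4", render_mode="rgb_array")
env = RecordVideo(env, video_folder="walker_videos", episode_trigger=lambda ep: True)

obs, info = env.reset(seed=42)
steps, done = 0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
    steps += 1
env.close()
# With unhealthy termination enabled this usually prints a few dozen steps,
# i.e. well under a second of video at the environment's render_fps.
print(f"Episode lasted {steps} steps")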

How long did you train the agent? And what is the reward you obtained?

LYK-love commented 5 months ago

Hi, I tried your command and got a bunch of videos with duration > 20s. It seems fine. The training had only just started, about 30s in.

However, I ran the problematic command that I reported on GitHub for 10+ hours and still got the same result (videos with duration = 0s). Can you reproduce my problem?


michele-milesi commented 5 months ago

Yeah, I will try to reproduce your error. Have any configuration files been modified? If so, can you please share the config.yaml file from the logging directory, so that I can reproduce the experiment with exactly the same configuration?

In the meantime, can you share the loss and reward charts?

LYK-love commented 5 months ago

No, I didn't modify any config. The content of sheeprl/configs/config.yaml is:

# @package _global_

# Specify here the default training configuration
defaults:
  - _self_
  - algo: default.yaml
  - buffer: default.yaml
  - checkpoint: default.yaml
  - distribution: default.yaml
  - env: default.yaml
  - fabric: default.yaml
  - metric: default.yaml
  - model_manager: default.yaml
  - hydra: default.yaml
  - exp: ???

num_threads: 1

# Set it to True to run a single optimization step
dry_run: False

# Reproducibility
seed: 42
torch_deterministic: False

# Output folders
exp_name: ${algo.name}_${env.id}
run_name: ${now:%Y-%m-%d_%H-%M-%S}_${exp_name}_${seed}
root_dir: ${algo.name}/${env.id}
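
As a side note on the override syntax used in the commands above: env.id=Walker2d-v4 overrides a key that already exists in these defaults, while the leading + in +env.wrapper.terminate_when_unhealthy=False appends a key that is not there yet. A rough OmegaConf sketch of how such dot-list overrides merge (purely illustrative, not sheeprl's actual config loading):

from omegaconf import OmegaConf

# Illustrative only: dot-list overrides (the same syntax Hydra accepts on the CLI)
# merged into a nested config. The keys mirror the ones used in the commands above.
base = OmegaConf.create({"env": {"id": "???", "wrapper": {}}})
overrides = OmegaConf.from_dotlist([
    "env.id=Walker2d-v4",
    "env.wrapper.terminate_when_unhealthy=False",  # added with '+' on the Hydra CLI
])
cfg = OmegaConf.merge(base, overrides)
print(OmegaConf.to_yaml(cfg))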

I tried the command:

python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]

again today, and it still gave me videos with duration = 0s. This time I only ran it for 10 minutes, so there are no losses and no checkpoints yet. I believe that's because the training time is too short.


I'll give you more results when I train for more hours, but feel free to try to reproduce it on your side.

michele-milesi commented 5 months ago

Thanks, I will try to replicate it and let you know as soon as possible.

This time I only ran it for 10 minutes, so there are no losses and no checkpoints yet. I believe that's because the training time is too short.

Yeah, the losses and checkpoints are missing because by default the training starts after ~65k steps.

michele-milesi commented 5 months ago

Hi @LYK-love, I have done some experiments. As I thought, the Walker2d-v4 Gymnasium environment with terminate_when_unhealthy=True has episodes that are too short, and Dreamer cannot learn (the random steps are so short that the agent cannot figure out which actions to take to achieve optimal behaviour). I recommend either using the environment with terminate_when_unhealthy=False (see the attached screenshot) or using the walker_walk environment provided by DMC, which is the one used as a benchmark in various papers. You can start a DreamerV3 training in the DMC walker_walk environment by following this guide.

[Figure: ww reward]

In this picture, you can see the reward obtained by DreamerV3 in the Walker2d-v4 environment provided by Gymnasium. The red line is with terminate_when_unhealthy=True, and the light blue one is with terminate_when_unhealthy=False.
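
The episode-length gap behind those two curves can also be checked with a quick standalone script (a sketch using plain Gymnasium, assuming gymnasium[mujoco] is installed; exact numbers will vary with the seed):

import gymnasium as gym

def mean_episode_length(terminate_when_unhealthy: bool, episodes: int = 20) -> float:
    # Average random-policy episode length for the given termination setting.
    env = gym.make(
        "Walker2d-v4",
        terminate_when_unhealthy=terminate_when_unhealthy,
        max_episode_steps=1000,
    )
    lengths = []
    for ep in range(episodes):
        env.reset(seed=ep)
        steps, done = 0, False
        while not done:
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            done = terminated or truncated
            steps += 1
        lengths.append(steps)
    env.close()
    return sum(lengths) / len(lengths)

# With unhealthy termination, random episodes end after a few dozen steps;
# without it, they run until the 1000-step time limit.
print("terminate_when_unhealthy=True :", mean_episode_length(True))
print("terminate_when_unhealthy=False:", mean_episode_length(False))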