Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

Dreamer-v3 exp yaml file for vector gymnasium Env and evaluation from checkpoint? #122

Closed hanshuo-shuo closed 11 months ago

hanshuo-shuo commented 11 months ago

Hi, thanks for this great library with all the SOTA model-based methods.

I'm trying to use Dreamer-V3 on my custom environment, which has only vector observations, and I'm finding it hard to define the experiment file with only the MLP encoder.

Do you have an example yaml file for Dreamer-V3 with only vector observations?

Right now my yaml file looks like this:

defaults:
  - override /algo: dreamer_v3
  - override /env: gym
  - _self_

wrapper:
  _target_: gymnasium.make
  id: ${env.id}
  render_mode: None
  from_vectors: True
  max_episode_steps: 300

# Experiment
total_steps: 100000
per_rank_batch_size: 16
per_rank_sequence_length: 64
# Checkpoint
checkpoint:
  every: 100000

# Buffer
buffer:
  size: 1000000
  checkpoint: False

# Distribution
distribution:
  type: "auto"

mlp_keys:
  encoder:
    - state

# Algorithm
algo:
  learning_starts: 5000
  train_every: 50
  mlp_layers: 2
  world_model:
    recurrent_model:
      recurrent_state_size: 256
    transition_model:
      hidden_size: 256
    representation_model:
      hidden_size: 256

Although this runs, I'm not sure it's the correct way to do it.

Thank you
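
For reference, an experiment file like the one above is normally dropped into SheepRL's Hydra config tree and selected from the command line. A minimal sketch, assuming the file is saved as sheeprl/configs/exp/dreamer_v3_vector.yaml (the file name and environment id below are placeholders, and the exact CLI may differ between versions):

# run Dreamer-V3 with the custom experiment config
python sheeprl.py exp=dreamer_v3_vector

# individual fields can also be overridden from the command line
python sheeprl.py exp=dreamer_v3_vector env.id=MyCustomEnv-v0 algo.mlp_layers=2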

hanshuo-shuo commented 11 months ago

Also, is there an easy way to run evaluation through SheepRL? I tried modifying the test() function, but somehow found it hard to adapt LOL

gonultasbu commented 11 months ago

@hanshuo-shuo have you been able to find a practical method for evaluation? I have been going through the repository with some prior stable-baselines3 experience but it looks like the train and eval loops (as they exist in sb3) are abstracted through config files, which I have not been able to decipher myself yet.

hanshuo-shuo commented 11 months ago

> @hanshuo-shuo have you been able to find a practical method for evaluation? I have been going through the repository with some prior stable-baselines3 experience but it looks like the train and eval loops (as they exist in sb3) are abstracted through config files, which I have not been able to decipher myself yet.

Yes, and you can check the newest issue: the author sent me a code script for eval. You can also check my fork of sheeprl, which has an eval file: https://github.com/hanshuo-shuo/sheeprl_prey

belerico commented 11 months ago

Hi guys, can you try out the new main version? We have introduced an evaluation script for every algorithm: you specify the checkpoint path and it starts the evaluation of the agent given that checkpoint. The only requirement is the folder structure where the checkpoint is placed: it must follow our standard Hydra-based folder structure, i.e.:

logs
└── runs
    └── sac
        └── LunarLanderContinuous-v2
            └── 2023-10-31_12-26-27_default_42
                ├── .hydra
                └── version_0
                    ├── checkpoint
                    ├── evaluation
                    │   └── version_0
                    │       └── test_videos
                    ├── memmap_buffer
                    │   ├── rank_0
                    │   └── rank_1
                    └── train_videos

That's because we need the .hydra folder to reload the old configuration of the experiment.
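
For anyone landing here later, a hedged sketch of how the evaluation script is typically launched (the entry-point name, checkpoint file name, and flags below are illustrative and may differ between versions):

python sheeprl_eval.py \
  checkpoint_path=logs/runs/sac/LunarLanderContinuous-v2/2023-10-31_12-26-27_default_42/version_0/checkpoint/ckpt_100000_0.ckpt \
  fabric.accelerator=cpu \
  env.capture_video=True

The script reads the .hydra folder next to the checkpoint to rebuild the original experiment configuration, which is why the checkpoint must stay inside the folder layout shown above.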

gonultasbu commented 11 months ago

@hanshuo-shuo much appreciated! The eval.py script looks like exactly what I described.

@belerico will do. Thank you for the prompt response!

belerico commented 11 months ago

I'm fixing an issue related to the evaluation. I'm reopening this so that we can discuss in the meantime.

belerico commented 11 months ago

Try this branch instead: https://github.com/Eclectic-Sheep/sheeprl/tree/fix/evaluate-agents
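
A quick way to try that branch locally (a sketch; adjust to your own setup):

# clone the fix branch and install it in editable mode
git clone -b fix/evaluate-agents https://github.com/Eclectic-Sheep/sheeprl.git
cd sheeprl
pip install -e .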

belerico commented 11 months ago

Feel free to reopen it if there's any trouble :metal: