chrisgao99 opened 1 month ago
Hi @chrisgao99, thanks for asking. The author updated the algorithm a few months ago. Before those changes our repo was aligned with the original repo; now I have to check which changes he made. I will let you know as soon as possible.
Hello,
Previously, I got a good RL policy in a driving environment using the original Dreamer V3 repo:
Recently, I've been trying to reproduce it with SheepRL's Dreamer V3, but the results are much worse. So I want to make sure the SheepRL config corresponds to the original Dreamer config.
**replay_ratio & train_ratio** — In SheepRL, does `replay_ratio = 0.5` mean that the envs collectively gather 2 transitions before one batch is sampled from the buffer for training? In the original Dreamer V3, `train_ratio` is 32 by default and is used to compute `kwargs['samples_per_insert']`, which comes out to 0.5. Is this `samples_per_insert` equivalent to `replay_ratio`? The reference is `make_replay()` in `main.py`:
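To make the comparison concrete, here is a hedged sketch of the arithmetic I believe relates the two knobs. The `samples_per_insert = train_ratio / batch_length` formula and the `batch_length = 64` default are my assumptions about the original repo (they reproduce the 0.5 value quoted above); the definition of `replay_ratio` as gradient steps per collected transition is my reading of SheepRL — both should be verified against the code:

```python
# Assumption: the original DreamerV3 make_replay() derives samples_per_insert
# as train_ratio / batch_length (with batch_length = 64 by default).
train_ratio = 32
batch_length = 64
samples_per_insert = train_ratio / batch_length
print(samples_per_insert)  # -> 0.5

# Assumption: SheepRL's replay_ratio is gradient steps per env (policy) step,
# so replay_ratio = 0.5 means one training step every 2 collected transitions.
replay_ratio = 0.5
env_steps = 1000
gradient_steps = env_steps * replay_ratio
print(gradient_steps)  # -> 500.0
```

If both assumptions hold, the two quantities describe the same train/collect ratio, just parameterized differently.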
**buffer.size & replay.size** — Is the buffer size in SheepRL the same as the replay size in the original Dreamer V3?
**model size** — In the original Dreamer V3, I use the model size `size200m`:
In SheepRL the size parameters are different; for example, `dreamer_v3_XL`:
May I ask how they correspond to each other? My current understanding is `deter` → `recurrent_state_size`, `hidden` → `hidden_size`, `depth` → `cnn_channels_multiplier`, `units` → `dense_units`, `classes` → ?
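For reference, here is my current understanding of the name mapping written out as a small table. The right-hand names are assumptions from reading SheepRL's config, not confirmed by either repo; in particular, `classes` → `discrete_size` (classes per categorical latent) is a guess that needs verification:

```python
# Hypothetical mapping: original DreamerV3 size parameters -> SheepRL config
# keys. All right-hand names are assumptions to verify against both repos.
dreamer_to_sheeprl = {
    "deter":   "recurrent_state_size",    # RSSM deterministic state width
    "hidden":  "hidden_size",             # recurrent model hidden width
    "depth":   "cnn_channels_multiplier", # encoder/decoder channel multiplier
    "units":   "dense_units",             # MLP layer width
    "classes": "discrete_size",           # assumption: classes per categorical latent
}
for original, sheeprl in dreamer_to_sheeprl.items():
    print(f"{original:8s}-> {sheeprl}")
```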
**fabric.devices** — This is purely a SheepRL question about `fabric.devices`. If I have two GPUs, should I set `fabric.devices=2`? I tried setting `devices=2` but `fabric.world_size` is still 1, so what do `world_size` and `devices` mean?
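In standard Lightning Fabric semantics (my assumption is that SheepRL builds its `Fabric` object directly from the `fabric.*` config keys), `world_size` is the total number of spawned processes, i.e. `num_nodes * devices`; it only becomes greater than 1 once the processes are actually launched. A minimal sketch of that relationship:

```python
# Hedged sketch of Lightning Fabric's world_size semantics (assumption:
# world_size = num_nodes * devices once Fabric.launch() has spawned the
# worker processes). If world_size stays 1 with devices=2, the extra
# process was likely never launched.
def expected_world_size(num_nodes: int, devices_per_node: int) -> int:
    """Total processes Fabric should report after launch."""
    return num_nodes * devices_per_node

print(expected_world_size(num_nodes=1, devices_per_node=2))  # -> 2
```

So seeing `world_size == 1` with `devices=2` usually means the run is still single-process; how SheepRL's entry point triggers the multi-process launch is worth checking in its docs.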