Closed: dosssman closed this issue 2 years ago.
Hello! Thanks for your interest and questions!
The author's code has also changed multiple times since I wrote this (I should probably tag the original commit this was based on 🤔), so some things might be slightly different now in the original repo. This has also made it hard in my own experiments to keep a good comparison against the original paper's results.
Even though in the DreamerV2 paper (https://arxiv.org/pdf/2010.02193.pdf) the author mentions using dynamics backpropagation to learn the actor, his most recent code still uses only REINFORCE gradients when the actions are discrete (https://github.com/danijar/dreamerv2/blob/3a711b42461b9942396f84ad3a63ec00f25faedb/dreamerv2/agent.py#L217, config file: https://github.com/danijar/dreamerv2/blob/3a711b42461b9942396f84ad3a63ec00f25faedb/dreamerv2/configs.yaml#L59).
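For context, a REINFORCE-style actor loss for discrete actions can be sketched roughly as below. This is a minimal illustration, not the author's exact code; `advantages` stands in for whatever return/value estimate is used, and it is detached so no gradient flows through the dynamics or value networks:

```python
import torch
from torch.distributions import Categorical

def reinforce_actor_loss(logits, actions, advantages):
    """REINFORCE-style actor loss for discrete actions: gradients flow
    only through the action log-probabilities; the advantage estimate is
    detached, so the dynamics/value graph is not backpropagated through."""
    dist = Categorical(logits=logits)
    logprob = dist.log_prob(actions)
    return -(logprob * advantages.detach()).mean()
```

With this formulation, only the actor's parameters receive gradients, which is why it works even when imagined states are detached from the graph.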
Since I've been working mainly with discrete environments (e.g. Atari), I've actually put a bunch of `.detach()` calls in the imagination step to speed things up.
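To illustrate that idea (a hedged sketch with hypothetical `dynamics` and `actor` callables, not the repository's actual rollout code): when the actor is trained with REINFORCE gradients, the imagined latent states do not need to stay on the autograd graph, so each step can detach them:

```python
import torch

def imagine_rollout(dynamics, actor, start_state, horizon, detach_states=True):
    """Roll out an imagined trajectory. With REINFORCE-style actor
    gradients the dynamics backprop path is unused, so states can be
    detached at every step to save memory and compute.
    (Function and argument names are hypothetical.)"""
    state = start_state
    states, actions = [], []
    for _ in range(horizon):
        if detach_states:
            state = state.detach()  # cut the dynamics backprop path
        action = actor(state).sample()      # sampling is non-differentiable anyway
        state = dynamics(state, action)     # predict next latent state
        states.append(state)
        actions.append(action)
    return states, actions
```

With `detach_states=False` the full graph is kept, which would be needed for dynamics backpropagation in the continuous-action setting.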
About your second point, you're right, it would be a good idea. I'm using this code for my own experiments, so it would be good to make a new commit to organize things. I will try once I have more free time.
Hello again. Thank you very much for the fast answer.
Regarding the actor learning part, now that you mention it, it does make sense they use a different method to update the actor when the actions are discrete. I was mostly familiar with the Dreamer-v1 code and structure, hence my confusion.
Best wishes for your experiments.
Hello again.
Here are some additional issues I ran into while trying to run the Dreamer agents using the scripts in this repository. I hope this might be of use in case you re-organize the repository later.
Missing dependencies I had to install: `wandb`, `tensorboard`, `gym[atari]`, `opencv-python`.

When running the `python dreamer/train.py ...` command provided in the readme, it returns the following message:
```
(dreamerV2-pytorch) d055@akira:~/random/rl/dreamerV2-pytorch$ python3 dreamerv2/train.py --logdir ./logdir/atari_pong/dreamerv2/1 --configs defaults atari --task atari_pong
Logdir logdir/atari_pong/dreamerv2/1
Create envs.
/home/d055/anaconda3/envs/dreamerV2-pytorch/lib/python3.8/site-packages/gym/envs/atari/environment.py:69: UserWarning: WARN: obs_type "image" should be replaced with the image type, one of: rgb, grayscale
  logger.warn(
A.L.E: Arcade Learning Environment (version +978d2ce)
[Powered by Stella]
Prefill dataset (50000 steps).
Traceback (most recent call last):
  File "dreamerv2/train.py", line 152, in <module>
    train_driver(random_agent, steps=prefill, episodes=1)
  File "./common/driver.py", line 39, in __call__
    self._obs[i] = ob = self._envs[i].reset()
  File "./common/envs.py", line 300, in reset
    obs = self._env.reset()
  File "./common/envs.py", line 273, in reset
    obs = self._env.reset()
  File "./common/envs.py", line 184, in reset
    return self._env.reset()
  File "./common/envs.py", line 241, in reset
    return self._env.reset()
  File "./common/envs.py", line 119, in reset
    obs = {'image': image, 'ram': self._env.env._get_ram()}
AttributeError: 'AtariEnv' object has no attribute '_get_ram'
```
I suspect this might be due to a change in the gym API: the newer Atari environments no longer expose the `_get_ram` attribute that the wrappers in `common/envs` seem to be relying upon.
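One possible workaround (a sketch, assuming newer gym/ale-py versions expose the RAM through the underlying ALE interface as `ale.getRAM()`; the helper name `get_ram` is mine, not part of the repo):

```python
def get_ram(env):
    """Read Atari RAM across gym versions: older gym's AtariEnv had a
    private _get_ram() method, while newer versions expose the RAM
    through the underlying ALE object as env.unwrapped.ale.getRAM()."""
    base = getattr(env, "unwrapped", env)
    if hasattr(base, "_get_ram"):   # older gym API
        return base._get_ram()
    return base.ale.getRAM()        # newer gym / ale-py API
```

The wrapper in `common/envs` could then call this helper instead of `self._env.env._get_ram()` directly.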
Keep up the good work. Best regards.
Greetings.
Thank you very much for re-implementing Dreamer-v2 in PyTorch and open-sourcing it here.
I just wanted to ask for a few clarifications regarding your implementation.
At the bottom of the README file, it is mentioned that
> right now as implemented, doesn't support learning the actor by propagating the value network (torch.no_grad())
I plan to check the code myself, but just to be sure: if the actor is not learned by propagating through the value network, as I believe is done in the original implementation, how is the policy (agent / actor) learned at all?

While you have indeed mentioned the various dependencies used for this work, I would suggest also including an export of the conda environment (as a `.yml` file, for example), as well as the `pip freeze` dependency list, with the precise versions used. Otherwise, depending on when the `conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia` command is executed (say, two years from now), it could lead to unexpected behavior, or simply not work at all because of version mismatches between the libraries.

Again, thank you very much for your time. Best regards.
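For reference, such an export can be produced with the standard conda and pip commands (the output filenames here are just a suggestion):

```shell
# Snapshot the full conda environment, including exact package versions
conda env export > environment.yml

# Pin the pip-level dependencies as well
pip freeze > requirements.txt
```

Anyone cloning the repo could then recreate the environment with `conda env create -f environment.yml`.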