esteveste / dreamerV2-pytorch

PyTorch implementation of DreamerV2: Mastering Atari with Discrete World Models, based on the original implementation

Regarding "actor learning" and dependencies #1

Closed dosssman closed 2 years ago

dosssman commented 2 years ago

Greetings.

Thank you very much for re-implementing DreamerV2 in PyTorch and open-sourcing it here.

I just wanted to ask for a few clarifications regarding your implementation.

  1. At the bottom of the README file, it is mentioned that, as currently implemented, the code does not support learning the actor by backpropagating through the value network (torch.no_grad()). I plan to check the code myself, but just to be sure: if the actor is not learned by backpropagating through the value network, as I believe is done in the original implementation, how is the policy (agent / actor) learned at all?

  2. While you have indeed mentioned the various dependencies used for this work, I would suggest also including an export of the conda environment (as a .yml file, for example), as well as the pip freeze dependency list with the precise versions used. Otherwise, depending on when conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia is executed (say, two years from now), it could lead to unexpected behavior, or simply not work at all because of mismatches between library versions. See the example commands right after this list.
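For example (just a suggestion, assuming the environment was created with conda and pip as described in the readme), something like the following would capture the exact versions:

```sh
# Export the conda environment with exact package versions
conda env export > environment.yml
# Also record pip-installed packages and their versions
pip freeze > requirements.txt
```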

Again, thank you very much for your time. Best regards.

esteveste commented 2 years ago

Hello! Thanks for your interest and questions!

The author's code has also changed multiple times since I did this (I should probably put a tag on the original commit this was based on as well 🤔), so some things might be slightly different now in the original repo. (This has also made it hard for me, in my own experiments, to keep a good comparison with the original paper's results.)

Even though in the DreamerV2 paper (https://arxiv.org/pdf/2010.02193.pdf) the author mentions using dynamics backpropagation to learn the actor, his most recent code still uses only reinforce gradients when the actions are discrete. (https://github.com/danijar/dreamerv2/blob/3a711b42461b9942396f84ad3a63ec00f25faedb/dreamerv2/agent.py#L217, config file: https://github.com/danijar/dreamerv2/blob/3a711b42461b9942396f84ad3a63ec00f25faedb/dreamerv2/configs.yaml#L59).

Since I've been working mainly with discrete environments (e.g. Atari), I've actually put a bunch of .detach() calls in the imagination step to speed things up.
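For reference, here is a rough PyTorch sketch of what that reinforce-style actor objective looks like (only an illustration, not code from this repo; names such as imagined_returns, baseline, and entropy_scale are placeholders):

```python
import torch

def reinforce_actor_loss(policy_dist, actions, imagined_returns, baseline,
                         entropy_scale=1e-3):
    # The advantage is detached, so the actor gradient comes only from
    # log_prob * advantage (REINFORCE); nothing is backpropagated through
    # the value network or the world-model dynamics.
    advantage = (imagined_returns - baseline).detach()
    objective = policy_dist.log_prob(actions) * advantage
    # Entropy bonus keeps the discrete policy from collapsing too early.
    objective = objective + entropy_scale * policy_dist.entropy()
    return -objective.mean()

# Usage example with a categorical policy over discrete actions:
# dist = torch.distributions.Categorical(logits=actor_logits)
# loss = reinforce_actor_loss(dist, sampled_actions, lambda_returns, critic_values)
```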

About your second point, you're right, it might be a good idea. I'm using this code for my own experiments, so it might be good to actually do a new commit to organize things. I will try once I have more free time.

dosssman commented 2 years ago

Hello again. Thank you very much for the fast answer.

Regarding the actor learning part, now that you mention it, it does make sense they use a different method to update the actor when the actions are discrete. I was mostly familiar with the Dreamer-v1 code and structure, hence my confusion.

Best wishes for your experiments.

dosssman commented 2 years ago

Hello again.

Here is some additional trouble I ran into while trying to run the dreamer agents using the scripts in this repository. I hope this might be of use in case you re-organize the repository later.

  1. Missing dependencies: wandb, tensorboard, gym[atari], opencv-python
  2. After having manually installed the above dependencies and running the python dreamer/train.py ... command provided in the readme, it returns the following message:
```
(dreamerV2-pytorch) d055@akira:~/random/rl/dreamerV2-pytorch$ python3 dreamerv2/train.py --logdir ./logdir/atari_pong/dreamerv2/1 --configs defaults atari --task atari_pong
Logdir logdir/atari_pong/dreamerv2/1
Create envs.
/home/d055/anaconda3/envs/dreamerV2-pytorch/lib/python3.8/site-packages/gym/envs/atari/environment.py:69: UserWarning: WARN: obs_type "image" should be replaced with the image type, one of: rgb, grayscale
  logger.warn(
A.L.E: Arcade Learning Environment (version +978d2ce)
[Powered by Stella]
Prefill dataset (50000 steps).
Traceback (most recent call last):
  File "dreamerv2/train.py", line 152, in <module>
    train_driver(random_agent, steps=prefill, episodes=1)
  File "./common/driver.py", line 39, in __call__
    self._obs[i] = ob = self._envs[i].reset()
  File "./common/envs.py", line 300, in reset
    obs = self._env.reset()
  File "./common/envs.py", line 273, in reset
    obs = self._env.reset()
  File "./common/envs.py", line 184, in reset
    return self._env.reset()
  File "./common/envs.py", line 241, in reset
    return self._env.reset()
  File "./common/envs.py", line 119, in reset
    obs = {'image': image, 'ram': self._env.env._get_ram()}
AttributeError: 'AtariEnv' object has no attribute '_get_ram'
```

I suspect this might be due to a change in the gym Atari environment API, which no longer has the _get_ram attribute that the wrappers in common/envs seem to rely on.
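If that is the case, one possible workaround (untested, and assuming newer gym Atari environments still expose the underlying ALE object as env.unwrapped.ale) would be to read the RAM directly from the ALE interface instead of the removed _get_ram method:

```python
# Hypothetical replacement for the failing line in common/envs.py (untested):
# read the RAM through the Arcade Learning Environment object exposed by
# the unwrapped AtariEnv, instead of the removed _get_ram() helper.
ram = self._env.unwrapped.ale.getRAM()
obs = {'image': image, 'ram': ram}
```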

Keep up the good work. Best regards.