Update: no improvement in scores at 18M+ steps. (We have two separate training instances seeing this same result as well, so it's not isolated to a single system.)
You're right, it should have improved by now. Let's dig into this:
Does the same problem occur when you run with the --precision 32 flag?
Could you give atari_name_this_game a try? Similar to Pong, it should learn pretty quickly.

Great!
!git clone https://github.com/danijar/dreamerv2.git
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_pong/dreamerv2/1' --configs defaults atari --task atari_pong
If it's helpful, here's a copy of the notebook (very straightforward): https://colab.research.google.com/drive/1iB9G5fNnrxfWZfplynU70RMkjbKxSv_k?usp=sharing
Sounds good, let me know how it goes. I didn't see anything suspicious in your colab, except that the section for the train_openl image summary shows no images for me.
Looking at the episode length plot, it seems like the agent is learning something. Maybe it really is just taking a while to start making progress on Pong. Of course Pong can be solved much faster, but the hyperparameters were tuned to work well across all games at 200M steps, without a focus on data efficiency or easy games.
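If it helps when comparing runs, here is a rough sketch (my own suggestion, not part of the repo) for dumping the return curve straight out of the TensorBoard event files. The logdir path is the one from your command above, and the 'eval_return' tag name is an assumption, so list the available tags first:

from tensorboard.backend.event_processing import event_accumulator

# Point at the logdir used by train.py (path taken from the command above).
ea = event_accumulator.EventAccumulator('/content/drive/MyDrive/logdir/atari_pong/dreamerv2/1')
ea.Reload()

# See which scalar tags exist; the exact names depend on the dreamerv2 version.
print(ea.Tags()['scalars'])

# 'eval_return' is an assumed tag name; substitute one printed above.
for event in ea.Scalars('eval_return'):
    print(event.step, event.value)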
If the above ideas don't help find the problem, the next idea would be to train an agent at the first commit of the repository rather than after the refactoring. That said, I've tested the refactoring on Google machines and everything works fine there.
Early results (~2M steps) suggest the system is behaving more stably with the --precision 32 flag. Will report back.
Update at 8M steps: despite some promising early behavior, the agent has now settled into zero-score, zero-hit play for the past ~4M steps:
Here is what I'm getting when training DreamerV2 on Pong 10 times (this uses mixed precision, i.e. all flags are at their defaults):
Yeah so there's definitely something going on. None of our training runs in Colab (up to 18M+ steps) achieved returns greater than about -19...
If you have a MuJoCo license, you could try running on a simple DMC task, e.g. dmc_walker_walk, to see if the general algorithm works for you in Colab.
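Before committing to a long DMC training run, a quick sanity check (assuming dm_control is installed and your MuJoCo license is set up; dmc_walker_walk corresponds to the walker/walk task) is just loading the environment:

from dm_control import suite

# This fails here if MuJoCo or the license is not set up correctly.
env = suite.load(domain_name='walker', task_name='walk')
print(env.reset())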
I probably have the same problem when running on my own computer:
Latest commit from git. Running python3 dreamerv2/train.py --logdir logdir/atari_pong/dreamerv2/3 --configs defaults atari --task atari_pong
=> a single run to 10M steps with eval/train_return of -21; 2 runs to 2M steps, also with return -21.
Initial commit from git. Running python3 dreamer.py --logdir logdir/atari_pong/dreamerv2/04_initial_commit --configs defaults atari --task atari_pong
and so far a single run to 2M steps with return -7. It's a short run, but notable considering that with the latest commit the returns never went above -19.
These are only a few small runs, so it might be a random glitch, but it seems that for some reason at least the latest commit (after the refactoring) won't train. The conda environment also changed between those commits (one had tf 2.3 and the other 2.5). I might run some more trials next week.
OK, I at least made some progress here. I don't know if it's the full answer, but my agent is finally training (albeit slowly). Note this is Colab-specific:
Don't pip install anything except ruamel.yaml and elements. For everything else, use the Colab default installs. (You'll have to install the Atari ROMs too.)
The issue appears to be noted here: Tensorflow versions in Colab. It looks like Colab uses a custom-compiled version of tensorflow, so doing !pip3 install tensorflow can lead to a poorly performing or non-functioning tf in Colab.
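As a sanity check (this is just my suggestion; the exact versions will depend on the current Colab image), you can confirm you're still on Colab's bundled TensorFlow and that the GPU is visible before kicking off training:

import tensorflow as tf

# Colab's preinstalled build; if tensorflow was pip-installed over it, this may differ.
print(tf.__version__)

# Should list the GPU (e.g. the V100 on Colab Pro); an empty list means TF can't
# see the GPU and training will be extremely slow.
print(tf.config.list_physical_devices('GPU'))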
My model is still training much slower than the results @danijar posted above (I've now trained to 10M frames with a mean eval return of about -16), but this is the first time I've gotten the agent to escape -21 after many, many attempts. This suggests the Colab tensorflow issue is a real one.
Again, this is still a far cry from positive Pong scores by 4M steps as shown in your plots above @danijar, but by using the built-in Colab installs my model finally at least appears to be learning.
Concretely, my only installs in Colab are now:
!pip3 install ruamel.yaml
!pip3 install elements
# Install ROMs if necessary
!curl http://www.atarimania.com/roms/Roms.rar -O
!pip install unrar
!unrar x Roms.rar
!python -m atari_py.import_roms .
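And a quick optional check (assuming the Atari backend here is atari_py, as the import command above suggests) that the ROM import actually worked:

import atari_py

# Should print True once Roms.rar has been extracted and imported above.
print('pong' in atari_py.list_games())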
@holli That's good to know, thanks!
To both of you, if you could, it'd be great to know if the commit right before the refactoring commit still works for you (i.e. train at commit 1d4868f30).
@danijar yep, the commit just before the refactoring works well; it trains to +15 after 4M steps. So before the refactoring everything trains like in your https://github.com/danijar/dreamerv2/issues/8#issuecomment-849887724 example, but after the refactoring nothing seems to train on my computer. Either the refactoring or some library change is responsible. All the stats in tensorboard seemed to start from similar points, so I'm not sure if those help.
I think I found the reason. Could you both retry with the current commit, please?
Yay, after a quick test it seems to train now.
What was the problem/fix?
Awesome!
It was a stupid mistake that sneaked in when I simplified the configs for the GitHub codebase. The default KL scale was defined as an integer, so the Atari config that sets it to 0.1 got floored to 0. This commit fixed it. I also updated elements to raise an error instead of converting floats to ints, so the same mistake doesn't happen in the future.
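To make the failure mode concrete, here is a minimal illustrative sketch (not the actual dreamerv2/elements code) of how a float override silently becomes 0 when the config default is declared as an int:

# Default accidentally declared as an int instead of a float.
defaults = {'kl_scale': 1}

def override(defaults, key, value):
    # Coercing the override to the type of the default floors 0.1 down to 0.
    return type(defaults[key])(value)

print(override(defaults, 'kl_scale', 0.1))            # -> 0, the KL term vanishes
print(override({'kl_scale': 1.0}, 'kl_scale', 0.1))   # -> 0.1 with a float default

Raising an error on float-to-int conversion instead, as described above, catches this at config-parse time rather than silently zeroing the loss term.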
Sorry for the slow reply @danijar. I have also tested and verified that the new codebase works in Colab as well and learns as expected!
(Note this was using the Colab versions of tensorflow etc., as noted above.)
Thank you for your help!
[Edited to update to 18M steps; images below are from 12M]
Starting a new thread with more relevant detail here. Please feel free to close if you don't think it's appropriate.
We've now trained several instances to at least 10M+ steps with no improvement in Pong scores. This is using the default Pong settings on V100 machines in Colab Pro.
All training settings are the defaults in the repo; no modifications have been made to the codebase, as this was a first "test run" of Dreamer.
Below are performance graphs. Happy to provide Colab copy or log files if it would be helpful. Would appreciate any insight, even if it's that we need to allow longer training (though the chart in Appendix F appears to show Pong improving by this point in training?).
Will keep training in the meantime and update if anything changes.
Thank you.
[Below images are from 12M steps; however, the issue persists beyond 18M+ steps.]