google-deepmind / hanabi-learning-environment

hanabi_learning_environment is a research platform for Hanabi experiments.
Apache License 2.0

Issues running agents/rainbow/run_experiment.py (import fails and agent does not improve) #5

Open rocanaan opened 5 years ago

rocanaan commented 5 years ago

Hello,

I am trying to train the sample Rainbow agent by running the run_experiment.py script in hanabi-learning-environment/agents/rainbow, and I am running into two issues:

1) The script doesn't run because it cannot find a module named rl_env. The problem seems to be that rl_env.py is in the root directory of the project, whereas the script I am trying to run is two levels below it. I temporarily fixed it by adding hanabi-learning-environment to my PYTHONPATH bash variable (an in-script alternative is sketched below), but I believe a better fix might be to move the run_experiment script to the root (and change the necessary imports), which could be considered for a future version.

2) After temporarily fixing the issue above and running the script for around 15 hours (without changing any of the default parameters), the agent showed no improvement (final average per-episode return: 0.22). Is this the expected result?
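For reference, a minimal sketch of an in-script alternative to the PYTHONPATH workaround in point 1 (the relative path assumes the default repository layout, with this script two levels below the root):

```python
import os
import sys

# Prepend the repository root (two levels above agents/rainbow/) to
# sys.path so that rl_env.py at the project root can be imported.
sys.path.insert(
    0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))

import rl_env  # resolves from the repository root after the path tweak
```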

Thank you!

ghost commented 5 years ago

For point 2, try changing observation_type to be pyhanabi.AgentObservationType.SEER.value in rl_env.make(). This is a much easier problem, since the players can see their own cards. I think I was able to get a score of around 6.00 or so after 200 iterations, but you should definitely see it go above 1.00 pretty quickly.

I know this wasn't available when you asked your question, but I think it might be a good way of quickly testing your code now.
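As a rough sketch of that change (the config keys below mirror the ones used inside rl_env.make(), but double-check them against your copy of rl_env.py), the SEER variant amounts to constructing the environment with the SEER observation type:

```python
import pyhanabi
import rl_env

# Sketch: a full 2-player game in which players can also see their own
# cards (SEER observations), which makes the learning problem much easier.
environment = rl_env.HanabiEnv(
    config={
        "colors": 5,
        "ranks": 5,
        "players": 2,
        "max_information_tokens": 8,
        "max_life_tokens": 3,
        "observation_type": pyhanabi.AgentObservationType.SEER.value,
    })
```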

rocanaan commented 5 years ago

Thank you, I will try that and let you know how it goes.

Since opening this issue, I made a few changes to the setup in an attempt to make the experiment as close as possible to the one described in the paper:

1) In train.py, added agent type 'Rainbow' as a parameter to create_agent on line 82, since the default value is 'DQN'. This changed my score after 200 iterations from below 1.0 to ~3.5, and I believe it is the correct way to launch that experiment (see the sketch below).

2) In rainbow_agent.py, changed 'num_layers' from 1 to 2, as the paper states the neural network used had 2 hidden layers, not 1.

3) In run_experiment.py, changed 'num_iterations' from 200 to 20k. My understanding from Marc's blog post is that the agent should reach a score of around 12 points after about 10 million steps, whereas the default setup is 200 iterations × 5000 time steps per iteration = 1M time steps, so the default training run is short by an order of magnitude.

4) Changed 'tf_device' from '/cpu:*' to '/gpu:*' in order to run on the GPU.
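A sketch of change (1), meant to replace the existing create_agent call inside agents/rainbow/train.py (the keyword name 'agent_type' is taken from the description above; please verify it against run_experiment.py in your checkout):

```python
# 'environment' and 'obs_stacker' are the objects train.py already builds.
# Passing agent_type='Rainbow' asks run_experiment to build the Rainbow
# agent instead of the default DQN agent.
agent = run_experiment.create_agent(environment, obs_stacker,
                                    agent_type='Rainbow')
```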

I am currently in the process of running the agents with those changes, and it reports a score of about 5.5 after 1000 iterations (or 5M steps), which seems approximately in line with the results from the blog post.

My main problem right now is that I seem to be running way slower than expected. While Marc reports almost 20 million steps on the first day, my 1000 iterations (5M steps) have taken me about 4 days. The logger reports around 150~200 steps per second. Is this to be expected?

I am running on a MacBook Pro with an Intel HD Graphics 630 (1536 MB) graphics card.

Thank you very much!

mgbellemare commented 5 years ago

Hi @rocanaan. If you're running at 150 steps per second, that should give you 5M steps in about 9 hours, unless my math is wrong. Is it possible that nothing is happening while your Mac is sleeping?
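The arithmetic, for reference:

```python
# Rough check of the numbers above: 5M steps at 150 steps/second.
hours = 5_000_000 / 150 / 3600  # ~9.3 hours
```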

rocanaan commented 5 years ago

Hello, @mgbellemare , thanks for the response!

You are correct, and verifying it showed me the true problem: by default, we checkpoint every iteration, which takes ~250 seconds. Considering that 5000 steps at 150 steps/second takes ~33 seconds, checkpointing every iteration corresponds to a slowdown of almost a factor of 10. I need to drastically increase the iteration length in order to spend a larger fraction of the time actually training.
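A rough breakdown of those approximate per-iteration numbers:

```python
# Approximate wall-clock time per iteration, from the figures above.
steps_per_iteration = 5000
steps_per_second = 150
train_seconds = steps_per_iteration / steps_per_second        # ~33 s of training
checkpoint_seconds = 250                                       # ~250 s of checkpointing
slowdown = (train_seconds + checkpoint_seconds) / train_seconds  # ~8.5x, i.e. almost 10x
```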

Point 1 of the original post (the fact that the current directory structure doesn't allow training to run out of the box because the imports fail) still stands, and I think it is important to address (either by changing the directory structure so it works, or by documenting a fix in the README), so this issue could remain open until then.

I also need a bit of help with issue #9, as I am currently unable to use checkpoints from one training session to kickstart another session.