PacktPublishing / Hands-On-Intelligent-Agents-with-OpenAI-Gym

Code for the Hands-On Intelligent Agents with OpenAI Gym book: get started and learn to build deep reinforcement learning agents using PyTorch
https://www.packtpub.com/big-data-and-business-intelligence/hands-intelligent-agents-openai-gym

Best episode reward didn't save #29

Closed: Jidenna closed this issue 4 years ago

Jidenna commented 4 years ago

So I trained the async_a2c model for 1.6M steps and obtained a best reward of 1.105. After ending the training, I tried to run with --test and it says my best reward from the model is now 0.036. I also tried continuing the training and I am still getting a lower best reward of 0.036 instead of 1.105. What am I missing?

praveen-palanisamy commented 4 years ago

Hi @Jidenna ,

What is the value of "save_freq_when_perf_improves" parameter in the async_a2c_parameters.json that you are using?

The performance of the agent in terms of the reward obtained is tracked, and when the performance is consistently better than the agent's previous best, the agent's model is saved. The "save_freq_when_perf_improves" parameter in async_a2c_parameters.json defines that consistency as the number of consecutive episodes with improved performance.

In your case, it may have happened that the agent obtained a best reward of 1.105 for one or more episodes, but not consistently for "save_freq_when_perf_improves" consecutive episodes, and therefore the agent's best model state was not saved.
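Roughly, the bookkeeping looks like the sketch below. The variable names and the save() call are made up for illustration and are not the repository's exact code:

```python
# Illustrative sketch of a "save only after consistent improvement" check.
# Variable names and the save() call are hypothetical; the repository's
# actual bookkeeping may differ.
save_freq_when_perf_improves = 10  # read from async_a2c_parameters.json
best_reward = float("-inf")
num_improved_episodes = 0

def on_episode_end(agent, episode_reward):
    global best_reward, num_improved_episodes
    if episode_reward > best_reward:
        num_improved_episodes += 1
        # Persist the model only after the reward has beaten the previous
        # best for `save_freq_when_perf_improves` episodes in a row.
        if num_improved_episodes >= save_freq_when_perf_improves:
            best_reward = episode_reward
            agent.save()  # hypothetical save method
            num_improved_episodes = 0
    else:
        num_improved_episodes = 0
```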

You can tweak the value of "save_freq_when_perf_improves" depending on your training environment to change the behavior of when the agent's model is saved. For example, if you change the value in the following line: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/31026aaee497ef35c96b3a7f1aed5809361e328b/ch8/async_a2c_parameters.json#L13

to 1, the agent's best model state will be saved as soon as the agent performs at its best for a single episode.
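For reference, the changed entry in async_a2c_parameters.json would look like the excerpt below (only the relevant key is shown; the file contains other hyperparameters as well):

```json
{
    "save_freq_when_perf_improves": 1
}
```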

Once it is saved, testing the agent with --test or continuing the training should work as you expect.

Jidenna commented 4 years ago

Thanks for the prompt response. The value of save_freq_when_perf_improves is currently 10. If I change it to 1 like you suggested, will it have any effect on training the model? Is a value of 10 optimal for learning a better policy?

praveen-palanisamy commented 4 years ago

The "optimal" value for the save_freq_when_perf_improves hyperparameter depends on the nature of environment (deterministic vs stochastic). In stochastic environments (majority of the environments are stochastic) it is typically a good idea to save the model when the agent is consistently performing better for say, 10 episodes in succession as that is somewhat a good indicator of improvement. With a value of 1, it's hard to tell if the agent actually improved or if it was lucky in that particular episode.

If the learning task in your chosen environment is hard, you can try using a smaller value, say 5, instead of the default value of 10. For most of the environments discussed in the book, 10 was a good value for save_freq_when_perf_improves.

Jidenna commented 4 years ago

Got it, thanks. One last question: is there any way I can tell whether the agent has performed better for, say, 10 episodes? Can I see this from the average reward, or is there another way to print the 10 consecutive best rewards?

I ask this because the last time I trained for over 1.6 million steps and got a best reward of 1.105, I ended the training without knowing whether that reward had been reached for 10 consecutive episodes. It would be nice to know the frequency of that best reward for sure before ending the training.

praveen-palanisamy commented 4 years ago

Yes, there is! The async_a2c.py training script (like every other agent training script in this repository) generates TensorBoard logs in the logs directory (./logs/AsyncA2C_*). You can launch TensorBoard with tensorboard --logdir=./logs, which will show several plots (like the ones shown in the book), including the episodic reward plot that tells you the maximum reward reached by your agent, among other information.
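If you want to inspect or reproduce those scalars yourself, the sketch below shows the general logging pattern. The tag names and log directory are illustrative and not necessarily the ones the script uses:

```python
# Minimal sketch: writing per-episode reward scalars that TensorBoard can plot.
# Tag names and the log directory are illustrative; the script's own tags may differ.
from torch.utils.tensorboard import SummaryWriter  # tensorboardX's SummaryWriter has the same API

writer = SummaryWriter(log_dir="./logs/AsyncA2C_example")
best_reward = float("-inf")
for episode, episode_reward in enumerate([0.2, 0.5, 1.105]):  # dummy rewards for illustration
    best_reward = max(best_reward, episode_reward)
    writer.add_scalar("episode_reward", episode_reward, episode)
    writer.add_scalar("best_reward", best_reward, episode)
writer.close()
```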

The episodic and best rewards are also printed to the console (stdout), which you can redirect to a file to inspect later.
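For example, assuming you start training the usual way (the script arguments below are placeholders), you could capture the console output like this:

```bash
# Save the console output to a file while still seeing it live, then search it later.
# The script arguments are placeholders; use whatever you normally train with.
python async_a2c.py --env CarRacing-v0 2>&1 | tee training_log.txt

# Afterwards, look for the best-reward lines:
grep -i "best" training_log.txt
```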

Hope that answered your questions. Feel free to close this issue once resolved.

Jidenna commented 4 years ago

Gotcha. Thanks a lot.