PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

How to Play with Rainbow #64

Closed cedros23 closed 4 years ago

cedros23 commented 4 years ago

Hello,

I have my own custom gym env, and I am able to train everything in Chapter 7, i.e. the DQN variants. Once trained, I can load the net and evaluate it with a modified version of the 03_dqn_play script.

My problem is specifically with Rainbow: how can I select an action from the trained Rainbow net? Since it is distributional, it returns 50 predictions for each action. So far, I have tried using the max and mean Q-values to test the agent, but it performs much worse than it did during training.

Any help is appreciated, Thanks!

Shmuma commented 4 years ago

Hi!

Applying a distributional net needs to be done carefully, as the output shape is unusual. The tricky part is that the distribution over atoms has to be converted into Q-values before you can compare actions.
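For reference, a minimal sketch of that conversion (the constants Vmin=-10, Vmax=10, N_ATOMS=51 are the chapter's defaults, assumed here; your custom setup may differ):

```python
import numpy as np

# Assumed to match the chapter's defaults; adjust to your own training config.
Vmin, Vmax, N_ATOMS = -10, 10, 51
supports = np.linspace(Vmin, Vmax, N_ATOMS)   # atom values z_i

def q_values(probs):
    # probs: per-action probabilities over atoms, shape (n_actions, N_ATOMS)
    # Q(s, a) = sum_i p_i(s, a) * z_i
    return (probs * supports).sum(axis=-1)
```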

Normally, finding the action with the max Q-value should work well, but I suggest using the model's qvals() method to make sure you're applying the max function to the proper tensor. This method is defined here: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/08_dqn_rainbow.py#L81
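A minimal evaluation sketch along those lines, assuming the RainbowDQN class from 08_dqn_rainbow.py and a hypothetical checkpoint file name:

```python
import numpy as np
import torch

# Assumes `env` is your custom gym env and the checkpoint was saved with
# torch.save(net.state_dict(), "rainbow.dat") -- the file name is just an example.
net = RainbowDQN(env.observation_space.shape, env.action_space.n)
net.load_state_dict(torch.load("rainbow.dat", map_location="cpu"))
net.eval()

state, done = env.reset(), False
while not done:
    state_v = torch.tensor(np.array([state], dtype=np.float32))
    q_v = net.qvals(state_v)                 # shape (1, n_actions): atoms already folded into Q-values
    action = int(q_v.max(dim=1)[1].item())   # greedy action
    state, reward, done, _ = env.step(action)
```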

To simplify the task, you might want to use the ptan.agent.DQNAgent, the same way it was used in the training code. It supposed to be called with observations numpy array and returns an array with action indices: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/08_dqn_rainbow.py#L140
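Roughly, mirroring the agent setup from the training code (with `net` and `env` as in the sketch above):

```python
import numpy as np
import ptan

# Same agent construction as in training: qvals() output + greedy selector.
selector = ptan.actions.ArgmaxActionSelector()
agent = ptan.agent.DQNAgent(lambda x: net.qvals(x), selector, device="cpu")

state, done = env.reset(), False
while not done:
    actions, _ = agent(np.array([state], dtype=np.float32))  # returns (actions, agent_states)
    state, reward, done, _ = env.step(actions[0])
```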

Another, slightly more advanced approach would be to get the distribution from the model using its apply_softmax() method and then sample from that distribution. This might help if the underlying distribution in the environment is multi-modal. To sample, you can use np.random.choice or the PyTorch distributions package.
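A rough sketch of this sampling idea, assuming the same `supports` grid and `net` as above; the way the sampled values are turned into an action here is my own interpretation, not code from the book:

```python
import numpy as np
import torch

state_v = torch.tensor(np.array([state], dtype=np.float32))
cat_out = net(state_v)                                   # raw logits, shape (1, n_actions, N_ATOMS)
probs = net.apply_softmax(cat_out)[0].data.cpu().numpy() # per-action distribution over atoms

# Sample one value atom per action from its distribution, then act greedily
# with respect to the sampled values.
sampled_vals = np.array([np.random.choice(supports, p=p) for p in probs])
action = int(np.argmax(sampled_vals))
```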