PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

How to Play with Rainbow #64

Closed cedros23 closed 4 years ago

cedros23 commented 4 years ago

Hello,

I have my own custom gym env, and I am able to train everything in Chapter 7, i.e. the DQN variants. Once trained, I can load the net and evaluate it with a modified version of the 03_dqn_play script.

My problem is specifically with Rainbow: how can I select an action from the trained Rainbow net? Since it is distributional, it returns 50 predictions for each action. So far, I have tried using the max and mean Q-values to test the agent, but it performs much worse than it did during training.

Any help is appreciated, Thanks!

Shmuma commented 4 years ago

Hi!

Applying a distributional net needs to be done carefully, as the output shape is unusual. The tricky part is that the distribution over atoms has to be converted into Q-values before you can compare actions.
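For reference, a minimal sketch of that conversion (the constants Vmin=-10, Vmax=10, N_ATOMS=51 are the chapter's defaults, assumed here; your custom setup may differ):

```python
import numpy as np

# Assumed to match the chapter's defaults; adjust to your own training config.
Vmin, Vmax, N_ATOMS = -10, 10, 51
supports = np.linspace(Vmin, Vmax, N_ATOMS)   # atom values z_i

def q_values(probs):
    # probs: per-action probabilities over atoms, shape (n_actions, N_ATOMS)
    # Q(s, a) = sum_i p_i(s, a) * z_i
    return (probs * supports).sum(axis=-1)
```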

Normally, finding the action with the max Q-value should work well, but I suggest using the model's qvals() method to make sure you're applying the max function to the proper tensor. This method is defined here: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/08_dqn_rainbow.py#L81
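A minimal evaluation sketch along those lines, assuming the RainbowDQN class from 08_dqn_rainbow.py and a hypothetical checkpoint file name:

```python
import numpy as np
import torch

# Assumes `env` is your custom gym env and the checkpoint was saved with
# torch.save(net.state_dict(), "rainbow.dat") -- the file name is just an example.
net = RainbowDQN(env.observation_space.shape, env.action_space.n)
net.load_state_dict(torch.load("rainbow.dat", map_location="cpu"))
net.eval()

state, done = env.reset(), False
while not done:
    state_v = torch.tensor(np.array([state], dtype=np.float32))
    q_v = net.qvals(state_v)                 # shape (1, n_actions): atoms already folded into Q-values
    action = int(q_v.max(dim=1)[1].item())   # greedy action
    state, reward, done, _ = env.step(action)
```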

To simplify the task, you might want to use the ptan.agent.DQNAgent, the same way it was used in the training code. It supposed to be called with observations numpy array and returns an array with action indices: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/08_dqn_rainbow.py#L140
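Roughly, mirroring the agent setup from the training code (with `net` and `env` as in the sketch above):

```python
import numpy as np
import ptan

# Same agent construction as in training: qvals() output + greedy selector.
selector = ptan.actions.ArgmaxActionSelector()
agent = ptan.agent.DQNAgent(lambda x: net.qvals(x), selector, device="cpu")

state, done = env.reset(), False
while not done:
    actions, _ = agent(np.array([state], dtype=np.float32))  # returns (actions, agent_states)
    state, reward, done, _ = env.step(actions[0])
```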

Another, slightly more advanced approach would be to get the distribution from the model using its apply_softmax() method and then sample from that distribution. This might help if the underlying distribution in the environment is multi-modal. To sample, you can use np.random.choice or the PyTorch distributions package.
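A rough sketch of this sampling idea, assuming the same `supports` grid and `net` as above; the way the sampled values are turned into an action here is my own interpretation, not code from the book:

```python
import numpy as np
import torch

state_v = torch.tensor(np.array([state], dtype=np.float32))
cat_out = net(state_v)                                   # raw logits, shape (1, n_actions, N_ATOMS)
probs = net.apply_softmax(cat_out)[0].data.cpu().numpy() # per-action distribution over atoms

# Sample one value atom per action from its distribution, then act greedily
# with respect to the sampled values.
sampled_vals = np.array([np.random.choice(supports, p=p) for p in probs])
action = int(np.argmax(sampled_vals))
```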