Closed cedros23 closed 5 years ago
Hi!
A distributional network has to be applied carefully, because its output shape is unusual. The tricky part is that the per-atom outputs have to be converted into Q-values before you compare actions.
Normally, picking the action with the maximum Q-value should work well, but I suggest using the model's qvals() method to make sure you're applying the max function to the proper tensor. This method is defined here: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/08_dqn_rainbow.py#L81
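To make the conversion concrete, here is a minimal NumPy sketch of the idea: softmax over the atom axis, then a probability-weighted sum over the support. It is not the repository's code. The function names, the (batch, actions, atoms) layout and the v_min/v_max values are assumptions for illustration, and the network output is assumed to have been copied into a NumPy array first.

```python
import numpy as np

def q_values_from_logits(logits, v_min, v_max):
    """Turn raw per-atom outputs into expected Q-values.

    logits: array of shape (batch, n_actions, n_atoms) (assumed layout).
    v_min, v_max: ends of the value support; they must match the values
    used in training (the ones below are made up for the demo).
    """
    logits = np.asarray(logits, dtype=np.float64)
    n_atoms = logits.shape[-1]
    support = np.linspace(v_min, v_max, n_atoms)
    # Numerically stable softmax over the atom axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Expected return per action: sum_i p_i * z_i.
    return (probs * support).sum(axis=-1)

def greedy_actions(logits, v_min, v_max):
    """Pick, for each batch item, the action with the highest expected Q."""
    return q_values_from_logits(logits, v_min, v_max).argmax(axis=-1)

# Toy example: one state, two actions, three atoms on the support [-1, 0, 1].
# Action 0 puts most of its mass on -1, action 1 on +1.
demo_logits = np.array([[[5.0, 0.0, 0.0],
                         [0.0, 0.0, 5.0]]])
demo_q = q_values_from_logits(demo_logits, v_min=-1.0, v_max=1.0)
demo_actions = greedy_actions(demo_logits, v_min=-1.0, v_max=1.0)
print(demo_q, demo_actions)
```

In the toy example, taking the max or mean over the raw atom outputs gives the same number for both actions, so it cannot tell them apart. The expected value over the support separates them and selects action 1.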
To simplify the task, you might want to use ptan.agent.DQNAgent, the same way it is used in the training code. It is supposed to be called with a NumPy array of observations and returns an array of action indices: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/08_dqn_rainbow.py#L140
Another, slightly more advanced approach would be to get the distribution from the model using its apply_softmax() method and then sample from this distribution. This might help if the underlying distribution in the environment is multi-modal. To sample, you might use np.random.choice (with its p argument) or the PyTorch distributions package.
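One possible reading of the sampling approach, sketched with NumPy: draw one return per action from that action's distribution over atoms, then act on the largest draw. This interpretation, the sample_action name and the toy support are assumptions, not something taken from the book's code. The probs array stands in for the softmax output of the model for a single state.

```python
import numpy as np

def sample_action(probs, support, rng):
    """Sample one return per action and pick the action with the best draw.

    probs: array of shape (n_actions, n_atoms); each row sums to 1
           (assumed to be the softmax output for one state).
    support: array of shape (n_atoms,) with the atom values.
    rng: a numpy.random.Generator.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Draw one atom value per action according to that action's distribution.
    sampled_returns = np.array([rng.choice(support, p=row) for row in probs])
    return int(sampled_returns.argmax()), sampled_returns

rng = np.random.default_rng(0)
support = np.linspace(-1.0, 1.0, 3)  # toy support: [-1, 0, 1]
# Degenerate toy distributions so the outcome does not depend on the seed:
# action 0 always returns -1, action 1 always returns +1.
demo_probs = np.array([[1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0]])
action, returns = sample_action(demo_probs, support, rng)
print(action, returns)
```

With real, non-degenerate distributions the chosen action varies from call to call, which is the point of sampling. If you prefer to stay in PyTorch, torch.distributions.Categorical can draw the atom indices instead.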
Hello,
I have a custom gym env, and I am able to train everything in Chapter 7, i.e. the DQN variants. Once trained, I can load the trained net and check its performance with a modified version of the DQN play script, 03_dqn_play.
My problem is specific to Rainbow: how can I select an action from the trained Rainbow net? Since it is distributional, it returns 50 predictions for each of my actions. So far, I have tried using the max and mean Q-values to test the agent, but it performs a lot worse than it did during training.
Any help is appreciated. Thanks!