PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

How to reload the models in Chapter07? #10

Closed: javabean68 closed this issue 5 years ago

javabean68 commented 5 years ago

Hi Maxim,

I would find it really useful to have something like Chapter06's 03_dqn_play.py in Chapter07. I modified the RewardTracker to save the best model:

    if self.best_mean_reward is None or self.best_mean_reward < mean_reward:
        # save the weights whenever the mean reward improves
        torch.save(self.net.state_dict(), "best.dat")
        if self.best_mean_reward is not None:
            print("Best mean reward updated %.3f -> %.3f, model saved"
                  % (self.best_mean_reward, mean_reward))
        self.best_mean_reward = mean_reward

and it seems to work. But I'm having problems reloading the net with the saved weights and working with it... I tried something like the code attached...

09_dqn_play.py.txt

I doubt it is right :-( Do you perhaps already have a piece of code at hand, or could you give me a tip?

Thank you very much in advance! Regards Fabio

Shmuma commented 5 years ago

Hi Fabio!

In the attachment you've sent, there is no code for loading the model from file, only a comment about it. In fact, loading a saved model is not complicated; you just need to do this: net.load_state_dict(torch.load("best.dat", map_location=lambda storage, loc: storage))

The map_location argument is needed only if you've trained the model on a GPU but are loading it on a machine without CUDA available. Otherwise, it can be omitted.
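For reference, here is a minimal loading sketch. The lib.dqn_model import mirrors the Chapter06 helpers, and the observation shape and action count are assumptions that must match whatever you used for training:

```python
import torch
from lib import dqn_model  # assumed: the same DQN definition used during training

# must match the trained network; 4x84x84 stacked frames and 6 actions are typical for Pong
obs_shape = (4, 84, 84)
n_actions = 6

net = dqn_model.DQN(obs_shape, n_actions)
net.load_state_dict(torch.load("best.dat",
                               map_location=lambda storage, loc: storage))
net.eval()  # inference mode
```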

javabean68 commented 5 years ago

Hi Maxim

Thank you! Is the rest of my approach right? What baffles me a little is having to load 32 random observations at the beginning to make the batch consistent with what the net expects.

Regards Fabio

Shmuma commented 5 years ago

Sorry, I didn't get that from your first message. No, the approach is wrong; what you need is much simpler. In fact, you don't need an observation buffer at all :).

To apply your network, you don't need to pass it the same batch of samples you used for training. The first dimension of the network's input can be of any size: for example, you can train the network on batches of 64 samples, and every application of the net will then return an output with 64 entries. But if you want to apply the network to a different batch, say with 2 samples, you can just pass it a tensor whose first dimension has size 2 and it will return an output of size 2.
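A tiny illustration of this with a toy linear layer (a stand-in, not the book's DQN); the only point is that the batch dimension is free:

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)  # stand-in for any network: 4 inputs, 2 outputs per sample

print(net(torch.zeros(64, 4)).shape)  # torch.Size([64, 2]) -- training-sized batch
print(net(torch.zeros(2, 4)).shape)   # torch.Size([2, 2])
print(net(torch.zeros(1, 4)).shape)   # torch.Size([1, 2])  -- a single observation
```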

So, if you have only one observation, just pass it to the network with batch=1; that's exactly what happens here (line 42): https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter06/03_dqn_play.py#L42

We take our state (an array of 4x84x84) and convert it into a tensor. But notice the square brackets around the state variable: they add an extra dimension automatically, so the resulting tensor is 1x4x84x84 and our batch dimension is 1.

The output from the network will also have a first dimension of 1, which is why on line 43 we access the zeroth element of the output to obtain our q-values.
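Putting the loading and the batch=1 trick together, here is a sketch of a play loop in the spirit of 03_dqn_play.py; the lib.wrappers.make_env helper and the environment name are assumptions borrowed from the Chapter06 code and may need adjusting for your Chapter07 setup:

```python
import numpy as np
import torch

from lib import dqn_model, wrappers  # assumed: Chapter06-style helper modules

env = wrappers.make_env("PongNoFrameskip-v4")   # same preprocessing as during training
net = dqn_model.DQN(env.observation_space.shape, env.action_space.n)
net.load_state_dict(torch.load("best.dat",
                               map_location=lambda storage, loc: storage))

state = env.reset()
total_reward = 0.0
while True:
    state_v = torch.tensor(np.array([state]))   # brackets add the batch dim -> 1x4x84x84
    q_vals = net(state_v).data.numpy()[0]       # element 0 of the batch holds our q-values
    action = int(np.argmax(q_vals))             # greedy action
    state, reward, done, _ = env.step(action)
    total_reward += reward
    if done:
        break
print("Total reward: %.2f" % total_reward)
```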

javabean68 commented 5 years ago

Hi Max,

Thank you very much! I'll follow your advice.

Regards Fabio
