VinF / deer

DEEp Reinforcement learning framework

Q_network set / dump does not work as expected #44

Closed: cherishing78 closed this issue 7 years ago

cherishing78 commented 7 years ago

Hi Vince, thank you so much for offering this useful toolbox. I just found that the Q_network cannot be dumped and resumed appropriately with setNetwork / dumpNetwork: the learning rate, epsilon, and discount factor are not transferred from the trained model to the new one. To see this, we can add some debug output in agent.py, inside _runEpisode, right after self._total_mode_reward += reward, as in the sketch below.
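The placement is roughly as follows; only the two print calls are the addition, the rest is an abbreviated sketch of the existing loop around the anchor line (the exact surrounding lines may differ slightly from agent.py):

```python
# deer/agent.py, inside NeuralAgent._runEpisode (abbreviated sketch)
while maxSteps > 0:
    maxSteps -= 1
    # ... observe the environment and act; V is the value of the chosen action ...
    V, action, reward = self._step()
    self._total_mode_reward += reward          # existing line used as the anchor
    # added debug output:
    print('Action is {}, V is {}'.format(action, V))
    print('#{} --- Reward is {}:'.format(maxSteps, reward))
```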

cherishing78 commented 7 years ago

I used the example with the default toy model. Here is my procedure:

We can see that epochs 1-10 in the second log differ from epochs 11-20 in the third log. The most important difference is the V value for each action. Moreover, if we train long enough, we find that the learning rate, discount factor, and epsilon are also not transferred from the dumped Q_network.

setNetwork / dumpNetwork only deal with the layer parameters of the Q_network, so when we resume from the dumped Q_network, the training results are not identical to those of the original training process.

VinF commented 7 years ago

Hi, in fact it is intended to work that way (I can maybe add some documentation to make this clear). The functions getAllParams / setAllParams handle the parameters of the neural network, not the hyper-parameters used for training the Q-network. The goal is that you can afterwards start off from a NN that is already trained. If you wish to continue training, you can define any hyper-parameters you like along with it; it doesn't always make sense to reuse the exact same hyper-parameters that were in place when you dumped the NN.
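Schematically, the intended workflow looks like the sketch below. Controller and argument names follow the toy example, and the env / Q-network construction is omitted since it is unchanged between the two runs; double-check the exact names against the version of the library you are using:

```python
import deer.experiment.base_controllers as bc
from deer.agent import NeuralAgent

# env and qnetwork are built exactly as in the toy example (Toy_env + MyQNetwork);
# that part is omitted here because it is the same in both runs.

# --- first run: train with some hyper-parameters, then dump the network ---
agent = NeuralAgent(env, qnetwork)
agent.attach(bc.TrainerController())
agent.attach(bc.LearningRateController(initial_learning_rate=0.0002,
                                       learning_rate_decay=0.99))
agent.attach(bc.EpsilonController(initial_e=1.0, e_decays=10000, e_min=0.1))
agent.run(10, 1000)                 # 10 epochs of 1000 steps
agent.dumpNetwork("toy_dump", 10)   # stores the layer parameters only

# --- second run: rebuild the agent, reload the weights, and freely choose
# --- whatever hyper-parameters make sense for the continued training
agent2 = NeuralAgent(env, qnetwork)
agent2.setNetwork("toy_dump", 10)   # restores the layer parameters only
agent2.attach(bc.TrainerController())
agent2.attach(bc.LearningRateController(initial_learning_rate=0.00005,
                                        learning_rate_decay=0.99))
agent2.attach(bc.EpsilonController(initial_e=0.5, e_decays=10000, e_min=0.1))
agent2.run(10, 1000)
```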

cherishing78 commented 7 years ago

Thanks a lot! I understand now; that is indeed more reasonable.