TianhongDai / reinforcement-learning-algorithms

This repository contains PyTorch implementations of most classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress.)

Retraining the saved model #2

anushmanukyan closed this issue 3 years ago

anushmanukyan commented 6 years ago

I'm trying to retrain the saved model, but it behaves very strangely:

  1. It does not seem to start from the behaviour that was saved.
  2. Once retraining starts, it repeats only one type of action.

I guess this is a PyTorch issue, but maybe you've succeeded in retraining and know how it should be done?

Saving:

```python
import torch

def save_model_for_training(self, episode, filepath):
    # Bundle everything needed to resume training into a single checkpoint
    checkpoint = {
        'episode': episode,
        'state_dict': self.net.state_dict(),
        'optimizer': self.optimizer.state_dict(),
    }
    torch.save(checkpoint, filepath)

# Called from inside the training loop
self.save_model_for_training(episode, filepath=self.model_path + 'model.pt')
```

Loading the saved model:

```python
# Restore the episode counter, network weights and optimizer state
checkpoint = torch.load(self.model_path + 'model.pt')
self.start_episode = checkpoint['episode']
self.net.load_state_dict(checkpoint['state_dict'])
self.optimizer.load_state_dict(checkpoint['optimizer'])
```

Thanks a lot in advance
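As a sanity check on the snippets above, here is a minimal, self-contained round trip of the same save/load pattern. It is a sketch that assumes a hypothetical toy network and Adam optimizer standing in for `self.net` and `self.optimizer`, and it saves to an in-memory buffer instead of `model.pt`. If the restored network produces identical outputs, then a change in behaviour after loading must come from some state that is not stored in the checkpoint.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-ins for self.net / self.optimizer
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)

# One optimizer step so Adam has non-empty internal state to save
loss = net(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save the same checkpoint layout to an in-memory buffer
buffer = io.BytesIO()
torch.save({'episode': 10,
            'state_dict': net.state_dict(),
            'optimizer': optimizer.state_dict()}, buffer)
buffer.seek(0)

# Fresh objects, as a new training process would create them
new_net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 2))
new_optimizer = torch.optim.Adam(new_net.parameters(), lr=3e-4)

checkpoint = torch.load(buffer)
start_episode = checkpoint['episode']
new_net.load_state_dict(checkpoint['state_dict'])
new_optimizer.load_state_dict(checkpoint['optimizer'])

# Identical weights give identical outputs on the same input
x = torch.randn(3, 4)
same_output = torch.allclose(net(x), new_net(x))
```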

TianhongDai commented 6 years ago

@anushmanukyan Could you tell me which environment and algorithm you are training with?

anushmanukyan commented 6 years ago

@TianhongDai I am using PPO.

TianhongDai commented 6 years ago

@anushmanukyan I guess you are only loading the model weights. If you check this line: https://github.com/TianhongDai/reinforcement-learning-algorithms/blob/master/07-proximal-policy-optimization/ppo_agent.py#L113 you will see that when I test the network, I also load the running mean filter object. This is because during training I use the running mean filter to normalize the input. So, if you want to retrain your model, you should also load the "trained" running mean filter; otherwise the network receives differently normalized inputs and you will get different results.
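To illustrate why the filter matters, here is a sketch of a running mean/std observation normalizer. The class `RunningMeanStd` and its `state_dict`/`load_state_dict` methods are hypothetical, not the repository's filter object; the update uses the standard parallel (batched) mean/variance formula. The sketch shows that the same raw observation is mapped to very different network inputs by a fresh filter and by a trained one, and that restoring the filter's state reproduces the training-time normalization.

```python
import numpy as np


class RunningMeanStd:
    """Hypothetical running mean/std observation filter (not the repo's class)."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # tiny prior count avoids division by zero

    def update(self, batch):
        # Merge batch statistics into the running estimate (parallel variance formula)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs, clip=5.0):
        # Standardize and clip, as is common for PPO observation filters
        return np.clip((obs - self.mean) / np.sqrt(self.var + 1e-8), -clip, clip)

    def state_dict(self):
        return {'mean': self.mean.copy(), 'var': self.var.copy(), 'count': self.count}

    def load_state_dict(self, state):
        self.mean = state['mean'].copy()
        self.var = state['var'].copy()
        self.count = state['count']


rng = np.random.default_rng(0)
observations = rng.normal(loc=10.0, scale=3.0, size=(5000, 3))

# "Training": the filter sees observations in chunks and updates its statistics
trained = RunningMeanStd(shape=(3,))
for chunk in np.array_split(observations, 50):
    trained.update(chunk)

obs = observations[0]
fresh = RunningMeanStd(shape=(3,))
fresh_input = fresh.normalize(obs)      # roughly the raw values, clipped at 5
trained_input = trained.normalize(obs)  # what the policy saw during training

# Restoring the saved statistics reproduces the training-time input
restored = RunningMeanStd(shape=(3,))
restored.load_state_dict(trained.state_dict())
```

For resuming training, one option (an assumption, not how the repository does it) is to add an entry such as `'obs_filter': trained.state_dict()` to the checkpoint dictionary in `save_model_for_training` and call `load_state_dict` on the filter right after restoring the network and optimizer.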

anushmanukyan commented 6 years ago

I added the running mean filter and retraining seems to work better now. However, I have another question: how does demo.py work? I can't figure out how the testing works: I save the best model, but when I test it, it gets a different reward than it had when it was saved. How is that possible? Also, if I run the same model several times, I get different performance.

Thank you so much for your help.
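Some run-to-run variation when evaluating the same saved model is expected whenever evaluation involves randomness. One common source in PPO-style agents (general practice, not confirmed for demo.py) is sampling actions from the policy distribution instead of using its mean; unseeded environment resets are another. The sketch below assumes a hypothetical Gaussian policy head with fixed `mean` and `std` and shows the first effect with `torch.distributions.Normal`.

```python
import torch
from torch.distributions import Normal

# Hypothetical policy output for one observation: mean and std of a Gaussian
mean = torch.tensor([0.5, -0.2])
std = torch.tensor([0.3, 0.3])
dist = Normal(mean, std)

# Stochastic action selection: differs from call to call
torch.manual_seed(1)
sample_a = dist.sample()
sample_b = dist.sample()

# Reseeding the RNG reproduces the same sampled action
torch.manual_seed(1)
sample_a_again = dist.sample()

# Deterministic action selection: always the distribution mean
deterministic_action = dist.mean
```

With sampled actions, episode returns vary between runs even with identical weights, so a single evaluation episode can differ from the return logged when the model was saved.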

TianhongDai commented 6 years ago

@anushmanukyan Hi, I think demo.py should work fine. You can download my pre-trained model from https://drive.google.com/drive/u/2/folders/1cZjjCA5WHs-Lfw63ntzeUjMo_wZoIgXw and then just run `python demo.py`. It should get the same high scores as it got during training. You can check https://github.com/TianhongDai/reinforcement-learning-algorithms/blob/master/07-proximal-policy-optimization/ppo_agent.py#L111 to see how I tested the network.