ChuaCheowHuan / reinforcement_learning

My reproduction of various reinforcement learning algorithms (DQN variants, A3C, DPPO, RND with PPO) in TensorFlow.
https://chuacheowhuan.github.io/
MIT License

Memory Leak in DDQN #22

Open xiboli opened 10 months ago

xiboli commented 10 months ago

Thank you so much for implementing the Double DQN algorithm. However, when I run it, memory usage increases consistently during training. Do you have any idea where the memory leak could be happening?

https://github.com/ChuaCheowHuan/reinforcement_learning/blob/master/DQN_variants/DDQN/double_dqn_cartpole.py#L339

xiboli commented 10 months ago

I have found that huber_loss combined with GradientDescentOptimizer causes the memory leak; when I switched to reduce_mean of the squared difference with RMSPropOptimizer, it disappeared. Can you explain why you use the Huber loss with the gradient descent optimizer? Thank you so much.

        with tf.variable_scope('loss'):
            self.loss = tf.reduce_mean(tf.squared_difference(td_target, predicted_Q_val))  # was: tf.losses.huber_loss(td_target, predicted_Q_val)
        with tf.variable_scope('optimizer'):
            self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)  # was: tf.train.GradientDescentOptimizer(self.learning_rate).minimize(self.loss)
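Not something confirmed in this repo, just a guess: a common cause of steadily growing memory in TF1-style graph code is new ops being accidentally added to the default graph inside the training loop. A minimal sketch of one way to rule that out, assuming the usual build-the-graph-once pattern, is to finalize the graph before training so any such op creation raises immediately instead of silently growing the graph:

    import tensorflow as tf  # TF1.x graph-mode API, as used in this repo

    # --- build the graph once, outside the training loop ---
    x = tf.placeholder(tf.float32, [None, 4])
    w = tf.Variable(tf.zeros([4, 2]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))           # toy loss for illustration
    train_op = tf.train.RMSPropOptimizer(0.001).minimize(loss)
    init_op = tf.global_variables_initializer()

    sess = tf.Session()
    sess.run(init_op)

    # Lock the graph: if anything inside the training loop tries to add new ops,
    # this turns a silent graph (and memory) blow-up into an immediate RuntimeError.
    tf.get_default_graph().finalize()

    # for step in range(num_steps):
    #     sess.run(train_op, feed_dict={x: batch_of_states})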
bcnichols commented 7 months ago

Thanks for your observation -- I wasn't aware that the "leak" was associated with the Huber loss function and sadly don't know why this should be, but I will make a note to check it out once things here subside to a dull roar, so to speak.

Until we can evaluate the impact of changing the loss function, production code is currently avoiding batch inputs to model.fit(); instead it fits in a loop and periodically saves, clears, and reloads the model. This stopgap keeps memory (64 GB) from being completely consumed before convergence is reached.

If it's of any interest, the restart logic is triggered by the following call placed at a convenient spot in the model.fit() loop:

agent.save_restart(repeat, idx)

where "agent" is a class instance containing the model and its methods, as follows:

  def save_restart(self, repeat, idx):
    # Save the current model, clear the TF/Keras session to release memory,
    # then reload the saved model so training can continue.
    self.last_model = f'{self.model_path}/{self.model_name}_{repeat}_{idx}'
    self.model.save(self.last_model)
    tf.keras.backend.clear_session()
    self.load_last_model()

  def load_last_model(self):
    # Reload and recompile the most recently saved model.
    model = load_model(self.last_model, custom_objects=self.custom_objects, compile=False)
    model.compile(optimizer=self.optimizer, loss=self.loss())
    self.model = model  # reattach the reloaded model to the agent

  def save(self, repeat, idx):
    self.last_model = f'{self.model_path}/{self.model_name}_{repeat}_{idx}'
    self.model.save(self.last_model)
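For what it's worth, a minimal sketch of how an outer loop might drive the method above (the n_restart interval, the batches iterable, and the loop bounds here are hypothetical, not taken from the actual production code):

    n_restart = 50  # hypothetical interval; tune to how quickly memory grows

    for repeat in range(n_repeats):
        # "batches" is assumed to yield (x, y) numpy arrays for one fit step each
        for idx, (x_batch, y_batch) in enumerate(batches):
            agent.model.fit(x_batch, y_batch, epochs=1, verbose=0)
            if idx > 0 and idx % n_restart == 0:
                # Save, clear the Keras session, and reload the model,
                # releasing memory accumulated across fit() calls.
                agent.save_restart(repeat, idx)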