When using experience replay, why don't you update Q_target?

andyzeng / visual-pushing-grasping

Train robotic agents to learn to plan pushing and grasping actions for manipulation with deep reinforcement learning.

http://vpg.cs.princeton.edu/

BSD 2-Clause "Simplified" License


When using experience replay, why don't you update Q_target? #81

Open Zixin-Tang opened 3 years ago

Zixin-Tang commented 3 years ago

# Recompute prediction value and label for replay buffer
if sample_primitive_action == 'push':
    trainer.predicted_value_log[sample_iteration] = [np.max(sample_push_predictions)]
    # trainer.label_value_log[sample_iteration] = [new_sample_label_value]
elif sample_primitive_action == 'grasp':
    trainer.predicted_value_log[sample_iteration] = [np.max(sample_grasp_predictions)]
    # trainer.label_value_log[sample_iteration] = [new_sample_label_value]
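
To make the question concrete, here is a minimal sketch of the difference between reusing the label stored when a sample was collected and recomputing it with the current network at replay time. It assumes the label is a standard one-step Q-learning target, `reward + discount * max Q(s', a')`. The function `td_target`, the discount of 0.5, and all numbers are hypothetical illustrations, not code or values from the repository.

```python
def td_target(reward, next_q_values, discount=0.5):
    """One-step Q-learning target: reward + discount * max over next-state Q values.

    The discount default is an assumption made for this illustration.
    """
    return reward + discount * max(next_q_values)


# Hypothetical replay entry for a single transition (s, a, r, s').
reward = 0.5
next_q_at_collection = [0.2, 0.4, 0.1]  # Q(s', .) when the sample was first collected
next_q_now = [0.3, 0.8, 0.2]            # Q(s', .) from the current, further-trained network

# Option A: keep the stored label, which is what happens when the
# label_value_log assignment stays commented out.
stored_label = td_target(reward, next_q_at_collection)

# Option B: refresh the label at replay time with the current network.
refreshed_label = td_target(reward, next_q_now)

print(f"stored label:    {stored_label:.2f}")
print(f"refreshed label: {refreshed_label:.2f}")
```

In this toy case the two targets differ because the network's estimate of the next state has changed since collection. With the assignment commented out, a replayed sample would be trained against the older target.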

@andyzeng