PKU-EPIC / UniDexGrasp2

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
MIT License

A Question About Dagger Value Algorithm #2

Closed yangyichu closed 8 months ago

yangyichu commented 9 months ago

Hi, I have a question about the DAgger-Value algorithm: when updating the value network, why do you use torch.max() to take the larger of the two losses?

What is the point of comparing these two losses? In my understanding, the clipped value loss is meant to keep training stable, but then why take the max rather than the min, or why not just use value_losses_clipped directly?
https://github.com/PKU-EPIC/UniDexGrasp2/blob/a223e627216e7fdc4f5cda4475451e805068bc9f/dexgrasp/algorithms/rl/dagger_value/dagger.py#L433-L437

In the code above, target_values_batch is the student critic's value before the learning epochs, and value_batch is the student critic's value during the learning epochs. https://github.com/PKU-EPIC/UniDexGrasp2/blob/a223e627216e7fdc4f5cda4475451e805068bc9f/dexgrasp/algorithms/rl/dagger_value/dagger.py#L272

Phimos commented 8 months ago

You can check this issue:

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/issues/160
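For context, the pattern being asked about is the PPO-style clipped value loss. The sketch below is a minimal, self-contained illustration of that pattern (the function name and argument names are my own, not the repo's); it is not a verbatim copy of the UniDexGrasp2 code. Taking torch.max keeps the larger of the unclipped and clipped squared errors, i.e. a pessimistic upper bound on the loss. This mirrors PPO's torch.min over the surrogate objective: min over a reward-like objective becomes max once you flip the sign to a loss. Taking the min (or using only the clipped term) would let the critic be optimistic and could shrink the loss for predictions that have already moved outside the trust region.

```python
import torch

def clipped_value_loss(values, old_values, returns, clip_param=0.2):
    """PPO-style clipped value loss (illustrative sketch, not the repo's exact code).

    values:     current critic predictions during the learning epochs
    old_values: critic predictions recorded before the epochs (target_values_batch)
    returns:    bootstrapped return targets
    """
    # Keep the new prediction within clip_param of the old one.
    values_clipped = old_values + (values - old_values).clamp(-clip_param, clip_param)
    loss_unclipped = (values - returns).pow(2)
    loss_clipped = (values_clipped - returns).pow(2)
    # torch.max keeps the LARGER error per element: a pessimistic bound.
    # If the prediction has jumped past the clip range toward the target,
    # the clipped term dominates and its gradient w.r.t. `values` is zero,
    # so the update stalls instead of chasing the target further.
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```

For example, with old value 1.0, new value 1.5, return 2.0, and clip 0.2, the clipped prediction is 1.2, so the clipped error 0.64 exceeds the unclipped error 0.25 and the max selects it, halting further movement of this prediction in that epoch.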

yangyichu commented 8 months ago

I see. Thanks!