Closed · yangyichu closed this issue 11 months ago
Hi, I have a question about the dagger-value algorithm: when updating the value network, why do you use `torch.max()` to take the larger of the two losses? What is the meaning of comparing them? In my understanding, the clipped value loss is meant to keep training stable, but in that case why is it `max` rather than `min`, or why not just use `value_losses_clipped` directly? https://github.com/PKU-EPIC/UniDexGrasp2/blob/a223e627216e7fdc4f5cda4475451e805068bc9f/dexgrasp/algorithms/rl/dagger_value/dagger.py#L433-L437
In the above code, `target_values_batch` is the student critic's value before the learning epochs, and `value_batch` is the student critic's value during the learning epochs. https://github.com/PKU-EPIC/UniDexGrasp2/blob/a223e627216e7fdc4f5cda4475451e805068bc9f/dexgrasp/algorithms/rl/dagger_value/dagger.py#L272
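For reference, here is a minimal sketch of the PPO-style clipped value loss being discussed (variable names follow the thread; `returns_batch` and `clip_range` are illustrative, not taken from the repo):

```python
import torch

def clipped_value_loss(value_batch, target_values_batch, returns_batch, clip_range=0.2):
    # Clip the new value prediction so it stays within +/- clip_range of the
    # value predicted before the learning epochs (target_values_batch).
    value_clipped = target_values_batch + (value_batch - target_values_batch).clamp(
        -clip_range, clip_range
    )
    value_losses = (value_batch - returns_batch).pow(2)
    value_losses_clipped = (value_clipped - returns_batch).pow(2)
    # torch.max keeps the larger (element-wise) of the two losses,
    # which is the step the question above asks about.
    return torch.max(value_losses, value_losses_clipped).mean()
```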
You can check this issue
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/issues/160
I see. Thanks!