Question about the Estimator Policy

P1terQ commented 11 months ago

Great work, and thanks for open-sourcing it! I notice the code uses the estimator to predict the privileged states (base_lin_vel) in obs when training the teacher policy, but it isn't used later when training the depth student policy or in play.py. My understanding is that you still need the estimator to predict the base linear velocity when deploying the policy on a real robot. So why don't you use the estimator in the depth student training pipeline or in play.py? Or perhaps I misunderstand your code. Really looking forward to your explanation.

chengxuxin commented 11 months ago

You are right: play.py uses the true velocity. Thanks for pointing this out. However, during training the predicted velocity is used, in this line.

In save_jit.py, the velocity is also predicted by the estimator, so the exported policy is also correct for deployment.
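To make the distinction concrete, here is a minimal sketch in plain Python (not the repository's code) of the paths discussed above: a training/export path where the velocity slice of the observation is overwritten by the estimator's prediction, and a play-style path that keeps the simulator's ground-truth velocity. Everything in it is an assumption for illustration: the observation layout (velocity in the first three entries, proprioception after it), the linear stand-in for the learned estimator, and the hypothetical names `linear_estimator`, `build_actor_obs`, and `export_policy`.

```python
from typing import Callable, List, Sequence

# Hypothetical observation layout: obs[0:3] holds the privileged base linear
# velocity (vx, vy, vz); everything after it is proprioception.
VEL_SLICE = slice(0, 3)

Estimator = Callable[[Sequence[float]], List[float]]
Actor = Callable[[List[float]], float]


def linear_estimator(weights: List[List[float]]) -> Estimator:
    """Hypothetical stand-in for a learned estimator: a linear map from the
    proprioceptive part of the observation to a 3-D velocity estimate."""
    def estimate(proprio: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, proprio)) for row in weights]
    return estimate


def build_actor_obs(obs: Sequence[float], estimator: Estimator,
                    use_estimator: bool = True) -> List[float]:
    """Return the observation fed to the actor (the input is not mutated).

    use_estimator=True: the velocity slice is replaced by the estimator's
    prediction (training / exported-policy path).
    use_estimator=False: the ground-truth velocity is kept (play-style path),
    which is only possible in simulation.
    """
    out = list(obs)
    if use_estimator:
        proprio = out[VEL_SLICE.stop:]
        out[VEL_SLICE] = estimator(proprio)
    return out


def export_policy(actor: Actor, estimator: Estimator) -> Actor:
    """Bundle estimator + actor so the deployed policy never reads the
    true velocity, only the estimate computed from proprioception."""
    def policy(obs: List[float]) -> float:
        return actor(build_actor_obs(obs, estimator, use_estimator=True))
    return policy


if __name__ == "__main__":
    est = linear_estimator([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    obs = [9.0, 9.0, 9.0, 0.5, -0.2]  # "true" velocity 9s, proprio [0.5, -0.2]
    print(build_actor_obs(obs, est))  # -> [0.5, -0.2, 0.0, 0.5, -0.2]
    print(build_actor_obs(obs, est, use_estimator=False))  # -> [9.0, 9.0, 9.0, 0.5, -0.2]
```

The point of the wrapper is that whatever is exported for the robot consumes only the observation and computes the velocity internally, so a mismatch like the one in play.py affects evaluation in simulation but not deployment.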

P1terQ commented 11 months ago

Yeah, now I understand. Thanks again.

chengxuxin / extreme-parkour, issue #8