Question Regarding Data Cleaning and Privileged Policy Training in H2O

LeCAR-Lab / human2humanoid

[IROS 2024] Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation. [CoRL 2024] OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

https://omni.human2humanoid.com/


Question Regarding Data Cleaning and Privileged Policy Training in H2O #6

Closed Perkins729 closed 1 month ago

Perkins729 commented 1 month ago

Thank you for your open-source work. I would like to replicate the data cleaning in H2O, specifically training the privileged policy and using it to filter the data. How can I achieve this?

1. Should I follow the instructions in the section “Try training and playing privileged teacher policy”? Is the privileged policy used for data cleaning in H2O the same as the one in OmniH2O?
2. How is the filtering algorithm implemented?

TairanHe commented 1 month ago

Hi, you should first finish retargeting the entire AMASS dataset and train a privileged teacher policy without any domain randomization or penalty rewards. Then use that teacher policy to evaluate all the AMASS motions and filter out the motions it fails to track (a motion counts as a failure if, at any timestep, the distance to the reference motion exceeds 0.5 m).
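For readers who want to see the rule concretely, here is a minimal sketch of the filtering step described above. It is not the repository's code. It assumes the teacher rollout and the reference motion are already available as NumPy arrays of body positions with shape (T, J, 3) in metres, and it assumes the "distance" is the per-timestep mean over bodies, which the reply does not specify. The names `tracking_failed`, `filter_motions`, and the `rollouts` dictionary are hypothetical.

```python
import numpy as np

FAIL_THRESHOLD_M = 0.5  # threshold quoted in the reply above


def tracking_failed(sim_pos, ref_pos, threshold=FAIL_THRESHOLD_M):
    """Return True if the rollout drifts too far from the reference.

    sim_pos, ref_pos: arrays of shape (T, J, 3), body positions in metres.
    Assumption: distance is the mean Euclidean distance over the J bodies,
    evaluated independently at every timestep.
    """
    per_body = np.linalg.norm(sim_pos - ref_pos, axis=-1)  # (T, J)
    per_step = per_body.mean(axis=-1)                      # (T,)
    # A single timestep above the threshold marks the whole motion as failed.
    return bool((per_step > threshold).any())


def filter_motions(rollouts, threshold=FAIL_THRESHOLD_M):
    """Keep the names of motions the teacher policy tracked successfully.

    rollouts: dict mapping motion name -> (sim_pos, ref_pos), a hypothetical
    container for the evaluation results of each retargeted AMASS motion.
    """
    return [
        name
        for name, (sim_pos, ref_pos) in rollouts.items()
        if not tracking_failed(sim_pos, ref_pos, threshold)
    ]
```

Swapping the mean for a maximum over bodies would make the filter stricter. Which variant matches the original pipeline is not stated in this thread.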

Perkins729 commented 1 month ago

Is the file amass_phc_filtered.pkl in this link the result of the data processing steps you mentioned?

TairanHe commented 1 month ago

Yes!