chengxuxin / extreme-parkour

Train your parkour robot in less than 20 hours.
https://extreme-parkour.github.io

Some questions about distillation #19

Open chenci107 opened 7 months ago

chenci107 commented 7 months ago

Hello xuxin, thank you for your open-source work; it has been a great help to me. However, during step 2, the distillation process, I set up 4096 environments and trained for 5000 iterations. After completing the training, I tested the policy and found the results were not good. I carefully analyzed the source code and noticed that the depth_encoder does not seem to be trained, because lines 303 and 311 in on_policy_runner.py are commented out. May I ask why this is the case? Should I uncomment the lines mentioned above? Thanks :)

ChengEeee commented 6 months ago

I have the same question. In on_policy_runner.py, `self.alg.update_depth_encoder` is commented out.

chengxuxin commented 6 months ago

Hi, we use action supervision instead of latent supervision for depth distillation. Please check [this line](https://github.com/chengxuxin/extreme-parkour/blob/d2ffe27ba59a3229fad22a9fc94c38010bb1f519/rsl_rl/rsl_rl/runners/on_policy_runner.py#L309).
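
For concreteness, here is a minimal sketch of action-supervised distillation under stated assumptions: all module names, dimensions, and the optimizer setup below are illustrative, not this repo's exact API; the linked runner code is authoritative.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the teacher policy (privileged scandots input),
# the depth encoder, and the depth-conditioned student actor.
# Dimensions (48 proprio, 132 scandots, 58x87 depth, 12 actions) are assumptions.
teacher_actor = nn.Sequential(nn.Linear(48 + 132, 128), nn.ELU(), nn.Linear(128, 12))
depth_encoder = nn.Sequential(nn.Flatten(), nn.Linear(58 * 87, 32))
student_actor = nn.Sequential(nn.Linear(48 + 32, 128), nn.ELU(), nn.Linear(128, 12))

# Only the student side is optimized; the teacher stays frozen.
optimizer = torch.optim.Adam(
    list(depth_encoder.parameters()) + list(student_actor.parameters()), lr=1e-4
)

def distillation_step(proprio, scandots, depth_image):
    # Teacher acts from privileged scandots; no_grad blocks gradients into it.
    with torch.no_grad():
        actions_teacher = teacher_actor(torch.cat([proprio, scandots], dim=-1))

    # Student replaces scandots with a latent inferred from the depth image.
    depth_latent = depth_encoder(depth_image)
    actions_student = student_actor(torch.cat([proprio, depth_latent], dim=-1))

    # Action supervision: regress the student's actions onto the teacher's.
    loss = (actions_teacher - actions_student).norm(p=2, dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design point is that the teacher is frozen and gradients flow only through the depth encoder and student actor, so the depth encoder is trained through the action loss rather than through an explicit latent-matching update; this is why the commented-out `update_depth_encoder` call is not needed.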

chenci107 commented 6 months ago

> Hi, we use action supervision instead of latent supervision for depth distillation. Please check [this line](https://github.com/chengxuxin/extreme-parkour/blob/d2ffe27ba59a3229fad22a9fc94c38010bb1f519/rsl_rl/rsl_rl/runners/on_policy_runner.py#L309).

I used the loss function you referred to at line 309 for training. As described in the issue above, I set up 4096 environments and trained for 5000 iterations, and the results after testing were not good: the trained robot could not complete its task, invariably losing balance and toppling over after only a few steps. I then increased the training to 15,000 iterations, but this did not improve performance. Could you provide some insight or guidance on how to fix this?

chengxuxin commented 6 months ago

I am not sure what you have changed, so it is hard to tell why it did not work as expected. Is your base policy also not performing well? Please try following the same commands in the README with the original repo.

chenci107 commented 6 months ago

> I am not sure what you have changed, so it is hard to tell why it did not work as expected. Is your base policy also not performing well? Please try following the same commands in the README with the original repo.

My base policy performs well. Following your recommendation, I used the original repo and the same command to train the policy during the distillation phase, but the resulting performance was still subpar.

https://github.com/chengxuxin/extreme-parkour/assets/48233618/09ea1010-6521-4dc6-bfa4-bff53a114bab

chengxuxin commented 6 months ago

I cannot see your video. To debug, you can try without direction distillation first.
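
If the distillation objective has two terms, an action-imitation term and a direction (yaw) term, debugging "without direction distillation" amounts to zeroing out the second term. A minimal runnable sketch; the tensors and names (`action_loss`, `yaw_loss`, `yaw_coef`) are hypothetical stand-ins, not the repo's exact variables:

```python
import torch

# Hypothetical stand-ins: in practice these come from teacher/student rollouts
# and a yaw/direction head on the depth encoder; shapes are illustrative.
actions_student = torch.randn(4096, 12, requires_grad=True)
actions_teacher = torch.randn(4096, 12)
yaw_student = torch.randn(4096, 1, requires_grad=True)
yaw_target = torch.randn(4096, 1)

action_loss = (actions_teacher - actions_student).norm(p=2, dim=1).mean()
yaw_loss = (yaw_target - yaw_student).norm(p=2, dim=1).mean()

# Zero the direction term to check whether pure action imitation alone
# already tracks the teacher before re-enabling yaw supervision.
yaw_coef = 0.0
loss = action_loss + yaw_coef * yaw_loss
```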

1242713693 commented 4 months ago

Hello, have you successfully built your simulation environment? Could you share what the graphical interface looks like after your simulation runs? Thank you.

1242713693 commented 4 months ago

> I cannot see your video. To debug, you can try without direction distillation first.

Hello, is your graphical interface displayed in VS Code? My computer doesn't have a GPU; can I run it directly on the CPU? And if running on the CPU, can I still visualize, like in your video? Thank you.