Improbable-AI / walk-these-ways

Sim-to-real RL training and deployment tools for the Unitree Go1 robot.
https://gmargo11.github.io/walk-these-ways/

Is Kp = 20 too soft? #39

Closed: breakds closed this issue 1 year ago

breakds commented 1 year ago

Hi,

First, I would like to thank you again for the great work of open-sourcing the project and documenting the deployment procedure!

I have just started deploying the checkpoint provided in this repo. The paper uses Kp = 20 as the stiffness of the joint controller, and both the training code and the deployment code confirm this.

Based on my experience with the Unitree Go1, I think Kp = 20 might be too soft. In the deployment script deployment_runner.py, the calibrate() function creates an interpolated action sequence from the current qpos to the target qpos, which is

[0.1, 0.8, -1.5, -0.1, 0.8, -1.5, 0.1, 1, -1.5, -0.1, 1, -1.5]
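A minimal sketch of the kind of linear interpolation described above. The function name `interpolate_targets`, the default step count, and the FR, FL, RR, RL leg ordering (hip, thigh, calf per leg) are assumptions for illustration; this is not the actual `calibrate()` code in deployment_runner.py.

```python
from typing import List

# Target standing pose quoted above (radians). Assumed ordering: FR, FL, RR, RL,
# each leg as (hip, thigh, calf), so the last six entries are the rear legs.
TARGET_QPOS = [0.1, 0.8, -1.5, -0.1, 0.8, -1.5, 0.1, 1.0, -1.5, -0.1, 1.0, -1.5]


def interpolate_targets(current: List[float], target: List[float],
                        num_steps: int = 100) -> List[List[float]]:
    """Hypothetical helper: return num_steps joint-target vectors that move
    linearly from `current` to `target` (the last vector equals `target`)."""
    if len(current) != len(target):
        raise ValueError("current and target must have the same length")
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    sequence = []
    for step in range(1, num_steps + 1):
        alpha = step / num_steps  # fraction of the way to the target
        sequence.append([c + alpha * (t - c) for c, t in zip(current, target)])
    return sequence
```

For example, `interpolate_targets([0.0] * 12, TARGET_QPOS, num_steps=4)` returns four 12-joint targets, the last of which equals `TARGET_QPOS`. Each target is then tracked by the joint PD controller, so with a low Kp the measured pose can end up well short of the commanded one.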

I tried the same thing on our Unitree Go1, and it ends up "kneeling" on the ground with the following qpos:

[-0.07242385 0.66453123 -1.6734996 0.06848777 0.670284 -1.6667578 -0.13219169 0.8420787 -2.137513 0.12692341 0.8130728 -2.0752623]

Note that the rear leg joints have very large final tracking errors.
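The size of those errors can be checked directly from the two vectors quoted above. This sketch simply subtracts the measured pose from the commanded one; reading indices 8 and 11 as the rear calf joints assumes the FR, FL, RR, RL leg ordering.

```python
# Commanded stand-up pose and measured pose from this issue (radians).
target = [0.1, 0.8, -1.5, -0.1, 0.8, -1.5, 0.1, 1.0, -1.5, -0.1, 1.0, -1.5]
measured = [-0.07242385, 0.66453123, -1.6734996, 0.06848777, 0.670284, -1.6667578,
            -0.13219169, 0.8420787, -2.137513, 0.12692341, 0.8130728, -2.0752623]

# Per-joint tracking error (target minus measured).
errors = [t - m for t, m in zip(target, measured)]
worst = max(range(len(errors)), key=lambda i: abs(errors[i]))

for i, e in enumerate(errors):
    print(f"joint {i:2d}: error = {e:+.3f} rad")
print("largest error at joint", worst)
```

The two rear calf joints are off by roughly 0.64 and 0.58 rad, while every front-leg joint is within about 0.18 rad.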

And pictures:

[Photos: IMG_2774, IMG_2773]

I would like to confirm with the authors before proceeding: is this behavior expected?

Thanks!

gmargo11 commented 1 year ago

Hi @breakds ,

Yes, this is the expected behavior. The calibration / stand-up move itself is not tuned much. However, the robot will walk normally when you proceed.

I agree that Kp=20 is pretty low for this robot. I chose this intentionally because it gives the joints some natural compliance. I haven't explored tuning it much. At Kp=20, the policy still learns to compensate and execute a good walking gait.

-Gabe

breakds commented 1 year ago

Thanks a lot for the confirmation, Gabe!

GuoPingPan commented 11 months ago

Have you ever tried Kp = 50? Is it too big for this project? I don't know whether my Go1 was damaged because I suspended it during deployment (with its feet off the ground) or because Kp was too big. I trained with Kp = 50 in simulation and it performed well.

breakds commented 11 months ago

I have trained a simpler policy with Kp = 50 in MuJoCo and deployed it with Kp = 50, and it works. The checkpoint provided by Gabe with Kp = 20 also works well when deployed with Kp = 20.

Do you mean you can successfully deploy a policy trained with Kp = 50 on your Go1? In that case, why do you suspect your Go1 was damaged?

GuoPingPan commented 11 months ago

No, I mean it only works well in simulation. But judging from your experiment, Kp = 50 might not be the reason the deployment failed on my Go1.

breakds commented 11 months ago

I see. What kind of failure did you observe?

GuoPingPan commented 11 months ago

All the motors were damaged, and I have already sent the robot in for repair. While deploying, I suspended the Go1 so that its feet did not touch the ground. Could that be the reason? Have you ever tried deploying with the Go1 suspended and its feet off the ground?

breakds commented 11 months ago

There should be power protection enabled, and we set it to level 9. Did you do the same?

Also, we always deploy with the robot's feet on the ground, sometimes with a rope loosely attached to it.

GuoPingPan commented 11 months ago

I used the same power protection level. I will try deploying the way you describe. Thanks a lot.

GuoPingPan commented 11 months ago

@breakds Sorry to bother you again. Do you mean you only used Kp = 50 in MuJoCo for a simpler policy, not for walk-these-ways? Have you ever tried retraining and deploying a walk-these-ways model? Although the pretrained model performs well, I found that the reward of my retrained model, where the only change was control_type='P', does not converge, as shown below:

Is there anything wrong with what I did?

[Image: training reward curve of the retrained model]

breakds commented 11 months ago

No, I haven't tried retraining WTW. I have been using a larger Kp (Kp = 50) in other experiments, and it works. I think Gabe mentioned above that he hasn't tried stiffer gains for WTW yet.