OpenRobotLab / HIMLoco

Learning-based locomotion control from OpenRobotLab, including Hybrid Internal Model & H-Infinity Locomotion Control
https://junfeng-long.github.io/HIMLoco/

About the training curve #6

Open xu-yang16 opened 2 months ago

xu-yang16 commented 2 months ago

I am using the code to train A1 on an RTX 4090. However, I've noticed a significant decrease in the mean reward around the 2800th iteration. Is everything all right? Should I continue training?

[Screenshots: mean reward curves around iteration 2800]

hdShang commented 2 months ago

I had the same problem; the rewards fluctuated a lot. The first drop in rewards may be due to the command curriculum update, but I don't yet know the reason for the second drop.

[Screenshot: reward curve]

sixFlag commented 2 months ago

Why is the limit effort in a1.urdf modified?

sixFlag commented 2 months ago

Please post the learning rate so we can take a look. From visual inspection, there seem to be too many iterations; 1500 iterations should be enough.
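For reference, in a legged_gym-style training config these two knobs sit in the PPO algorithm and runner sections; the class name below is hypothetical and the values are only illustrative defaults, not this repo's exact settings:

```python
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfgPPO

# Illustrative only; names follow legged_gym conventions, values are placeholders.
class A1HimRoughCfgPPO(LeggedRobotCfgPPO):
    class algorithm(LeggedRobotCfgPPO.algorithm):
        learning_rate = 1.0e-3   # base learning rate (legged_gym's PPO adapts it by default)

    class runner(LeggedRobotCfgPPO.runner):
        max_iterations = 1500    # number of policy update iterations
```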

Junfeng-Long commented 2 months ago

Sorry for the late reply. We did not train for such a long time. However, the significant decrease in the mean reward is probably due to the curriculum of commands and terrains. I suggest training for at most 2000 iterations. We also updated the configs for A1 and Go1: high speed conflicts somewhat with the ability of small dogs to cross difficult terrains, so we limited the highest command for small dogs to 2 m/s.
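For illustration, a minimal sketch of how such a command cap and curriculum could look in a legged_gym-style config; the class name is hypothetical and the values are placeholders rather than the repo's exact configuration:

```python
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg

# Illustrative legged_gym-style command config; class name and values are
# assumptions for the example, not HIMLoco's actual settings.
class A1HimRoughCfg(LeggedRobotCfg):
    class commands(LeggedRobotCfg.commands):
        curriculum = True       # widen the sampled command range as training progresses
        max_curriculum = 2.0    # cap the velocity-command curriculum at 2 m/s for small dogs

        class ranges(LeggedRobotCfg.commands.ranges):
            lin_vel_x = [-1.0, 1.0]    # [m/s], grown by the curriculum up to max_curriculum
            lin_vel_y = [-1.0, 1.0]    # [m/s]
            ang_vel_yaw = [-1.0, 1.0]  # [rad/s]
```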

Junfeng-Long commented 2 months ago

> Why is the limit effort in a1.urdf modified?

We use the a1.urdf from the original repo of legged_gym. There is also a file named a1.urdf.origin from unitree_ros, which seems hard to train with. But the deployment is built upon a joint-position control loop, and the unitree SDK seems to handle this issue well, so don't worry about it. Position limits, velocity limits, and power penalization are enough.
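For illustration, such limits and penalties typically live in the reward section of a legged_gym-style config; the class name and weights below are assumptions, not the repo's exact values:

```python
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg

# Illustrative legged_gym-style reward config; term names follow legged_gym
# conventions, and the weights are placeholders rather than HIMLoco's values.
class A1HimRoughCfg(LeggedRobotCfg):
    class rewards(LeggedRobotCfg.rewards):
        soft_dof_pos_limit = 0.9   # penalize joint positions beyond 90% of the URDF range
        soft_dof_vel_limit = 0.9   # penalize joint velocities beyond 90% of the limit

        class scales(LeggedRobotCfg.rewards.scales):
            dof_pos_limits = -10.0  # joint position-limit penalty
            dof_vel_limits = -1.0   # joint velocity-limit penalty
            torques = -1e-4         # torque/power penalization
```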

sixFlag commented 2 months ago

> We use the a1.urdf from the original repo of legged_gym. […] Position limits, velocity limits, and power penalization are enough.

In other words, the current environment is not suitable for small dogs, but in order not to change the environment, the robot model was changed. Am I right?

UltronAI commented 1 month ago

@Junfeng-Long Thank you for your impressive work on this project. I've been trying to replicate the results shown in Figure 5 of the HIMloco paper but have encountered some discrepancies. My training results are notably lower than those reported. I noticed that the curves I obtained (shown below) appear similar to those previously posted by other users.

I ran the `python train.py` command without making any modifications. Here are the curves I obtained:

[Screenshot: training reward curves]

If I want to compare different methods in simulation, can I directly compare them with these curves?

hanzhi0410 commented 1 month ago

Hello, I have also encountered the same problem. Do you know where the problem lies? Thank you for your help.

Junfeng-Long commented 1 month ago

Sorry for the late reply. There were some bugs and improper configs in the code. They have already been fixed; please try the new version.

hanzhi0410 commented 1 month ago

Thank you for your impressive work on this project. I used this project to train a policy and wanted to simulate it in Gazebo. The policy performed well in Isaac Gym, but in Gazebo the robot shook violently and could not stand properly. Have you done similar work before? Can you give me some suggestions? Thank you for your help.

Junfeng-Long commented 1 month ago

We have done this test with Aliengo in Gazebo; it works well, but still worse than in Isaac Gym. I would be glad to help if you can provide more information, for example a video, the inference output, or the code. You can send it to me directly or post it here.

hanzhi0410 commented 1 month ago

May I ask what inference tool you are using? Is it libtorch? Thank you for your help.

Junfeng-Long commented 1 month ago

We use PyTorch, since there is CUDA on the dog's onboard computer.
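For illustration, a minimal sketch of running an exported policy with PyTorch in a deployment loop; the file name, observation size, and the way actions are consumed are assumptions, not the repo's actual deployment code:

```python
import torch

# Illustrative deployment-side inference; the path and shapes are placeholders.
device = "cuda" if torch.cuda.is_available() else "cpu"
policy = torch.jit.load("policy.pt", map_location=device)  # TorchScript-exported policy
policy.eval()

obs = torch.zeros(1, 45, device=device)  # placeholder observation vector from the robot's sensors

with torch.no_grad():
    actions = policy(obs)  # joint position targets, sent to the motors through the SDK
```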

Junfeng-Long commented 1 month ago

It seems that you accidentally opened an issue under my homepage repo. Sorry for not noticing that. But I'm happy to see that you figured out the problem :)

hanzhi0410 commented 1 month ago

Hello, thank you for your reply. I'm sorry, as a beginner I'm not very familiar with GitHub and left my message in the wrong place before. I also noticed that during real-robot deployment the hind legs sometimes turn inward (pigeon-toed) and the dog's steering is not responsive. Have you ever encountered this situation? I hope you can give me some advice. Thank you for your help.

hanzhi0410 commented 1 month ago

And I found that the trained policy also shows this inward-turned hind-leg stance.

[Screenshot: pigeon-toed stance in simulation]

Junfeng-Long commented 1 month ago

I think this is due to an improper target height configuration. Try a lower target height, for example 0.25 m for A1.
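A minimal sketch of where such a target sits in a legged_gym-style config; the field name follows legged_gym conventions, the class name is hypothetical, so check the repo's A1 config for the exact location:

```python
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg

# Illustrative only; the exact class and value in this repo may differ.
class A1HimRoughCfg(LeggedRobotCfg):
    class rewards(LeggedRobotCfg.rewards):
        base_height_target = 0.25  # [m] lower trunk height target suggested for A1
```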

hanzhi0410 commented 3 weeks ago

> I think this is due to an improper target height configuration. Try a lower target height, for example 0.25 m for A1.

Thank you for your reply. I noticed that your code also sets a leg-lift (foot clearance) height reward, but its weight is very low, and the leg-lift height after training is not ideal; it is quite different from the effect in the video you posted. I would like to ask if you have any other methods to improve this.