hatimwen opened 5 months ago
Hello,
I have also successfully trained the low-level model and obtained good results. However, I face challenges during the high-level training phase, similar to what has been mentioned above. It appears that while the low-level policy performs well on its own, it struggles once integrated into high-level training.
If you have any updates or progress in addressing this issue, or if there are plans to release the pre-trained model weights, I would greatly appreciate it if you could let me know.
Thank you for your attention and support!
Hi @Ericonaldo , I just ran the b1z1-float setting, and similarly, the success rates are still near 0. I notice this setting does not require a pretrained low-level policy, so there may be something wrong with the high-level part. Any comments?
Hi, sorry for the late response. Please try the latest commit; I have some learning results that can be referred to here. I tested many epochs and found that, in my run on a single 4090, the best model comes from iterations 37000-38000. Try it yourself, and if you find any problem please let me know. Regarding the floating base, it should not be so high.
Hi @Ericonaldo , Thanks a lot for your response. I just checked the wandb logs you provided and noticed a significant variance in the final performance. 😢
It's puzzling to pinpoint the source of this variance. One possibility is that it arises from the low-level model, where different iterations cause substantial differences in subsequent high-level training. Another possibility is that the variance is inherent to the high-level model training itself. For instance, if we consistently select one low-level model, such as the one at 37000 which showed the highest results in the wandb logs, and train the high-level model multiple times, what would the variance in the results be? These are just some of my thoughts, and I'm currently re-running the code you provided.
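A rough sketch of the check I have in mind (everything below is purely illustrative: the seeds, the success-rate numbers, and the variable names are made up, not actual results):

```python
# Illustrative sketch: train the high-level policy several times against one
# fixed low-level checkpoint (e.g. the 37000 one) with different seeds, then
# summarize the final success rates to estimate the run-to-run variance.
import numpy as np

# seed -> final success rate (made-up numbers for illustration only)
final_success_rates = {42: 0.18, 43: 0.05, 44: 0.12}

rates = np.array(list(final_success_rates.values()))
print(f"mean={rates.mean():.3f}, std={rates.std(ddof=1):.3f}")
```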
Regarding the floating base setting, my reproduced results are below 0.5%. Is this normal, or could it be due to the variance in high-level training, as mentioned in my second guess?
Btw, how do you judge a good low-level model? Based on mean rewards, different iterations of models seem to perform similarly (around 21-22). From your experience, is the low-level model with the highest mean rewards the one that most benefits further high-level training? And what is the range of mean rewards for the low-level models that you find beneficial in your experiments?
Thanks again!
Hi, basically, we trained a set of low-level policies and deployed them on our real robot to find the best one, and we trained and tuned the high-level model based on that particular low-level policy. That may explain why only a few of them work better. One important note: always take the behavior in the real world as the criterion for choosing a model, as rewards may lie and agents may cheat.
I see.
I just checked the reproduction results using the latest code. Unfortunately, the results are still poor. I also tried training the high-level policy with different iterations of low-level models, but none of them achieved the impressive reward and success rate plots shown in your logs.
Could you please provide the pre-trained checkpoints for both the low-level and high-level models? Alternatively, could you share the wandb logs for the low-level training phase so I can compare them with mine?
Thanks again.
Hi,
Using the provided pre-trained low-level checkpoint, I’ve obtained some results for the high-level model.
This one achieves better performance than the high-level models trained with my own low-level models. However, there is still a gap, and none of the categories exceed a 20% success rate after 60k steps. It seems the training process for the high-level model still has significant variance.
Hi,
I face the same issue here: after using the provided low-level checkpoints, my high-level policy's success rate couldn't reach 10% after 60k steps.
This is my training command:
python train_multistate.py --rl_device "cuda:7" --sim_device "cuda:7" --timesteps 60000 --headless --task B1Z1PickMulti --experiment_dir logs --wandb --wandb_project "b1-pick-multi-teacher" --wandb_name "teacher_baseline_37000" --roboinfo --observe_gait_commands --small_value_set_zero --rand_control --stop_pick
and I didn't change the code, except for changing the dtype of some index variables because the interpreter reported them as an error.
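For reference, the change was along these lines; a minimal sketch with hypothetical variable names, assuming the error was about float-typed index tensors:

```python
# Minimal sketch of the kind of dtype fix mentioned above (variable names are
# hypothetical): PyTorch requires integer (long/int/bool) tensors for advanced
# indexing, so index tensors accidentally created as floats need a cast.
import torch

values = torch.randn(8, 4)
env_ids = torch.tensor([0.0, 2.0, 5.0])  # index tensor created as float

# values[env_ids] would raise an IndexError about the index dtype,
# so cast the indices to an integer dtype first:
selected = values[env_ids.long()]
print(selected.shape)  # torch.Size([3, 4])
```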
Is there anything I need to modify to achieve better performance (such as the reward function coefficients or input parameters)?
@hatimwen Could you please share how you trained the high-level policy to achieve a 20% success rate?
Hi @zgdjcls ,
I didn't change anything. For convenience, we could discuss on WeChat. My WeChat ID is wht2020zrj.
Just sent a request to you.
For reference, here's my wandb curve.
Hi, I reran the high-level part and the results remain the same. Seems pretty stable to me, as I set a fixed seed.
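For clarity, "fixed seed" here means seeding along these lines; a minimal sketch, not the repo's exact code (the function name and defaults are illustrative):

```python
# Minimal seeding sketch: fix Python, NumPy, and PyTorch RNGs so reruns of the
# same training command are comparable; full determinism is optional and slower.
import os
import random
import numpy as np
import torch

def set_seed(seed: int = 42, torch_deterministic: bool = False):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    if torch_deterministic:
        # Removes one more source of run-to-run variation at some speed cost.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_seed(42)
```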
Could you please share the weights and log files?
Hi,
I also tried rerunning the code multiple times, and the results stayed below 20%. I’ve installed the packages using the same versions as yours, according to the provided wandb. I still can’t figure out the reason for the discrepancy. 🤔
Yeah, it's weird... You mentioned that you use a 3090? My high-level part is also trained on a 3090 server.
Yes, I use a 3090 server.
Hi, see the logs here. I uploaded one low-level model here for your reference.
@zgdjcls It's already shared
Could you please share the high-level weights? We ran the high-level part and there is a huge difference between your results and ours. We want to use your high-success-rate teacher model to further train the student model.
Hi. I faced the same issues as mentioned by everyone here. My high-level results have a huge gap compared with what you shared. I am using the shared low-level model, without any change to the code. My GPU is a Quadro RTX 6000. Does the GPU model have a huge impact on the result?
Hi, I found a potential issue and solution for the high-level training. Can you guys have a try? I wrote all details in https://github.com/Ericonaldo/visual_wholebody/issues/11.
Hi,
Thank you for sharing this exciting work.
As mentioned in Issue https://github.com/Ericonaldo/visual_wholebody/issues/3#issuecomment-2156458140, it is challenging to reproduce the results of the high-level policy, with the final success rates being near zero.
I suspect the issue might stem from the grasping component. If the task considers reaching the target object as a success, the success rates during training could be nearly 100%, indicating that getting close to the object is easy. However, replacing all objects with a single cube (side length 0.045m) still results in a near-zero success rate. This suggests that even for simple shapes, the trained low-level policy struggles to grasp them effectively.
As @Ericonaldo mentioned in https://github.com/Ericonaldo/visual_wholebody/issues/3#issuecomment-2157052061, it was possible to achieve good results with your previous low-level model. However, my trained low-level model appears to perform well initially but fails during high-level training, which is quite confusing.
Btw, do you have any plans to release the pre-trained model weights?
If you have any progress on reproducing the results, please let me know. Thanks!