carlosferrazza / humanoid-bench


Two questions about visualization effects #3

Closed LiangZhisama closed 5 months ago

LiangZhisama commented 5 months ago

Hello, and first of all, thank you for your valuable contribution to the humanoid robotics community!

Regarding the experimental performance reported on the website and in the paper, I have two questions. In the paper, tasks such as 'basketball', 'stair', and 'highbar' fall far short of the dashed line (i.e., the quantitative threshold you provide to judge task success), yet the videos of these same tasks on the website look quite successful.

So my first question is: were the videos on the webpage obtained by visualizing a policy trained with one of the four SOTA reinforcement learning methods? My second question is how the quantitative threshold indicated by the dashed line was obtained.

Thank you for taking the time to look at my questions, and I look forward to hearing from you!

Bailey-24 commented 5 months ago

I also think something is wrong here.

carlosferrazza commented 5 months ago

Hi! Thanks for the interest in our work.

As stated in the paper: "The dashed lines qualitatively indicate task success". Their purpose is to provide a reference return that indicates a certain degree of consistent success in the task execution. These values were obtained in several different ways for the different tasks, e.g., based on successful episodes, using the MuJoCo interactive viewer, carefully hand-designing simulation keyframes, or scripting successful trajectories.

Regarding your other question, the plots in the paper report average returns with standard deviation. However, you should take a look at Table IV in the appendix to check the maximum returns obtained for each of the tasks (e.g., stair or basketball). Many of the videos on the website are indeed extracted from episodes closer to success than the average ones, to give the user an idea of the expected behavior for each task.

The highbar task shown on the website instead uses the h1-highbar_simple-v0 environment (as opposed to h1strong-highbar_hard-v0, which is benchmarked in the paper). This environment does not feature robot hands; the upper extremities are attached to the bar by means of an equality constraint. This essentially simplifies the task to a high-dimensional inverted pendulum, and serves as an entry-level challenge for the highbar task. However, being a non-realistic toy task, we did not include it in the main paper results.

LiangZhisama commented 5 months ago

Thank you very much! That really cleared up my confusion.