Closed: jasonseu closed this issue 2 years ago.
Hello, yeah, this is not expected. While there can be variability between checkpoints, getting only 0% or 100% is not what I observed.
First, could you share the exact command you are running? How many demos, which representation, etc.?
Also, you followed the installation setup instructions in the README here, correct? Did you try the "Verifying Correct Installation" commands and observe the expected numbers listed there (~60% for R3M and ~30% for CLIP)? If you are not getting those numbers, then something is likely wrong with the installation. For the Franka Kitchen, did you make sure to add the line `FIXED_ENTRY_POINT = RANDOM_DESK_ENTRY_POINT` in mj_envs to use the Randomized Desk?
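For reference, this is a one-line change inside your mj_envs checkout. The file path in the sketch below is an assumption (use whichever file the README actually points to); the idea is that the kitchen tasks are registered through a fixed-desk entry point by default, and the override makes registration use the randomized desk instead:

```python
# Hypothetical location -- e.g. mj_envs/envs/relay_kitchen/__init__.py
# (use the file the README actually points to in your mj_envs checkout).
# The kitchen envs are registered via FIXED_ENTRY_POINT by default;
# this one-line override makes them use the Randomized Desk instead.
FIXED_ENTRY_POINT = RANDOM_DESK_ENTRY_POINT
```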
Yes, this looks like an environment version mismatch issue. Please follow the README instructions exactly, as Suraj mentioned, to get the correct versions of the environments for evaluation.
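If it helps, here is a minimal sanity check (a sketch, assuming mj_envs registers its environments on import and that you are on an older gym where `gym.envs.registry.env_specs` is a dict) to confirm the v3 kitchen environments from this thread are actually available after reinstalling:

```python
# Hypothetical sanity check: confirm the expected v3 Franka Kitchen envs are
# registered after installing the pinned mj_envs version from the README.
import gym
import mj_envs  # noqa: F401  (importing mj_envs registers the kitchen envs)

for env_id in ["kitchen_light_on-v3", "kitchen_ldoor_open-v3", "kitchen_sdoor_open-v3"]:
    assert env_id in gym.envs.registry.env_specs, f"{env_id} is not registered"
    env = gym.make(env_id)
    print(env_id, "OK, obs shape:", env.reset().shape)
    env.close()
```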
Hi Suraj, thanks for your quick reply. I used the exact same commands you suggested in the README for verifying correct installation. I followed your suggestion and added the line `FIXED_ENTRY_POINT = RANDOM_DESK_ENTRY_POINT` in mj_envs, and now the evaluation results look normal. Are they similar to the results of your experiments?
Yup that looks right to me!
Hi, thanks for your awesome work. I followed the steps in the README and successfully trained the policy network; the loss decreases to a very low level. However, as shown in the figure, the evaluation results show that the max success rates are either 0% or 100% on benchmarks like kitchen_light_on-v3, kitchen_ldoor_open-v3, and kitchen_sdoor_open-v3. Is this reasonable? Or do you have any suggestions?