TEA-Lab / diffusion_reward

[ECCV 2024] 💐Official implementation of the paper "Diffusion Reward: Learning Rewards via Conditional Video Diffusion"
https://diffusion-reward.github.io/
MIT License

Evaluation on Metaworld #4

Closed: eugeniafeng closed this issue 4 months ago

eugeniafeng commented 5 months ago

Hello! I am looking to reproduce the results in the paper, but I do not see any scripts for evaluating or testing the model. There is a link to a Metaworld script (demo_sawyer.py) in the README, but the Metaworld repository appears to be missing the imports that this script requires. Could you advise me on how to test the model? Thank you.

TaoHuang13 commented 5 months ago

Hi, may I ask which model you aim to test? Currently, we only support testing the pre-trained reward models within RL training, meaning you need to run the RL scripts to evaluate them. You can download the pre-trained models listed in the README and then follow our instructions.

If you aim to test the trained RL policies instead, you may need to write some additional code.

For the Metaworld script, please refer to #1 to make sure you have installed the version of Metaworld used in the paper.

eugeniafeng commented 5 months ago

Thank you for the response! I have downloaded the pre-trained models in the README. Are the RL scripts to test these models also included in the README?

I have installed the specified version of Metaworld, but the script still seems to be broken. Thank you!

TaoHuang13 commented 5 months ago

Hi, you don't need to run the MW script if you have downloaded the pre-trained models. You can directly run the RL scripts (included in the README) to test the models. (We provide the configuration for the Assembly task for your test.)

Regarding the failure to run the MW script, I think this may be an issue with Metaworld itself, which is outside the scope of this repo. But please let us know more details if it still gets stuck after your attempts.

eugeniafeng commented 5 months ago

Hi, could you point me to which script in the README tests the models? They all look like training scripts based on the names.

TaoHuang13 commented 5 months ago

Hi, we evaluate reward models by training RL agents with them, so in this sense running the RL scripts is how you test the reward models.

If you aim to test the reward models directly (e.g., computing learned rewards for a given trajectory), you will need to write an additional script that calls this function, for example along the lines of the sketch below.
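
A minimal sketch of what such a script might look like, purely for illustration (the import path, class name, checkpoint path, and `compute_reward` method below are assumptions, not this repo's actual API; please adapt them to the reward-model code here):

```python
import numpy as np
import torch

# Hypothetical import -- adapt to where the reward model lives in this repo.
from diffusion_reward.models import DiffusionReward

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pre-trained reward model (checkpoint path is a placeholder).
reward_model = DiffusionReward.load_from_checkpoint("ckpts/metaworld_reward.pt")
reward_model.to(device).eval()

# A trajectory of T RGB frames, e.g. rendered from a Metaworld rollout.
# Random data here just to keep the sketch self-contained.
trajectory = np.random.randint(0, 256, size=(50, 64, 64, 3), dtype=np.uint8)

# (T, H, W, C) uint8 -> (T, C, H, W) float in [0, 1].
frames = torch.from_numpy(trajectory).permute(0, 3, 1, 2).float() / 255.0

with torch.no_grad():
    # Hypothetical method: score each frame/transition with the learned reward.
    rewards = reward_model.compute_reward(frames.to(device))

print(rewards.shape)  # expect one scalar per frame or per transition
```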

eugeniafeng commented 5 months ago

Hello, when I try to run the RL script in the README on Metaworld tasks (assembly-v0 or coffee-push-v0), I run into a dimension-mismatch error during evaluation. I was able to fix this by downsampling the image in two locations in the code (a sketch of the workaround is at the end of this comment). However, now when I run the coffee push task, after 1,760,000 training frames, I get a video like the one shown below:

https://github.com/TEA-Lab/diffusion_reward/assets/46321065/9a5437b4-88e7-4d17-97f6-7179c43e9b96

Is this the expected behavior? It seems as though it should be able to learn to push the cup upright in fewer training frames. Similarly, the assembly task looks like this after over 2 million training frames:

https://github.com/TEA-Lab/diffusion_reward/assets/46321065/a93df2a3-72f4-461d-a941-f96d259507df

Thank you!
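
For reference, the downsampling workaround I mentioned above was roughly the following (the 64x64 target resolution and the call sites are specific to my setup, so treat this as a sketch rather than the exact diff I applied):

```python
import torch
import torch.nn.functional as F

def downsample(obs: torch.Tensor) -> torch.Tensor:
    """Resize rendered frames to the resolution the reward model expects.

    obs: (B, C, H, W) float tensor. The 64x64 target is an assumption from
    my config; match it to the resolution your reward model was trained on.
    """
    return F.interpolate(obs, size=(64, 64), mode="bilinear", align_corners=False)

# e.g. frames rendered at 84x84 by the environment wrapper:
frames = torch.rand(1, 3, 84, 84)
print(downsample(frames).shape)  # torch.Size([1, 3, 64, 64])
```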

luccachiang commented 4 months ago

Hi @eugeniafeng , apologies for the delayed response due to some urgent deadlines, and thank you for your patience.

If you have more questions, feel free to add a comment :)