PKU-RL / CORRO

Experiments on cheetah-dir and ant-dir #2

Status: Open. Lagrant opened this issue 1 year ago

Lagrant commented 1 year ago

Hi,

I've tried normalizing the environments, revising the reward functions, and upgrading/downgrading MuJoCo versions, but I am still not able to reproduce the performance reported in your paper on ant-dir. The average return just fluctuates at a very low level, around 10. Besides, experiments on cheetah-dir get an average training return of 1300 but an average testing return of -1400, which never happens on the other environments. There seems to be something wrong with the environment. Could you also check it out?
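As a side note, one way to sanity-check the direction reward outside the repo is sketched below. It assumes plain gym HalfCheetah-v3 with the old 4-tuple step API (not this repo's wrapper), and it recomputes a dir-style reward by hand from `info["x_velocity"]`, so treat it only as an approximation of the task reward:

```python
# Rough sanity check, not the repo's code: roll out a random policy in plain
# HalfCheetah-v3 and recompute a cheetah-dir style reward for both goal
# directions. If the two accumulated returns do not roughly mirror each other
# in sign, the direction/reward wiring of the environment is suspect.
import gym

env = gym.make("HalfCheetah-v3")
env.reset()

returns = {+1.0: 0.0, -1.0: 0.0}
for _ in range(200):
    _, _, done, info = env.step(env.action_space.sample())
    for goal_dir in returns:
        # dir-task style reward: goal_dir * forward velocity plus control penalty
        # (info["reward_ctrl"] is already negative in gym's HalfCheetah-v3)
        returns[goal_dir] += goal_dir * info["x_velocity"] + info["reward_ctrl"]
    if done:
        env.reset()

print("random-policy return if goal_dir=+1:", returns[+1.0])
print("random-policy return if goal_dir=-1:", returns[-1.0])
```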

My experiment logs, models, and configurations are uploaded to Google Drive for your reference.

Any reply would be much appreciated!

nanzhu2003 commented 8 months ago

Hello, I tried hopper_param and did not get good results either. Also, may I ask which model you loaded when running test_ood_context.py? When I run that script, it always reports that the input size is incompatible with the parameters I trained via train_offpolicy_with_trained_encoder. Thank you a lot!
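In case it helps localize the size mismatch, here is a generic PyTorch check (not CORRO-specific code; the model and checkpoint names in the usage comment are only placeholders) that compares parameter shapes in a saved checkpoint against a freshly constructed model:

```python
# Generic PyTorch diagnostic, not CORRO-specific: list every parameter whose
# shape in the saved checkpoint disagrees with the freshly built model, plus
# keys that exist on only one side.
import torch

def report_state_dict_mismatches(model, checkpoint_path):
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    # some training scripts wrap the weights, e.g. {"state_dict": ...}; unwrap if so
    saved = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
    current = model.state_dict()
    for name, tensor in saved.items():
        if name not in current:
            print(f"only in checkpoint: {name}")
        elif tuple(tensor.shape) != tuple(current[name].shape):
            print(f"{name}: checkpoint {tuple(tensor.shape)} vs model {tuple(current[name].shape)}")
    for name in current:
        if name not in saved:
            print(f"only in model: {name}")

# usage (placeholder names):
# report_state_dict_mismatches(my_encoder, "path/to/encoder.pt")
```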

Foo1szz commented 8 months ago

> Hi,
>
> I've tried normalizing the environments, revising the reward functions, and upgrading/downgrading MuJoCo versions, but I am still not able to reproduce the performance reported in your paper on ant-dir. The average return just fluctuates at a very low level, around 10. Besides, experiments on cheetah-dir get an average training return of 1300 but an average testing return of -1400, which never happens on the other environments. There seems to be something wrong with the environment. Could you also check it out?
>
> My experiment logs, models, and configurations are uploaded to Google Drive for your reference.
>
> Any reply would be much appreciated!

I have the same problem. When I ran experiments on ant-dir, the average return also fluctuated at a very low level, around 10. Could the authors give some advice?