Closed chenxi-yang closed 1 year ago
Hi @chenxi-yang, thanks for the report. Can you be more explicit about the errors you are getting? Is it not learning at all, or is it that the reward is lower than you expect? I don't think I've ever used real_data_ratio > 0.
Hi, I tried a few other settings. To elaborate on my question a bit: the policy does reach a good final reward, but the training curve is unstable, as shown below (it has peaks during training).
The command I used is `CUDA_VISIBLE_DEVICES=6 python -m mbrl.examples.main algorithm=mbpo overrides=mbpo_hopper dynamics_model=gaussian_mlp`.
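For reference, since real_data_ratio came up above: mbrl-lib is configured through Hydra, so individual values can be overridden directly on the command line. This is a hypothetical sketch only — the exact config key (`algorithm.real_data_ratio`) and its placement are assumptions, not verified against the repo:

```shell
# Sketch: same run as above, but mixing some real transitions into the
# agent's updates. The key name algorithm.real_data_ratio is an assumption.
CUDA_VISIBLE_DEVICES=6 python -m mbrl.examples.main \
    algorithm=mbpo \
    overrides=mbpo_hopper \
    dynamics_model=gaussian_mlp \
    algorithm.real_data_ratio=0.1
```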
Ah, got it. Unfortunately, the peaky behavior is a known issue with our MBPO implementation; it certainly affects Hopper, and I think other domains as well. Not sure if this is the cause, but when I swept hyperparameters I roughly optimized for area under the curve on a single seed, rather than for stable behavior. So it's possible this could be addressed by tweaking hyperparameters, but I haven't looked into it.
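As a side note, when comparing peaky runs it can help to look at a smoothed version of the logged episode returns rather than the raw curve. This is a generic sketch, not part of mbrl-lib — a simple same-length moving average over a list of returns:

```python
import numpy as np

def smooth(returns, window=10):
    """Moving average over a 1-D sequence of episode returns.

    Prefixes shorter than the window are averaged over what is
    available, so the output has the same length as the input.
    """
    returns = np.asarray(returns, dtype=float)
    # Prefix sums with a leading zero so cumsum[j] - cumsum[i] sums returns[i:j].
    cumsum = np.cumsum(np.insert(returns, 0, 0.0))
    out = np.empty_like(returns)
    for i in range(len(returns)):
        lo = max(0, i - window + 1)
        out[i] = (cumsum[i + 1] - cumsum[lo]) / (i + 1 - lo)
    return out

# A spiky toy curve: the smoothed version damps the single peak.
curve = [100, 110, 105, 3000, 120, 115, 118, 122]
print(smooth(curve, window=4))
```

The single 3000 spike is spread over the window, so the smoothed curve's maximum is well below the raw peak, which makes it easier to judge whether two runs differ in trend or just in transient spikes.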
Thanks for the update.
Hi, I cannot get Hopper and Walker2d to work with the default settings. May I ask whether you set real_data_ratio > 0 for these two experiments? Thanks!