Closed chenxi-yang closed 1 year ago
Hi @chenxi-yang, thanks for the report. Can you be more explicit about the errors you are getting? Is it not learning at all, or is it that the reward is lower than you expect? I don't think I've ever used real_data_ratio > 0.
Hi, I tried a few other settings. To elaborate on my question a bit: the policy does reach a good final reward, but the training curve is unstable, as shown below (it has peaks during training).
The command I used is `CUDA_VISIBLE_DEVICES=6 python -m mbrl.examples.main algorithm=mbpo overrides=mbpo_hopper dynamics_model=gaussian_mlp`.
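For reference, since real_data_ratio came up above: mbrl-lib is configured through Hydra, so individual values can be overridden directly on the command line. This is a hypothetical sketch only — the exact config key (`algorithm.real_data_ratio`) and its placement are assumptions, not verified against the repo:

```shell
# Sketch: same run as above, but mixing some real transitions into the
# agent's updates. The key name algorithm.real_data_ratio is an assumption.
CUDA_VISIBLE_DEVICES=6 python -m mbrl.examples.main \
    algorithm=mbpo \
    overrides=mbpo_hopper \
    dynamics_model=gaussian_mlp \
    algorithm.real_data_ratio=0.1
```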
Ah, got it. Unfortunately, the peaky behavior is a known issue with our MBPO implementation; it certainly affects Hopper, and I think other domains as well. Not sure if this is the cause, but when I swept hyperparameters I roughly optimized for area under the curve on a single seed, rather than for stable behavior. So it's possible this could be addressed by tweaking hyperparameters, but I haven't looked into it.
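As a side note, when comparing peaky runs it can help to look at a smoothed version of the logged episode returns rather than the raw curve. This is a generic sketch, not part of mbrl-lib — a simple same-length moving average over a list of returns:

```python
import numpy as np

def smooth(returns, window=10):
    """Moving average over a 1-D sequence of episode returns.

    Prefixes shorter than the window are averaged over what is
    available, so the output has the same length as the input.
    """
    returns = np.asarray(returns, dtype=float)
    # Prefix sums with a leading zero so cumsum[j] - cumsum[i] sums returns[i:j].
    cumsum = np.cumsum(np.insert(returns, 0, 0.0))
    out = np.empty_like(returns)
    for i in range(len(returns)):
        lo = max(0, i - window + 1)
        out[i] = (cumsum[i + 1] - cumsum[lo]) / (i + 1 - lo)
    return out

# A spiky toy curve: the smoothed version damps the single peak.
curve = [100, 110, 105, 3000, 120, 115, 118, 122]
print(smooth(curve, window=4))
```

The single 3000 spike is spread over the window, so the smoothed curve's maximum is well below the raw peak, which makes it easier to judge whether two runs differ in trend or just in transient spikes.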
Thanks for the update.
Hi, I cannot get Hopper and Walker2d to work with the default settings. May I ask whether you set real_data_ratio > 0 for these two experiments? Thanks!