aviralkumar2907 / BEAR

Code for Stabilizing Off-Policy RL via Bootstrapping Error Reduction

Couldn't reproduce the result on MuJoCo Suite (d4rl datasets). #8

Open sweetice opened 4 years ago

sweetice commented 4 years ago

Hi, Kumar! In the last issue, you mentioned that you did not test BEAR in the final buffer setting and recommended that I use the d4rl datasets. Following your suggestion, I used the d4rl datasets and the code in d4rl_evaluations. Unfortunately, I could not reproduce your results. The results are here. :) Offline_rl_resutls

For easier reading: Offline_rl_results.pdf

aviralkumar2907 commented 4 years ago

Hi, that's unfortunate, but can you try these hyperparameters (I think the default hyperparameters in bear.py are not ideal):

I will update these parameters in the BEAR launcher to make the results easier to reproduce.

aviralkumar2907 commented 4 years ago

I have also created a pull request in the d4rl_evaluations repo that lists these hyperparameters in the readme.

Also, which version of the D4RL datasets is this? We have changed and reorganized some datasets recently, and while they have not changed much, there could be a little variability in the results. So if performance with the above hyperparameters doesn't match the paper, I can dig into the dataset configurations to see what changed.

sweetice commented 4 years ago

Great! Thanks for your warm reply! I will try to reproduce the BEAR results again. For the d4rl datasets, I am using this version (20200803).

familyld commented 3 years ago

Hi, @sweetice. Did you manage to reproduce the results? I used the code from d4rl_evaluations and also failed to reproduce them. The performance of BCQ is similar to what you presented, but BEAR performs a bit differently.