Farama-Foundation / D4RL

A collection of reference environments for offline reinforcement learning
Apache License 2.0

Couple things #2

Closed: spitis closed this issue 4 years ago

spitis commented 4 years ago
  1. You need to grant read permissions on the mixed mujoco env datasets (and maybe some non-mujoco ones as well). I'm getting the error AccessDeniedException: 403 ... does not have storage.objects.list access to justinjfu-public.

  2. You should wrap your environments in a TimeLimit wrapper: https://github.com/openai/gym/blob/master/gym/wrappers/time_limit.py. Right now episodes never terminate on their own, so evaluation runs into an infinite loop unless you cut them off manually (see the sketch after this list).

  3. The Ant mujoco datasets from the BEAR paper are missing =(

  4. Not really an issue with the code, but I'm curious for your thoughts: do you have reason to believe that the environments on which none of your tested methods achieve any reasonable score are even solvable / well-posed problems for the tabula rasa, completely offline setting? It seems a bit silly to throw these out there as benchmarks. Potentially better approach 1: start with enough data that a reasonable baseline (e.g., offline SAC) solves the task, and make the benchmark not raw performance but the performance retained under random subsampling of the data. Potentially better approach 2: offer these as warm-up / demo-like datasets, to see how quickly an agent that is allowed to explore can reach good performance by bootstrapping from them.
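For point 2, here is a minimal sketch of the wrapping being asked for. The env id and the 1000-step cap are illustrative choices on my part, not anything the repo prescribes:

```python
import gym
from gym.wrappers import TimeLimit
import d4rl  # noqa: F401 -- importing d4rl registers the offline envs with gym

# Illustrative env id and episode cap; pick whatever limit matches the task.
env = gym.make('halfcheetah-medium-v0')
env = TimeLimit(env, max_episode_steps=1000)

obs = env.reset()
done = False
while not done:
    # Without the TimeLimit wrapper, `done` may never become True
    # and this evaluation loop would never exit.
    obs, reward, done, info = env.step(env.action_space.sample())
```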

Anyways, thanks for making the Mujoco datasets available. I have the following "better" expert agents for producing datasets if you are interested (they could probably be improved further; I'm not sure why the BEAR paper stopped short on expert performance): Ant 6900, HalfCheetah 16700, Hopper 4200, Walker 6600.

justinjfu commented 4 years ago

Thank you for catching these!

The issues in points 1 & 2 seem to be limited to the gym-mujoco tasks. The Ant datasets from point 3 are on the way as well.

Point 4 is interesting and a valid concern. It's more complicated than simply adding more data, because the distribution of the data matters as well: for example, offline SAC tends to perform poorly on expert data even when given huge amounts of it. We are trying to subsample a set of successful trajectories and seeing whether an algorithm like behavioral cloning will work from those; this will hopefully come in an update in a few weeks.
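For readers following along, a rough sketch of what that subsampling idea could look like. This is my own illustration, not the maintainers' pipeline: the env id and the 90th-percentile cutoff are placeholders, and it assumes episode boundaries are flagged in the dataset's `terminals` array (later dataset versions also expose `timeouts`):

```python
import gym
import numpy as np
import d4rl  # noqa: F401 -- registers the offline envs with gym

# Placeholder env; any D4RL env exposing get_dataset() works the same way.
env = gym.make('halfcheetah-medium-v0')
data = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals'

# Split the flat arrays into trajectories at terminal flags (assumption: episodes
# end with terminals=True; datasets with timeouts need extra handling).
ends = np.where(data['terminals'])[0]
starts = np.concatenate([[0], ends[:-1] + 1])
returns = np.array([data['rewards'][s:e + 1].sum() for s, e in zip(starts, ends)])

# Keep only trajectories above an arbitrary 90th-percentile return cutoff.
keep = returns >= np.percentile(returns, 90)
idx = np.concatenate([np.arange(s, e + 1)
                      for (s, e), k in zip(zip(starts, ends), keep) if k])
obs, act = data['observations'][idx], data['actions'][idx]
# obs/act can now be fed to any behavioral cloning trainer.
```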