Farama-Foundation / Minari

A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
https://minari.farama.org

[Proposal] Add more datasets for discrete-action envs #258

Open carlosgmartin opened 2 weeks ago

carlosgmartin commented 2 weeks ago

Proposal

Currently, there are only 2 datasets for discrete-action envs, and both are for MiniGrid.

Would it be possible to add a greater number and variety of datasets for discrete-action envs?

younik commented 2 weeks ago

Hi @carlosgmartin,

Robotic tasks are usually the most interesting for offline RL, and they usually have continuous action space. Do you have any environment in mind that you would like to see in our datasets?

carlosgmartin commented 2 weeks ago

@younik Thanks for your quick response. I'd love to see datasets for the following discrete-action environments:

To make the task easier, here's a potential systematic way to generate a dataset for each environment:

  1. Pick a state-of-the-art RL algorithm (to keep training time as short as possible).
  2. Save every Nth training episode to the dataset.

That way the dataset includes a mixture of different levels of skill.
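The two steps above can be sketched as follows. This is a minimal stand-alone illustration of the "save every Nth episode" filter, not Minari's actual collection API; `run_episode` is a hypothetical placeholder for one training rollout, and a real script would update the agent between rollouts so that later episodes reflect higher skill.

```python
# Sketch: keep every Nth training episode so the dataset spans skill levels.

def run_episode(policy, episode_idx):
    """Placeholder rollout: returns a list of (obs, action, reward) steps."""
    return [(episode_idx, policy(episode_idx), 1.0)]

def collect_dataset(policy, num_episodes, save_every_n):
    dataset = []
    for i in range(num_episodes):
        episode = run_episode(policy, i)
        # ...the agent would be trained here, so later episodes are stronger...
        if i % save_every_n == 0:  # keep episodes 0, N, 2N, ...
            dataset.append(episode)
    return dataset

episodes = collect_dataset(policy=lambda i: 0, num_episodes=100, save_every_n=10)
print(len(episodes))  # 10 episodes saved: indices 0, 10, ..., 90
```

In practice the saved episodes would be written out through Minari's collection utilities rather than kept in a Python list.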

For example, if the environment is Breakout and the algorithm is PPO, we could create a dataset ALE/breakout/ppo-v0.

We could also create a dataset for each environment based on the random policy, e.g. ALE/breakout/random-v0.
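A random-policy dataset is even simpler, since no training loop is needed. The sketch below uses a tiny hypothetical stub environment in place of the real Gymnasium env (a real ALE/breakout/random-v0 run would use the actual Atari env); only the uniform action sampling is the point.

```python
import random

# Hypothetical stub env with a discrete action space, standing in for a real
# Gymnasium environment such as an Atari game.
class StubEnv:
    n_actions = 4

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 5  # obs, reward, terminated

def random_episode(env, rng):
    """Roll out one episode under the uniform random policy."""
    obs = env.reset()
    steps = []
    terminated = False
    while not terminated:
        action = rng.randrange(env.n_actions)  # uniform random action
        next_obs, reward, terminated = env.step(action)
        steps.append((obs, action, reward))
        obs = next_obs
    return steps

rng = random.Random(0)
episode = random_episode(StubEnv(), rng)
print(len(episode))  # 5 steps: the stub terminates after 5 steps
```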

younik commented 2 weeks ago

Thanks for the proposal! I would love to host these datasets on our remote. I believe ALE and MiniGrid expert datasets would be especially interesting to the community.

The way we usually proceed for expert datasets is:

  1. Train an agent on the env.
  2. Publish the model on our HF space.
  3. Publish a simple collection script on our script repo, like this for example.

Would you be interested in contributing to it? The random datasets are less interesting, since it is easy for users to generate them themselves, but we can host them for MiniGrid. For ALE, it would be amazing to have a small human dataset, but of course that is more work.

carlosgmartin commented 1 week ago

See the following gist: https://gist.github.com/carlosgmartin/7e451d124a87f8415ae20016fe1caeb3.

The datasets can be generated automatically by cycling through all the environments. For example:

```sh
for env in ALE/Pong-v5 ALE/Breakout-v5 ALE/SpaceInvaders-v5 ALE/Seaquest-v5 ALE/BeamRider-v5 ALE/Enduro-v5 ALE/Qbert-v5
do
    python3 generate.py --env $env
done
```

younik commented 1 week ago

Thanks for the gist! I realized that there are pre-trained models on the CleanRL HF hub, so we might simply use them to collect the datasets. Similarly, for MiniGrid, we can use the ones in the SB3 Zoo.

I plan to add them to the Minari remote.