ManifoldRG / NEKO

In Progress Implementation of GATO style Generalist Multimodal model capable of image, text, RL and Robotics tasks
https://discord.gg/brsPnzNd8h
GNU General Public License v3.0

Source MiniGrid/BabyAI Dataset #49

Open daniellawson9999 opened 1 year ago

daniellawson9999 commented 1 year ago

Background

BabyAI is a "gridworld environment whose levels consist of instruction-following tasks that are described by a synthetic language". Gato generated its BabyAI dataset using the built-in BabyAI bot; more details can be found in the Gato paper.

The original repo is now being maintained under Farama as well as MiniGrid. The 2023 update to the BabyAI repo discusses this change and also says:

"This repository still contains scripts which, if adapted to the Minigrid library, could be used to:

More info regarding Minigrid can be found here: https://minigrid.farama.org/. Both the original BabyAI environments and the MiniGrid environments are provided there.

Tasks

As in issue https://github.com/ManifoldRG/NEKO/issues/13, requirement (1) is that environments meet the Gymnasium API. This is already accomplished, as the Minigrid repo follows the new API.
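For anyone unsure what "meets the Gymnasium API" means in practice, here is a minimal sketch of the two call shapes involved: `reset(seed=...)` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. `ToyEnv`, `LegacyEnv`, and `follows_new_api` are hypothetical names made up for this illustration. They are not part of Minigrid or Gymnasium, and the check only looks at tuple shapes.

```python
from typing import Any, Dict, Optional, Tuple


class ToyEnv:
    """Hypothetical stand-in for an environment that follows the new API."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        # New API: reset accepts a seed keyword and returns (observation, info).
        self.t = 0
        return 0, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # New API: step returns (observation, reward, terminated, truncated, info).
        self.t += 1
        terminated = action == 1
        truncated = (not terminated) and self.t >= self.horizon
        return self.t, float(terminated), terminated, truncated, {}


class LegacyEnv:
    """Hypothetical stand-in for the older 4-tuple step API."""

    def reset(self) -> int:
        return 0

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        return 0, 0.0, False, {}


def follows_new_api(env: Any, action: Any) -> bool:
    """Shape-only check of the reset/step contract described above."""
    try:
        reset_out = env.reset(seed=0)
    except TypeError:
        return False  # reset() does not accept a seed keyword
    if not (isinstance(reset_out, tuple) and len(reset_out) == 2):
        return False
    step_out = env.step(action)
    return isinstance(step_out, tuple) and len(step_out) == 5


print(follows_new_api(ToyEnv(), 0))
print(follows_new_api(LegacyEnv(), 0))
```

Gymnasium also ships its own environment checker, which would be the more thorough option against a real Minigrid environment. This sketch is only meant to make the requirement concrete.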

The remaining task is sourcing a dataset and porting it to Minari, which is requirement (2). There are several paths to sourcing a dataset:

1) Collect a dataset manually using the BabyAI bot, which may have to be adapted to work with the new Minigrid repo: https://github.com/mila-iqia/babyai/blob/master/babyai/bot.py .

2) See if papers using Minigrid/BabyAI provide datasets. Some papers can be found here: https://minigrid.farama.org/content/publications/ . In this case, the dataset just needs to be converted to Minari.

3) Collaborate with Minari on sourcing the dataset. This repo says that more datasets are coming to Minari: https://github.com/rodrigodelazcano/d4rl-minari-dataset-generation. The end of its README lists Minigrid. Potentially reach out to https://github.com/rodrigodelazcano or others at Minari; the Discord can be found here: https://farama.org/
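To make path 1 a bit more concrete, here is a rough sketch of the collection loop: roll out an expert policy in an environment that follows the Gymnasium step/reset contract and record each episode as parallel lists. Everything named here is an assumption for illustration. `ToyCorridorEnv` stands in for a Minigrid/BabyAI environment, the `policy` callable stands in for the BabyAI bot, and the dictionary keys are just one plausible per-episode layout (one more observation than actions), not a confirmed Minari schema.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyCorridorEnv:
    """Hypothetical stand-in for a BabyAI level: walk right to reach the goal."""

    def __init__(self, length: int = 3) -> None:
        self.length = length
        self.pos = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        self.pos = 0
        return self.pos, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        if action == 1:  # action 1 means "move right" in this toy example
            self.pos += 1
        terminated = self.pos >= self.length
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, False, {}


def collect_episode(
    env: Any,
    policy: Callable[[Any], Any],
    seed: int,
    max_steps: int = 100,
) -> Dict[str, List[Any]]:
    """Roll out one episode and record it step by step."""
    obs, _info = env.reset(seed=seed)
    episode: Dict[str, List[Any]] = {
        "observations": [obs],  # includes the initial observation
        "actions": [],
        "rewards": [],
        "terminations": [],
        "truncations": [],
    }
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        episode["observations"].append(obs)
        episode["actions"].append(action)
        episode["rewards"].append(reward)
        episode["terminations"].append(terminated)
        episode["truncations"].append(truncated)
        if terminated or truncated:
            break
    return episode


def collect_dataset(
    env: Any,
    policy: Callable[[Any], Any],
    num_episodes: int,
) -> List[Dict[str, List[Any]]]:
    """Collect several episodes, using the episode index as the reset seed."""
    return [collect_episode(env, policy, seed=i) for i in range(num_episodes)]


# Usage: an "expert" that always moves right solves the 3-cell corridor in 3 steps.
dataset = collect_dataset(ToyCorridorEnv(length=3), lambda obs: 1, num_episodes=2)
print(len(dataset), dataset[0]["actions"])
```

In a real run the toy environment and lambda would be replaced by a Minigrid BabyAI level and the adapted bot, and Minari's own data-collection tooling would likely replace the hand-rolled dictionary.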

If interested, please add yourself to this issue, and discuss which path you are pursuing.

Output

The output should be a link to a GitHub repo that provides a process for acquiring the dataset as in https://github.com/daniellawson9999/data-tests.

harshsikka commented 1 year ago

Daniel, just wanted to say this is an excellent issue. Thank you for doing this!

helenlu66 commented 1 year ago

Currently working on approach 2 in this issue https://github.com/ManifoldRG/NEKO/issues/15

harshsikka commented 1 year ago

@helenlu66 I have moved this to "in progress" to reflect my understanding of your PPP

helenlu66 commented 1 year ago

Currently working on converting the GoToLocal expert trajectories .pkl to Minari in this issue

helenlu66 commented 1 year ago

currently resolving dependency issues between h5py and minari

helenlu66 commented 1 year ago

Currently pursuing approach 1 here: https://github.com/helenlu66/RLMinariDatasets/blob/master/babyai_bot_expert_data_generation.py. Approach 2 only led me to one dataset, and porting that dataset to Minari is currently blocked by an UnpicklingError.

snat-s commented 8 months ago

Hi @helenlu66, were you able to solve this? I recently started researching this.

snat-s commented 7 months ago

I created another script to generate the dataset. It works appropriately, and we are replicating this BabyAI dataset. @eihli and I are taking over this.