HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

Use huggingface dataset repositories for rollouts, not model repositories #651

Open ernestum opened 1 year ago

ernestum commented 1 year ago

Problem

We recently added a feature to load rollouts from the HuggingFace hub. The new function assumes that rollouts are stored in a file rollouts.npz in the repository of a model. Right now I see the following issues with this solution:

  1. [major] HuggingFace model repositories should be used for models, not for datasets. The HuggingFace hub has dedicated dataset repositories, and I think we should use them for our rollouts.
  2. [minor] The function is located in the package imitation.policies.serialize even though it has nothing to do with serializing policies.
  3. [minor] The function assumes a rollouts.npz file is stored with the model. Placing it there is a mostly manual and therefore brittle process, whereas training and uploading models is at least semi-automated by the zoo. (A sketch of the current loading scheme follows below.)

Related to this is #408, where we tentatively decided to consistently name rollouts "trajectories".
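
For context, the current loading flow boils down to something like the following. This is an illustrative sketch, not the actual code in imitation.policies.serialize, and the repo id is a placeholder:

```python
import numpy as np
from huggingface_hub import hf_hub_download

# Current scheme (sketch): rollouts live as a single .npz file inside a
# *model* repository, so they are fetched with the default repo_type="model".
path = hf_hub_download(
    repo_id="your-org/your-model-repo",  # hypothetical model repository
    filename="rollouts.npz",
)
with np.load(path, allow_pickle=True) as data:
    print(list(data.keys()))  # flat arrays plus the positions to split them at
```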

Solution

  1. Move the rollouts to HuggingFace datasets and change the function to load them from there (see the upload sketch after this list). I am not sure whether .npz files are supported right now.
  2. Move the function to something like imitation.data.serialize.
  3. Automate the generation and upload of the rollouts, similar to how the zoo automates training and upload.
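
Moving an existing rollouts.npz into a dataset repository could look roughly like this with huggingface_hub; the repo id is a placeholder:

```python
from huggingface_hub import HfApi

# Sketch: push an existing rollouts.npz into a *dataset* repository
# instead of a model repository.
api = HfApi()
api.create_repo("your-org/example-rollouts", repo_type="dataset", exist_ok=True)
api.upload_file(
    path_or_fileobj="rollouts.npz",
    path_in_repo="rollouts.npz",
    repo_id="your-org/example-rollouts",
    repo_type="dataset",
)
```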
AdamGleave commented 1 year ago

Thanks for flagging this! I didn't know about HuggingFace datasets; I agree we should switch to that.

I support automation in the form of "there's a command we can run to do this given a set of policies", but we may want to do quality control on the policies at least, so it probably can't be fully automated end-to-end.

ernestum commented 1 year ago

In the long run I would like to make the best use of the existing HuggingFace Dataset infrastructure, but I realized that this would require some major changes to the imitation library. I broke the transition down into a number of smaller steps, each of which already provides an improvement over the status quo. Scroll down for a diagram illustrating the steps.

Current State

Right now Experts generate lists of TrajectoryWithRewards via the rollout() function. Those trajectories are fed to the imitation learning algorithms. To store the trajectories, we concatenate them all into one long trajectory and store that, together with the split positions, in a numpy .npz file, which is basically a dict of numpy arrays. This .npz file is manually uploaded to a HuggingFace Model Repository. There are functions to automatically retrieve it from there and load it.
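
As an illustration, the concatenate-and-split scheme boils down to something like this (a toy sketch, not the exact imitation code; observations are omitted since they need their own split points, there being one more observation than actions per trajectory):

```python
import numpy as np

# Two toy trajectories, each a dict of arrays.
trajectories = [
    {"acts": np.arange(4), "rews": np.ones(4)},
    {"acts": np.arange(7), "rews": np.ones(7)},
]

# Concatenate all trajectories into flat arrays and remember where to split.
indices = np.cumsum([len(t["acts"]) for t in trajectories])[:-1]
np.savez(
    "rollouts.npz",
    indices=indices,
    acts=np.concatenate([t["acts"] for t in trajectories]),
    rews=np.concatenate([t["rews"] for t in trajectories]),
)

# Loading reverses the process: split the flat arrays at the stored indices.
with np.load("rollouts.npz") as data:
    acts_per_traj = np.split(data["acts"], data["indices"])
    assert len(acts_per_traj) == len(trajectories)
```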

Step 1: Using the HuggingFace Dataset Repository with the .npz format

A first step would be to move the .npz files to HuggingFace Dataset Repositories. HuggingFace datasets are backed by Apache Arrow, which, as far as I understand, does not natively support the .npz format. Therefore we would need to provide a special dataset loading script in order to be compatible with the HuggingFace datasets library.
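
Such a loading script might look roughly like the sketch below; the class name and feature schema are assumptions, and observations are omitted for brevity:

```python
import datasets
import numpy as np

class NpzRollouts(datasets.GeneratorBasedBuilder):
    """Hypothetical loading script for the concatenated .npz rollout format."""

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({
                "acts": datasets.Sequence(datasets.Value("float32")),
                "rews": datasets.Sequence(datasets.Value("float32")),
            }),
        )

    def _split_generators(self, dl_manager):
        # rollouts.npz sits next to this script in the dataset repository.
        path = dl_manager.download("rollouts.npz")
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"path": path},
            ),
        ]

    def _generate_examples(self, path):
        data = np.load(path)
        # Undo the concatenation: one example per trajectory.
        acts = np.split(data["acts"], data["indices"])
        rews = np.split(data["rews"], data["indices"])
        for i, (a, r) in enumerate(zip(acts, rews)):
            yield i, {"acts": a, "rews": r}
```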


Step 2a: Switching to a file format supported by Apache Arrow

In the long run I don't want the imitation library to contain details like the file format for storing trajectories. To me this counts as "infrastructure" and as such should be upstreamed to either SB3 or the HuggingFace datasets library (I contacted @osanseviero from HuggingFace to discuss this). Towards this, I would like to get rid of the numpy format and use something that Apache Arrow natively supports. I would also like to rethink the conversion between the HF Dataset object and our Trajectory format; maybe there are smarter/simpler ways to solve this than concatenating/splitting.
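
One Arrow-native option is to store one trajectory per row as variable-length sequence features, which avoids concatenating/splitting entirely. A sketch, with field names and the repo id as assumptions:

```python
import datasets

# One row per trajectory; Arrow handles variable-length sequences natively.
features = datasets.Features({
    "acts": datasets.Sequence(datasets.Value("int64")),
    "rews": datasets.Sequence(datasets.Value("float32")),
})
ds = datasets.Dataset.from_dict(
    {
        "acts": [[0, 1, 0, 1], [1, 0, 1, 0, 1, 0, 1]],  # two toy trajectories
        "rews": [[1.0] * 4, [1.0] * 7],
    },
    features=features,
)
ds.push_to_hub("your-org/example-rollouts")  # placeholder repo id
```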


Step 2b: Switching to a PyTorch DataLoader

In the past I already considered migrating all the algorithms to take PyTorch DataLoader objects as input, and I made a draft implementation of this. The PyTorch DataLoader could be fed from a HF Dataset (this conversion is already provided by HF), from an expert (for online learning), or from any other data source that is compatible with the PyTorch DataLoader.
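
A minimal sketch of that wiring with toy data; since trajectories have different lengths, the collate function keeps them as a list instead of stacking them into one tensor:

```python
import datasets
from torch.utils.data import DataLoader

# with_format("torch") makes the dataset return PyTorch tensors.
ds = datasets.Dataset.from_dict({
    "acts": [[0, 1, 0], [1, 0, 1, 1]],
    "rews": [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]],
}).with_format("torch")

loader = DataLoader(ds, batch_size=2, collate_fn=list)

for batch in loader:
    print(batch[0]["acts"])  # tensor([0, 1, 0])
```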


Step 3: Merge

Steps 2a and 2b can be executed independently. Once both are done, it should be easy to merge the changes.

Diagram of the steps

[Diagram "RL-Datasets in Imitation": green = objects in RAM, blue = objects on disk, yellow = objects in the cloud]

AdamGleave commented 1 year ago

Hi @ernestum thanks for looking into this and laying out such a clear plan.

I'm on board with step 1 and step 2(a). As you say, it's worth thinking a bit about how to convert from Dataset to Trajectory, but ultimately I don't expect the conversion code to be that complex, so as long as we're not losing any information in the conversion there's probably no need to overthink this.

Step 2(b) I'm more hesitant about. A DataLoader is much more abstract than our current transition types -- will this work with all our algorithms? MCEIRL.set_demonstrations currently has different logic depending on the transition type, and I don't see a way to push this into just a DataLoader without losing functionality. However, at first glance AIRL/GAIL/BC/DAgger do all seem to just make a data loader internally with base.make_data_loader, so if it's just MCE IRL that's the odd one out then I support this change (we can special-case MCE IRL; it's an awkward algorithm in a bunch of ways).
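
For concreteness, the pattern in question is roughly the following; this is a generic illustration of a transitions data loader, not imitation's actual make_data_loader:

```python
import numpy as np
from torch.utils.data import DataLoader, Dataset

class TransitionsDataset(Dataset):
    """Toy dataset yielding one transition dict per index."""

    def __init__(self, obs, acts, next_obs, dones):
        self.obs, self.acts, self.next_obs, self.dones = obs, acts, next_obs, dones

    def __len__(self):
        return len(self.acts)

    def __getitem__(self, i):
        return {
            "obs": self.obs[i],
            "acts": self.acts[i],
            "next_obs": self.next_obs[i],
            "dones": self.dones[i],
        }

n = 16
ds = TransitionsDataset(
    obs=np.random.rand(n, 4).astype(np.float32),
    acts=np.random.randint(2, size=n),
    next_obs=np.random.rand(n, 4).astype(np.float32),
    dones=np.zeros(n, dtype=bool),
)
# The default collate function stacks the per-transition dicts into batches.
loader = DataLoader(ds, batch_size=4, shuffle=True)
batch = next(iter(loader))
assert batch["obs"].shape == (4, 4)
```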

ernestum commented 1 year ago

Talking to @osanseviero and @simoninithomas, and some playing around with the HuggingFace datasets library, revealed that it is super easy to map between our current format and their datasets.

I now would propose the following steps/PRs:

1. Add basic support for HuggingFace datasets

  • Make imitation.types.load support HuggingFace datasets.
  • Make imitation.types.save store to a HuggingFace dataset by default.

2. Add extended support for HuggingFace datasets

  • Ensure that imitation.types.<load/save> supports all of the weird action/observation spaces with unit tests (I think right now we only support discrete and box spaces).
  • Write a utility to sample trajectories from a model, record a video of some of them, and upload everything to the HuggingFace hub with a model card showing the video and indicating from what model the trajectories originated.

3. Deprecate imitation.policies.serialize.load_rollouts_from_huggingface

  • Migrate all datasets currently stored in a model repo to their own dataset repo.
  • Remove imitation.policies.serialize.load_rollouts_from_huggingface.

4. Refactor the way we handle demonstrations inside of imitation

Skimming over the code, it looks like we spend way too many LOC on supporting and converting between different trajectory formats (with or without rewards, transitions, transitions with next_obs and dones). I have the vague hunch that there is a lot of potential to reduce complexity and LOC, and even improve performance, by using the HuggingFace datasets library. (A hypothetical sketch follows this list.)
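
To make item 4 slightly more concrete: once trajectories are plain HF datasets, a unified save/load could be as thin as the following sketch (function names are assumptions, not a proposed final API):

```python
import datasets

def save(trajectories: datasets.Dataset, path: str) -> None:
    # Arrow-backed on-disk format; no .npz or concatenation involved.
    trajectories.save_to_disk(path)

def load(path: str) -> datasets.Dataset:
    return datasets.load_from_disk(path)
```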

AdamGleave commented 1 year ago

Talking to @osanseviero and @simoninithomas, and some playing around with the HuggingFace datasets library, revealed that it is super easy to map between our current format and their datasets.

That's good news!

1. Add basic support for HuggingFace datasets

  • Make imitation.types.load support HuggingFace datasets.
  • Make imitation.types.save store to a HuggingFace dataset by default.

This seems fine to me.

2. Add extended support for HuggingFace datasets

  • Ensure that imitation.types.<load/save> supports all of the weird action/observation spaces with unit tests (I think right now we only support discrete and box spaces).

Do any of our algorithms work with these "weird" action/observation spaces? If not I don't see the point of adding support at the type level at this point.

  • Write a utility to sample trajectories from a model, record a video of some of them, and upload everything to the HuggingFace hub with a model card showing the video and indicating from what model the trajectories originated.

Good idea, it'd be nice to have a model card making it user-friendly to browse different demos. There's some overlap here with the existing eval_policy script.

3. Deprecate imitation.policies.serialize.load_rollouts_from_huggingface

  • Migrate all datasets currently stored in a model repo to their own dataset repo.
  • Remove imitation.policies.serialize.load_rollouts_from_huggingface.

This sounds good, much cleaner.

4. Refactor the way we handle demonstrations inside of imitation

Skimming over the code, it looks like we spend way too many LOC on supporting and converting between different trajectory formats (with or without rewards, transitions, transitions with next_obs and dones). I have the vague hunch that there is a lot of potential to reduce complexity and LOC, and even improve performance, by using the HuggingFace datasets library.

I don't currently have your intuition but probably best to just discuss once you've got a more fleshed out proposal there.

ernestum commented 1 year ago

  • Ensure that imitation.types.<load/save> supports all of the weird action/observation spaces with unit tests (I think right now we only support discrete and box spaces).

Do any of our algorithms work with these "weird" action/observation spaces? If not I don't see the point of adding support at the type level at this point.

Good point. I don't know which spaces we support right now. Do we want to clarify this? Do we want to support all of them? I think if the datasets support all spaces, then the algos will probably support them too?

AdamGleave commented 1 year ago

Good point. I don't know which spaces we support right now. Do we want to clarify this? Do we want to support all of them? I think if the datasets support all spaces, then the algos will probably support them too?

Dict spaces I think will break most (all?) of our current algorithms, as we assume observations/actions are NumPy arrays/tensors all over the place.

Provided it's an array/tensor, I think we should be OK so long as the underlying RL algorithm supports it.

ernestum commented 1 year ago

Do we properly one-hot encode discrete spaces? We could easily flatten/unflatten spaces using these util functions: https://github.com/Farama-Foundation/Gymnasium/blob/76559ed553cffc3e5082227b0a8acea461273e84/gymnasium/spaces/utils.py#L114
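
For illustration, the linked utilities encode a Discrete space as a one-hot vector (quick sketch):

```python
from gymnasium import spaces
from gymnasium.spaces.utils import flatten, unflatten

space = spaces.Discrete(4)
one_hot = flatten(space, 2)   # one-hot vector: [0 0 1 0]
assert unflatten(space, one_hot) == 2  # round-trips back to the original value
```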

AdamGleave commented 1 year ago

Do we properly one-hot encode discrete spaces? We could easily flatten/unflatten spaces using these util functions: https://github.com/Farama-Foundation/Gymnasium/blob/76559ed553cffc3e5082227b0a8acea461273e84/gymnasium/spaces/utils.py#L114

I think the stable_baselines3.common.preprocessing.preprocess_obs function should take care of this.
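
A quick sketch of what that helper does for a Discrete space; it returns a float one-hot tensor:

```python
import torch as th
from gymnasium import spaces
from stable_baselines3.common.preprocessing import preprocess_obs

space = spaces.Discrete(4)
obs = th.tensor([2])  # a batch containing one discrete observation
print(preprocess_obs(obs, space))  # tensor([[0., 0., 1., 0.]])
```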