**ernestum** opened 1 year ago (status: Open)
Thanks for flagging this! Didn't know about HuggingFace datasets, I agree we should switch to that.
We should support automation in the form of a command we can run to do this given a set of policies, but we may want to do quality control on the policies at least, so it probably can't be fully automated end-to-end.
In the long run I would like to transition to making the best use of the existing HuggingFace Dataset infrastructure, but I realized that this would require some major changes to the imitation library. I broke it down into a number of smaller steps which in themselves already provide an improvement over the status quo. Scroll down for a diagram illustrating the steps.
Right now experts generate lists of `TrajectoryWithReward`s via the `rollout()` function. Those trajectories are fed to the imitation learning algorithms. To store the trajectories, we concatenate them all into one long trajectory and store that, along with the split positions, in a NumPy `.npz` file, which is basically a dict of numpy arrays. This `.npz` file is manually uploaded to a HuggingFace Model Repository. There are functions to automatically retrieve it from there and load it.
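The concatenate-and-split scheme can be sketched in a few lines. Field names are illustrative, and for brevity `obs` is assumed to have the same per-trajectory length as `acts`; this is a sketch of the idea, not imitation's actual serialization code.

```python
import numpy as np

def save_trajectories(path, trajectories):
    """Concatenate variable-length trajectories into flat arrays plus
    split indices, then store everything in a single .npz file."""
    obs = np.concatenate([t["obs"] for t in trajectories])
    acts = np.concatenate([t["acts"] for t in trajectories])
    rews = np.concatenate([t["rews"] for t in trajectories])
    # Cumulative lengths mark where each trajectory ends in the flat arrays.
    indices = np.cumsum([len(t["acts"]) for t in trajectories])[:-1]
    np.savez(path, obs=obs, acts=acts, rews=rews, indices=indices)

def load_trajectories(path):
    """Inverse: split the flat arrays back into per-trajectory dicts."""
    data = np.load(path)
    return [
        {"obs": o, "acts": a, "rews": r}
        for o, a, r in zip(
            np.split(data["obs"], data["indices"]),
            np.split(data["acts"], data["indices"]),
            np.split(data["rews"], data["indices"]),
        )
    ]
```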
A first step would be to move the `.npz` files to HuggingFace Dataset Repositories. HuggingFace datasets are backed by Apache Arrow, which, as far as I understand, does not natively support the `.npz` format. Therefore we need to provide a special dataset loading script in order to be compatible with the HuggingFace datasets library. The only change on the imitation side would be to the `load_rollouts_from_huggingface()` function; no changes downstream of that.

In the long run I don't want the imitation library to contain details like the file format for storing trajectories. To me this counts as "infrastructure" and as such should be upstreamed to either SB3 or the HuggingFace datasets library (I contacted @osanseviero from HuggingFace to discuss this).
Towards this I would like to get rid of the NumPy format and use something that Apache Arrow natively supports. I would also like to rethink the conversion between the HF `Dataset` object and our `Trajectory` format. Maybe there are smarter/simpler ways to solve this than concatenating/splitting.
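One possibility, sketched here with assumed field names: keep one row per trajectory with list-valued columns. That is the columnar layout Arrow is built for, and `datasets.Dataset.from_dict` accepts such a mapping directly, which would avoid the concatenate/split step entirely. A numpy-only sketch:

```python
import numpy as np

def trajectories_to_columns(trajectories):
    """One row per trajectory; each column holds one field as a (nested) list.
    datasets.Dataset.from_dict(columns) would turn this mapping into an
    Arrow-backed HF Dataset without any concatenation."""
    return {
        "obs": [t["obs"].tolist() for t in trajectories],
        "acts": [t["acts"].tolist() for t in trajectories],
        "rews": [t["rews"].tolist() for t in trajectories],
    }

def columns_to_trajectories(columns):
    """Inverse: zip the columns back into per-trajectory dicts of arrays."""
    return [
        {"obs": np.asarray(o), "acts": np.asarray(a), "rews": np.asarray(r)}
        for o, a, r in zip(columns["obs"], columns["acts"], columns["rews"])
    ]
```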
In the past I already considered migrating all the algorithms to take PyTorch `DataLoader` objects as input and made a draft implementation for this. The PyTorch `DataLoader` could be fed from a HF `Dataset` (this conversion is already provided by HF), from an expert (for online learning), or from any other data source that is compatible with the PyTorch `DataLoader`.

- The `DataLoader` interface opens up lots of possibilities to load data from, without introducing any other dependency.
- The `DataLoader` makes use of parallelization.

Steps 2a and 2b can be executed independently. Once we have done both, we should be able to easily merge the changes.
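A minimal sketch of step 2b's consumer side, using plain `torch`. The transition layout and field names are assumptions; on the HF side, `Dataset.with_format("torch")` yields items that a `DataLoader` can consume in the same way.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class TransitionsDataset(Dataset):
    """Flat (obs, act) transitions; any source exposing __len__/__getitem__
    (an HF Dataset, an expert's replay buffer, ...) can be wrapped like this."""
    def __init__(self, obs, acts):
        self.obs, self.acts = obs, acts

    def __len__(self):
        return len(self.acts)

    def __getitem__(self, i):
        return {"obs": self.obs[i], "acts": self.acts[i]}

obs = torch.randn(32, 4)
acts = torch.randint(0, 2, (32,))
loader = DataLoader(TransitionsDataset(obs, acts), batch_size=8, shuffle=True)
for batch in loader:
    # Default collate stacks dict values into batched tensors.
    assert batch["obs"].shape == (8, 4)
```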
(Diagram legend: green = objects in RAM, blue = objects on disk, yellow = objects in the cloud.)
Hi @ernestum thanks for looking into this and laying out such a clear plan.
I'm on board with step 1 and step 2(a). As you say, it's worth thinking a bit about how to convert from `Dataset` to `Trajectory`, but ultimately I don't expect the conversion code to be that complex, so as long as we're not losing any information in the conversion there's probably no need to overthink this.
Step 2(b) I'm more hesitant about. A `DataLoader` is much more abstract than our current transition types -- will this work with all our algorithms? `MCEIRL.set_demonstrations` currently has different logic depending on the transition type, and I don't see a way to push this just into a `DataLoader` without losing functionality. However, at first glance AIRL/GAIL/BC/DAgger do all seem to just make a data loader internally with `base.make_data_loader`, so if it's just MCE IRL that's the odd one out, then I support this change (we can special-case MCE IRL; it's an awkward algorithm in a bunch of ways).
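For illustration, special-casing MCE IRL while keeping a uniform `DataLoader` path for everything else could look like the following. All names here are hypothetical, not imitation's actual API:

```python
def make_data_loader(demos, batch_size=32):
    """Hypothetical counterpart of base.make_data_loader: normalize any
    iterable of transitions into fixed-size batches."""
    demos = list(demos)
    return [demos[i:i + batch_size] for i in range(0, len(demos), batch_size)]

def set_demonstrations(algo_name, demos):
    """Hypothetical dispatch: MCE IRL keeps its bespoke tabular input,
    while all other algorithms receive a uniform data loader."""
    if algo_name == "mce_irl":
        # Tabular algorithm: needs raw demonstrations, not batched samples.
        return {"tabular_demos": demos}
    return {"data_loader": make_data_loader(demos)}
```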
Talking to @osanseviero and @simoninithomas, and some playing around with the HuggingFace `datasets` library, revealed that it is super easy to map between our current format and their datasets.
I now would propose the following steps/PRs:
1. Add basic support for HuggingFace datasets
   - Make `imitation.types.load` support HuggingFace datasets.
   - Make `imitation.types.save` store to a HuggingFace dataset by default.
2. Add extended support for HuggingFace datasets
   - Ensure that `imitation.types.<load/save>` supports all of the weird action/observation spaces, with unit tests (I think right now we only support discrete and box spaces).
   - Write a utility to sample trajectories from a model, record a video of some of them, and upload everything to the HuggingFace hub with a model card showing the video and indicating which model the trajectories originated from.
3. Deprecate `imitation.policies.serialize.load_rollouts_from_huggingface`
   - Migrate all datasets currently stored in a model repo to their own dataset repo.
   - Remove `imitation.policies.serialize.load_rollouts_from_huggingface`.
4. Refactor the way we handle demonstrations inside of `imitation`
   - Skimming over the code, it looks like we spend way too many LOC on supporting and converting between different trajectory formats (with or without rewards, transitions, transitions with next_obs and dones). I have the vague hunch that there is a lot of potential to reduce complexity and LOC, and even improve performance, by using the HuggingFace `datasets` library.
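For step 3, a thin deprecation shim could keep the old entry point working for one release while the loader moves to its new home. The module paths and warning text below are hypothetical:

```python
import warnings

def _new_load(*args, **kwargs):
    # Placeholder for the relocated loader (hypothetically in
    # imitation.data.serialize).
    return None

def load_rollouts_from_huggingface(*args, **kwargs):
    """Deprecated alias kept in imitation.policies.serialize for one
    release; delegates to the relocated loader."""
    warnings.warn(
        "load_rollouts_from_huggingface is deprecated; "
        "use imitation.data.serialize instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return _new_load(*args, **kwargs)
```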
> Talking to @osanseviero and @simoninithomas, and some playing around with the HuggingFace `datasets` library, revealed that it is super easy to map between our current format and their datasets.

That's good news!
> 1. Add basic support for HuggingFace datasets
>    - Make `imitation.types.load` support HuggingFace datasets.
>    - Make `imitation.types.save` store to a HuggingFace dataset by default.

This seems fine to me.
> 2. Add extended support for HuggingFace datasets
>    - Ensure that `imitation.types.<load/save>` supports all of the weird action/observation spaces, with unit tests (I think right now we only support discrete and box spaces).

Do any of our algorithms work with these "weird" action/observation spaces? If not, I don't see the point of adding support at the type level at this point.
> - Write a utility to sample trajectories from a model, record a video of some of them, and upload everything to the HuggingFace hub with a model card showing the video and indicating which model the trajectories originated from.

Good idea, it'd be nice to have a model card making it user-friendly to browse different demos. There's some overlap here with the existing `eval_policy` script.
> 3. Deprecate `imitation.policies.serialize.load_rollouts_from_huggingface`
>    - Migrate all datasets currently stored in a model repo to their own dataset repo.
>    - Remove `imitation.policies.serialize.load_rollouts_from_huggingface`.

This sounds good, much cleaner.
> 4. Refactor the way we handle demonstrations inside of `imitation`
>
> Skimming over the code, it looks like we spend way too many LOC on supporting and converting between different trajectory formats (with or without rewards, transitions, transitions with next_obs and dones). I have the vague hunch that there is a lot of potential to reduce complexity and LOC, and even improve performance, by using the HuggingFace `datasets` library.

I don't currently have your intuition, but probably best to just discuss once you've got a more fleshed-out proposal there.
> - Ensure that `imitation.types.<load/save>` supports all of the weird action/observation spaces, with unit tests (I think right now we only support discrete and box spaces).
>
> Do any of our algorithms work with these "weird" action/observation spaces? If not, I don't see the point of adding support at the type level at this point.

Good point. I don't know what spaces we support right now. Do we want to clarify this? Do we want to support all of them? I think if we support all spaces with the datasets, then probably the algos will support them too?
Dict spaces I think will break most (all?) of our current algorithms, as we assume observations/actions are NumPy arrays/tensors all over the place. Provided it's an array/tensor, I think we should be OK so long as the underlying RL algorithm supports it.
Do we properly one-hot encode discrete spaces? We could easily flatten/unflatten spaces using these util functions: https://github.com/Farama-Foundation/Gymnasium/blob/76559ed553cffc3e5082227b0a8acea461273e84/gymnasium/spaces/utils.py#L114
> Do we properly one-hot encode discrete spaces? We could easily flatten/unflatten spaces using these util functions: https://github.com/Farama-Foundation/Gymnasium/blob/76559ed553cffc3e5082227b0a8acea461273e84/gymnasium/spaces/utils.py#L114

I think the `stable_baselines3.common.preprocessing.preprocess_obs` function should take care of this.
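For reference, the one-hot transform that `gymnasium.spaces.utils.flatten` applies to `Discrete` spaces can be reproduced in plain numpy:

```python
import numpy as np

def flatten_discrete(n, x):
    """One-hot encode a sample x from a Discrete(n) space, as
    gymnasium.spaces.utils.flatten does for Discrete spaces."""
    out = np.zeros(n, dtype=np.float32)
    out[x] = 1.0
    return out

def unflatten_discrete(onehot):
    """Inverse transform: recover the integer action from the one-hot vector."""
    return int(np.argmax(onehot))
```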
Problem
We recently added the feature to load rollouts from the HuggingFace hub here. The new function assumes that rollouts are stored in a file `rollouts.npz` in the repository of a model. Right now I see the following issues with this solution:

- The loading code lives in `imitation.policies.serialize`, while it has nothing to do with serializing policies.
- The `rollouts.npz` file is stored with the model. Placing it there is a mostly manual process which is very brittle, whereas training and uploading models is at least semi-automated by the zoo.

Related to this is #408, where we half decided to consistently name rollouts "trajectories".
Solution

- Move the rollouts to dedicated HuggingFace Dataset repositories; only `.npz` files are supported right now.
- Move the loading code to `imitation.data.serialize`.