HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

Support for new Gymnasium space types; and another question about applying GAIL/AIRL #687

Open wsdd2 opened 1 year ago

wsdd2 commented 1 year ago

Problem 1 Description

I am building a custom environment with Gymnasium instead of gym. My observation space is a gymnasium.spaces.Sequence, which does not seem to be compatible with the imitation package when training an agent with GAIL or AIRL. I need this space type because my agent observes a variable number of obstacles each time it performs an action: each observation is a finite-length sequence of n np.arrays holding the coordinates of the observed obstacles, where n is not fixed. It seems imitation cannot handle gymnasium.spaces.Sequence in expert and agent trajectories. I hope support can be added.

Problem 2 Description

As mentioned above, my project is to train agents that imitate expert behavior. However, my dataset contains only the expert trajectories, not the expert policy: about 5000+ state-action trajectories. How can I learn a policy before using GAIL/AIRL to train an agent? I ask because the agent-training tutorials in the imitation docs all generate trajectories by rolling out an expert policy before training the agent. Is there a way to skip that step and train directly on my custom dataset?

ernestum commented 1 year ago
  1. Even once we switch to gymnasium (which we will certainly do one day), we don't support all action/observation spaces yet (dict spaces, for example; see #681). Contributions for this are very welcome!

  2. Of course you don't have to generate the demonstrations. This is just done in the examples to make them more self-contained. You can pass your trajectories as a sequence of imitation.data.types.Trajectory to GAIL/AIRL.
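
A minimal sketch of point 2, assuming a hypothetical dataset in which each episode is stored as a `(states, actions)` pair and every episode ran to completion (hence `terminal=True`). The helper names `check_episode` and `to_trajectories` are made up for illustration. The part that tends to trip people up is that a `Trajectory` holds one more observation than actions, since the final observation after the last action is included.

```python
import numpy as np


def check_episode(states, actions):
    """Validate one episode and return (obs, acts) as NumPy arrays.

    A Trajectory needs len(obs) == len(acts) + 1, because the observation
    reached after the last action is part of the trajectory.
    """
    obs = np.asarray(states, dtype=np.float32)
    acts = np.asarray(actions)
    if len(obs) != len(acts) + 1:
        raise ValueError(
            f"expected len(obs) == len(acts) + 1, got {len(obs)} and {len(acts)}"
        )
    return obs, acts


def to_trajectories(episodes):
    """Convert hypothetical (states, actions) pairs into Trajectory objects."""
    # Imported lazily so check_episode can be used without imitation installed.
    from imitation.data.types import Trajectory

    trajectories = []
    for states, actions in episodes:
        obs, acts = check_episode(states, actions)
        # Assumption: each stored episode ended in a terminal state.
        trajectories.append(
            Trajectory(obs=obs, acts=acts, infos=None, terminal=True)
        )
    return trajectories


# Toy stand-in for a recorded dataset: two episodes of different lengths.
demo_episodes = [
    (np.zeros((4, 2)), np.array([0, 1, 0])),
    (np.zeros((6, 2)), np.array([1, 1, 0, 0, 1])),
]
checked = [check_episode(s, a) for s, a in demo_episodes]
```

The list returned by `to_trajectories` would then stand in for the rollouts used in the tutorials, passed as the demonstrations when constructing the GAIL/AIRL trainer. The observations still have to match a space that imitation supports, so this does not address point 1.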

In the future, please consider opening separate issues when there are multiple topics to address. That helps us tackle them one at a time :-)