HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

[Question] Reward net transfer #838

Open risufaj opened 9 months ago

risufaj commented 9 months ago

Hello,

I want to run IRL on a task for which I have some expert demonstrations. The demonstrations are somewhat old, and since they were recorded the action space has grown: the first version of the task had only 5 actions, whereas the new version adds 3 more. Is it possible to train a reward net on the existing expert demonstrations (e.g., using AIRL) and then use the trained reward net to train a new policy that can also take the added actions? If so, I'm not sure what the RewardNet class would look like.

I would appreciate some help.

Thanks in advance.
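To make the concern concrete, here is a minimal PyTorch sketch of a reward net r(s, a) that one-hot encodes a discrete action. It is a plain `nn.Module`, not imitation's API, and `StateActionReward`, `transfer_fails` and the sizes are hypothetical. Because the first layer's input width is `obs_dim + n_actions`, weights learned with 5 actions cannot be loaded into a net built for 8.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StateActionReward(nn.Module):
    """Hypothetical r(s, a) net whose input layer is sized for a fixed action count."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32):
        super().__init__()
        self.n_actions = n_actions
        self.mlp = nn.Sequential(
            nn.Linear(obs_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # One-hot encoding ties the input width to n_actions.
        a = F.one_hot(action, num_classes=self.n_actions).float()
        return self.mlp(torch.cat([obs, a], dim=-1)).squeeze(-1)


def transfer_fails(old: nn.Module, new: nn.Module) -> bool:
    """Return True if the old weights cannot be loaded into the new net."""
    try:
        new.load_state_dict(old.state_dict())
    except RuntimeError:  # raised on parameter shape mismatch
        return True
    return False


old_net = StateActionReward(obs_dim=4, n_actions=5)  # stands in for the net trained on old demos
obs = torch.randn(3, 4)
print(old_net(obs, torch.tensor([0, 2, 4])).shape)   # old action ids work
print(transfer_fails(old_net, StateActionReward(obs_dim=4, n_actions=8)))  # 8 actions: mismatch
```

The mismatch comes entirely from the action encoding, which is why dropping the action from the reward's inputs is one way around it.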

rizqisubeno commented 2 months ago

I think you can reuse the reward net to train a new policy with the added actions, as long as the reward net is state-only, i.e. it does not take the action as input (in imitation, e.g. `use_action=False` on `BasicRewardNet`).
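A minimal sketch of what "state-only" means in practice, written in plain PyTorch rather than as an imitation subclass. `StateOnlyReward` is hypothetical; its `forward(state, action, next_state, done)` argument order is meant to mirror the convention of imitation's `RewardNet.forward`, but treat that correspondence as an assumption and check the library docs. The action is accepted and ignored, so the same weights score old action ids (0 to 4) and new ones (5 to 7) alike.

```python
import torch
import torch.nn as nn


class StateOnlyReward(nn.Module):
    """Hypothetical r(s) net: the action is accepted but never used."""

    def __init__(self, obs_dim: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, next_state, done):
        # Only the state feeds the MLP, so the action-space size never enters the weights.
        del action, next_state, done
        return self.mlp(state).squeeze(-1)


torch.manual_seed(0)
net = StateOnlyReward(obs_dim=4)  # imagine this was trained on the 5-action demos
state = torch.randn(3, 4)
next_state = torch.randn(3, 4)
done = torch.zeros(3, dtype=torch.bool)

r_old = net(state, torch.tensor([0, 2, 4]), next_state, done)  # old action ids
r_new = net(state, torch.tensor([5, 6, 7]), next_state, done)  # new action ids
print(torch.equal(r_old, r_new))  # True: the reward does not depend on the action
```

The trade-off of this design: a state-only reward can only favour the new actions through the states they lead to, since the old demonstrations never show those actions directly. The new policy would still be trained in the 8-action environment against this fixed reward.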