Hello,
I want to run IRL on a task for which I have some expert demonstrations. The demonstrations are a bit old, and the action space has grown since they were recorded: the first version of the task had only 5 actions, whereas the new version adds 3 more. Is it possible to train a reward net on the existing expert demonstrations (e.g. using AIRL) and then use the trained reward net to train a new policy that takes the added actions into account? If so, I'm not entirely sure what that would look like when creating a RewardNet class. I would appreciate some help.
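To make the question concrete, here is a rough sketch of the idea I have in mind. It is plain Python and does not use the library's actual RewardNet API; `LinearReward`, `one_hot`, and the observation size are hypothetical names and values I made up for illustration. The assumption is that the reward model encodes actions over the full new action space (8 actions) from the start, so the old demonstrations (which only contain actions 0 to 4) and the new policy's actions share one input format.

```python
# Hypothetical sketch: all names and sizes are illustrative, not a library API.
OLD_N_ACTIONS = 5  # actions that appear in the expert demonstrations
NEW_N_ACTIONS = 8  # the 5 old actions plus the 3 new ones


def one_hot(action, n_actions=NEW_N_ACTIONS):
    """Encode a discrete action over the full (new) action space."""
    if not 0 <= action < n_actions:
        raise ValueError(f"action {action} outside [0, {n_actions})")
    return [1.0 if i == action else 0.0 for i in range(n_actions)]


class LinearReward:
    """Toy stand-in for a reward net: r(s, a) = w . [s, one_hot(a)]."""

    def __init__(self, obs_dim, n_actions=NEW_N_ACTIONS):
        self.n_actions = n_actions
        # One weight per observation feature plus one per action slot.
        self.weights = [0.0] * (obs_dim + n_actions)

    def features(self, obs, action):
        # The input size is fixed by the NEW action space, so old
        # demonstrations and new rollouts use the same layout.
        return list(obs) + one_hot(action, self.n_actions)

    def reward(self, obs, action):
        feats = self.features(obs, action)
        return sum(w * x for w, x in zip(self.weights, feats))


net = LinearReward(obs_dim=2)
old_r = net.reward([0.5, -1.0], action=3)  # action seen in the old demos
new_r = net.reward([0.5, -1.0], action=7)  # action that only exists now
```

My concern with this setup is that the expert data never contains actions 5 to 7, so the reward for those actions would be shaped only by the generator's samples during adversarial training. Would a state-only reward (one that ignores the action entirely) be the safer choice here?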
Thanks in advance.