MatthewJA / Inverse-Reinforcement-Learning

Implementations of selected inverse reinforcement learning algorithms.

Unexpected reward estimate #17

Closed · kierad closed this 1 year ago

kierad commented 1 year ago

Hello, thanks for sharing your code. I'm running examples/lp_gridworld.py and seeing this reward estimate, which looks good:

[screenshot: reward estimate for the default goal state]

However, when I change the body of gridworld.reward to e.g.:

    def reward(self, state_int):
        if state_int == 2:  # Goal state now in bottom right of 3x3, not top right
            return 1
        return 0

... then I see this reward estimate:

[screenshot: reward estimate after changing the goal state]

That is, linear_irl.irl seems to assume that the goal state is in the top right. Have I got something wrong? How can I get linear IRL to work with different goal states? Thanks.

kierad commented 1 year ago

My mistake: this happens because the optimal policy is also hardcoded in Gridworld. linear_irl.irl infers a reward from the policy it is given (plus the transition dynamics), not from gridworld.reward, so changing the reward alone leaves the IRL input unchanged. If I hardcode both the reward and a matching optimal policy, e.g. like this:

    def reward(self, state_int):
        # 3x3 grid, states indexed 0-8
        hardcoded_reward = [-1, -1, -1, 5, -5, -1, -1, -5, -1]
        return hardcoded_reward[state_int]

    def optimal_policy_deterministic(self, state_int):
        # One action per state, chosen to be consistent with the reward above
        hardcoded_policy = [1, 2, 2, 2, 2, 3, 3, 2, 3]
        return hardcoded_policy[state_int]

...then we get a sane reward estimate:

[screenshot: recovered reward estimate matching the hardcoded reward]
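
For anyone who lands here later: instead of hardcoding a matching policy by hand, one can derive it from the modified reward with value iteration. Below is a rough, untested sketch (the helper name derive_deterministic_policy is mine, not part of the repo), assuming a Gridworld instance gw that exposes n_states, reward(state_int), and a transition_probability array of shape (n_states, n_actions, n_states), as used by examples/lp_gridworld.py:

    import numpy as np

    def derive_deterministic_policy(gw, discount=0.9, eps=1e-6):
        # Run value iteration on gw's current reward, then act greedily.
        # Assumes gw.transition_probability[s, a, s'] = p(s' | s, a).
        r = np.array([gw.reward(s) for s in range(gw.n_states)])
        v = np.zeros(gw.n_states)
        while True:
            # Q[s, a] = sum_s' p(s' | s, a) * (r(s') + discount * v(s'))
            q = gw.transition_probability.dot(r + discount * v)
            v_new = q.max(axis=1)
            if np.abs(v_new - v).max() < eps:
                break
            v = v_new
        return q.argmax(axis=1)  # greedy action index for each state

The returned array gives one greedy action per state, so something like policy = list(derive_deterministic_policy(gw)) can stand in for the hardcoded list before calling linear_irl.irl, keeping the reward and policy consistent automatically.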