Closed: kierad closed this issue 1 year ago
Hello, thanks for sharing your code. I'm running `examples/lp_gridworld.py` and seeing this reward estimate, which looks good:

[reward estimate plot omitted]

However, when I change the body of `gridworld.reward` to e.g.:

[modified reward snippet omitted]

...then I see this reward estimate:

[reward estimate plot omitted]

i.e. `linear_irl.irl` seems to assume that the 'goal state' is in the top right. Have I got something wrong? How can I get linear IRL to work with different goal states? Thanks.

---

My mistake: it looks like this is happening because the optimal policy is also hardcoded in `Gridworld`. If I hardcode both the reward and an associated optimal policy, e.g. like this:

```python
def reward(self, state_int):
    # Per-state reward: the +5 goal at state 3, -5 penalties at states 4 and 7.
    hardcoded_reward = [-1, -1, -1, 5, -5, -1, -1, -5, -1]
    return hardcoded_reward[state_int]

def optimal_policy_deterministic(self, state_int):
    # One action index per state, consistent with the reward above.
    hardcoded_policy = [1, 2, 2, 2, 2, 3, 3, 2, 3]
    return hardcoded_policy[state_int]
```

...then we get a sane reward estimate:

[reward estimate plot omitted]
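
A more general fix than hardcoding both would be to derive the policy from whatever reward you hardcode, so the two stay consistent by construction. Below is a minimal value-iteration sketch; it assumes rewards are collected on the successor state and that the transition model is available as an `(n_states, n_actions, n_states)` NumPy array. The helper name and signature are illustrative, not part of the repo's API:

```python
import numpy as np

def greedy_policy_from_reward(reward, transition_probability,
                              discount=0.9, tolerance=1e-6):
    """Value-iterate on a hardcoded reward, then return the greedy
    deterministic policy: one action index per state.

    reward: (n_states,) per-state reward vector.
    transition_probability: (n_states, n_actions, n_states) array,
        transition_probability[s, a, t] = P(t | s, a).
    """
    n_states = transition_probability.shape[0]
    v = np.zeros(n_states)
    while True:
        # Q[s, a] = sum_t P(t | s, a) * (reward[t] + discount * v[t])
        q = transition_probability @ (np.asarray(reward) + discount * v)
        new_v = q.max(axis=1)
        if np.abs(new_v - v).max() < tolerance:
            return q.argmax(axis=1)  # greedy action for each state
        v = new_v

# Hypothetical usage, assuming the gridworld object exposes its
# transition array and discount factor:
# policy = greedy_policy_from_reward(
#     np.array([-1, -1, -1, 5, -5, -1, -1, -5, -1]),
#     gw.transition_probability, discount=gw.discount)
```

With a helper like this, moving the +5 goal anywhere in the grid automatically yields a matching `optimal_policy_deterministic`, so `linear_irl.irl` is handed a policy that is actually optimal for the reward it is asked to recover.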