Closed: kierad closed this issue 1 year ago
Hello, thanks for sharing your code. I'm running `examples/lp_gridworld.py` and seeing this reward estimate, which looks good:

[reward estimate plot omitted]

However, when I change the body of `gridworld.reward` to e.g.:

[modified reward snippet omitted]

...then I see this reward estimate:

[reward estimate plot omitted]

i.e. `linear_irl.irl` seems to assume that the 'goal state' is in the top right. Have I got something wrong? How can I get linear IRL to work with different goal states? Thanks.

---

My mistake: it looks like this is happening because the optimal policy is also hardcoded in `Gridworld`. If I hardcode both the reward and an associated optimal policy, e.g. like this:

```python
def reward(self, state_int):
    # Per-state reward: the +5 goal at state 3, -5 penalties at states 4 and 7.
    hardcoded_reward = [-1, -1, -1, 5, -5, -1, -1, -5, -1]
    return hardcoded_reward[state_int]

def optimal_policy_deterministic(self, state_int):
    # One action index per state, consistent with the reward above.
    hardcoded_policy = [1, 2, 2, 2, 2, 3, 3, 2, 3]
    return hardcoded_policy[state_int]
```

...then we get a sane reward estimate:

[reward estimate plot omitted]
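
A more general fix than hardcoding both would be to derive the policy from whatever reward you hardcode, so the two stay consistent by construction. Below is a minimal value-iteration sketch; it assumes rewards are collected on the successor state and that the transition model is available as an `(n_states, n_actions, n_states)` NumPy array. The helper name and signature are illustrative, not part of the repo's API:

```python
import numpy as np

def greedy_policy_from_reward(reward, transition_probability,
                              discount=0.9, tolerance=1e-6):
    """Value-iterate on a hardcoded reward, then return the greedy
    deterministic policy: one action index per state.

    reward: (n_states,) per-state reward vector.
    transition_probability: (n_states, n_actions, n_states) array,
        transition_probability[s, a, t] = P(t | s, a).
    """
    n_states = transition_probability.shape[0]
    v = np.zeros(n_states)
    while True:
        # Q[s, a] = sum_t P(t | s, a) * (reward[t] + discount * v[t])
        q = transition_probability @ (np.asarray(reward) + discount * v)
        new_v = q.max(axis=1)
        if np.abs(new_v - v).max() < tolerance:
            return q.argmax(axis=1)  # greedy action for each state
        v = new_v

# Hypothetical usage, assuming the gridworld object exposes its
# transition array and discount factor:
# policy = greedy_policy_from_reward(
#     np.array([-1, -1, -1, 5, -5, -1, -1, -5, -1]),
#     gw.transition_probability, discount=gw.discount)
```

With a helper like this, moving the +5 goal anywhere in the grid automatically yields a matching `optimal_policy_deterministic`, so `linear_irl.irl` is handed a policy that is actually optimal for the reward it is asked to recover.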