Easily editable reward functions

duckietown / gym-duckietown

Self-driving car simulator for the Duckietown universe

http://duckietown.org


Easily editable reward functions #72

Closed jzilly closed 5 years ago

jzilly commented 6 years ago

Given the way the competitions are structured, it would be interesting to let participants easily set the reward function themselves. This could be as simple as moving the computation of the reward into its own file. Not very high priority at the moment, though.

fgolemo commented 6 years ago

Could you explain this a little more, please? Why do you think the participants should be able to edit the rewards (beyond the scope of simple addition or multiplication of the final value)?

jzilly commented 6 years ago

From my perspective it is not clear at all what the "right" reward function is if you choose to do RL. Therefore it should be easily editable by participants to fit their own needs. Since in the end they will be evaluated not by rewards but by objectives, making the reward fit the objectives would be a design task on its own.

maximecb commented 6 years ago

You’re absolutely right that the reward function matters when training with RL and tinkering with it might make sense. There’s already a fairly clean way to do that though, which is to write a gym reward wrapper.
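
For illustration, a minimal sketch of the reward-wrapper pattern. To stay self-contained it does not import gym: `ToyEnv` is a hypothetical stand-in for the simulator, and the small `RewardWrapper` base class only mirrors the shape of `gym.RewardWrapper`, where `step()` passes the environment's reward through a `reward()` hook. `ScaledReward`, `scale` and `offset` are made-up names, and the old 4-tuple `step()` return is assumed.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym environment (old 4-tuple step API)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = 1.0  # the environment's built-in reward
        done = self.t >= 3
        return float(self.t), reward, done, {}


class RewardWrapper:
    """Mirrors the shape of gym.RewardWrapper: step() routes the reward through reward()."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, self.reward(reward), done, info

    def reward(self, reward):
        raise NotImplementedError


class ScaledReward(RewardWrapper):
    """Example subclass: applies an affine transform to the original reward."""

    def __init__(self, env, scale=2.0, offset=-0.5):
        super().__init__(env)
        self.scale = scale
        self.offset = offset

    def reward(self, reward):
        return self.scale * reward + self.offset


env = ScaledReward(ToyEnv())
env.reset()
_, r, _, _ = env.step(0)
print(r)  # 1.5
```

With gym installed, the same `reward()` method would go on a subclass of `gym.RewardWrapper` wrapped around the real environment.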

jzilly commented 6 years ago

Nice, wrappers for everything. Sounds like exactly what we would need. So this issue could then address having such a wrapper.

fgolemo commented 6 years ago

Hmmmmmm, okay, I think I see where you're coming from. Maxime, I don't think reward wrappers would be optimal for this since they don't let you just edit the reward function. They just let you do simple math on top of the current reward afaik.

maximecb commented 6 years ago

You can totally intercept the reward value and replace it by something else if you want to, and not necessarily at every time step.
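
A rough sketch of that idea, assuming a wrapper that overrides `step()` directly (the way a `gym.Wrapper` subclass could) and swaps in a different reward only on the last step of an episode. `ToyEnv`, `TerminalOverride` and the `original_reward` info key are hypothetical names, and gym itself is not imported so the snippet runs on its own.

```python
class ToyEnv:
    """Hypothetical stand-in for a gym environment (old 4-tuple step API)."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done, {}


class TerminalOverride:
    """Leaves the reward untouched except on the step that ends the episode."""

    def __init__(self, env, terminal_reward=-10.0):
        self.env = env
        self.terminal_reward = terminal_reward

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Keep the environment's value around for logging, then replace it.
            info = dict(info, original_reward=reward)
            reward = self.terminal_reward
        return obs, reward, done, info


env = TerminalOverride(ToyEnv())
env.reset()
rewards, done = [], False
while not done:
    _, r, done, info = env.step(0)
    rewards.append(r)
print(rewards)  # [1.0, 1.0, -10.0]
```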

fgolemo commented 6 years ago

No, I know. But let's say the reward function is r = f(env)+g(env) where f and g are two functions that depend on the environment. Afaiu with a wrapper you can only do r_new = h(r) but not r_new = f(env) + 2 * g(env) + j(env), right?

fgolemo commented 6 years ago

Meaning... You can modify the reward value after it's been calculated by the environment. But you can't change the way the environment calculates it in the first place.

maximecb commented 6 years ago

Well, you can compute a whole new equivalent reward function, and you can choose when to use your reward function or the original one depending on the current state and actions taken. I don't think it gets much better than that, short of having some kind of scripting framework specific to reward functions, which would be pretty nasty IMO.

maximecb commented 6 years ago

One thing that could be done is to write auxiliary functions corresponding to specific terms in the simulator's reward function, so that those terms can be reused.
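
A sketch of how that could look, using the `f`, `g`, `j` notation from earlier in the thread. Everything here is an assumption for illustration: the term functions, the `ToyEnv` attributes (`speed`, `lane_offset`, `collided`) and the `ReweightedReward` wrapper are hypothetical and do not reflect the simulator's actual reward code.

```python
def f(env):
    """Hypothetical reward term: forward speed."""
    return env.speed


def g(env):
    """Hypothetical reward term: penalty for distance from the lane center."""
    return -abs(env.lane_offset)


def j(env):
    """Hypothetical extra term: collision penalty."""
    return -1.0 if env.collided else 0.0


class ToyEnv:
    """Hypothetical stand-in whose built-in reward is r = f(env) + g(env)."""

    def __init__(self):
        self.speed = 1.0
        self.lane_offset = 0.0
        self.collided = False

    def reset(self):
        self.lane_offset = 0.0
        return self.lane_offset

    def step(self, action):
        self.lane_offset = float(action)  # toy dynamics: the action sets the offset
        return self.lane_offset, f(self) + g(self), False, {}


class ReweightedReward:
    """Recomputes the reward from the shared terms instead of transforming r."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, original, done, info = self.env.step(action)
        # r_new = f(env) + 2 * g(env) + j(env), evaluated on the post-step state
        new_reward = f(self.env) + 2 * g(self.env) + j(self.env)
        return obs, new_reward, done, dict(info, original_reward=original)


env = ReweightedReward(ToyEnv())
env.reset()
_, r_new, _, info = env.step(0.25)
print(info["original_reward"], r_new)  # 0.75 0.5
```

Because the wrapper holds a reference to the environment, it can read the state the terms depend on, so it is not limited to `r_new = h(r)`.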