jzilly closed this issue 5 years ago
Could you explain this a little more please? Why do you think the participants should be able to edit the rewards (beyond the scope of simple addition or multiplication of the final value)?
From my perspective it is not clear at all what the "right" reward function is if you choose to do RL. Therefore it should be easily editable by participants to fit their own needs. Since in the end they will be evaluated not by rewards but by objectives, making the reward fit the objectives would be a design task on its own.
You’re absolutely right that the reward function matters when training with RL and tinkering with it might make sense. There’s already a fairly clean way to do that though, which is to write a gym reward wrapper.
Nice, wrappers for everything. Sounds like exactly what we would need. So this issue could then address having such a wrapper.
Hmmmmmm, okay, I think I see where you're coming from. Maxime, I don't think reward wrappers would be optimal for this since they don't let you just edit the reward function. They just let you do simple math on top of the current reward afaik.
You can totally intercept the reward value and replace it by something else if you want to, and not necessarily at every time step.
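To make that concrete, here is a minimal sketch of the wrapper pattern. It is written in plain Python so it runs without dependencies; with gym you would subclass `gym.Wrapper` and override `step` in the same way. `ToyEnv`, `ReplaceRewardWrapper` and the terminal bonus are made up for illustration and are not part of the simulator.

```python
class ToyEnv:
    """Hypothetical stand-in for the simulator (not the real environment)."""

    def __init__(self):
        self.position = 0.0
        self.steps = 0

    def reset(self):
        self.position = 0.0
        self.steps = 0
        return self.position

    def step(self, action):
        self.position += action
        self.steps += 1
        reward = -abs(self.position)  # the environment's built-in reward
        done = self.steps >= 5
        return self.position, reward, done, {}


class ReplaceRewardWrapper:
    """Same shape as a gym.Wrapper: delegate to the env, then edit the result."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info["original_reward"] = reward  # keep the original value for logging
        if done:
            # Replace the reward only on the final step; earlier steps
            # pass the environment's reward through unchanged.
            reward = 10.0 if abs(obs) < 1.0 else -10.0
        return obs, reward, done, info
```

The wrapper decides per step whether to keep or replace the value, so it is not limited to scaling or shifting the original reward.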
No, I know. But let's say the reward function is r = f(env) + g(env), where f and g are two functions that depend on the environment. Afaiu with a wrapper you can only do r_new = h(r) but not r_new = f(env) + 2 * g(env) + j(env), right?
Meaning... You can modify the reward value after it's been calculated by the environment. But you can't change the way the environment calculates it in the first place.
Well, you can compute a whole new equivalent reward function, and you can choose when to use your reward function or the original one depending on the current state and actions taken. I don't think it gets much better than that, short of having some kind of scripting framework specific to reward functions, which would be pretty nasty IMO.
One thing that could be done is to write auxiliary functions corresponding to specific terms in the simulator's reward function, so that those terms can be reused.
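A rough sketch of how reusable terms and a wrapper could fit together, using the r = f(env) + g(env) example from above. Everything here is hypothetical: `ToySim`, the term functions and the speed limit of 2.0 are invented for illustration, and the real simulator's reward terms may look quite different. The wrapper ignores the built-in reward and recomputes r_new = f(env) + 2 * g(env) + j(env) from the environment's state.

```python
def speed_term(env):
    """f(env): reward forward speed (hypothetical term)."""
    return env.speed


def lane_term(env):
    """g(env): penalize distance from the lane center (hypothetical term)."""
    return -abs(env.lane_offset)


def overspeed_term(env, limit=2.0):
    """j(env): extra penalty above an assumed speed limit (hypothetical term)."""
    return -max(0.0, env.speed - limit)


class ToySim:
    """Hypothetical simulator whose built-in reward is r = f(env) + g(env)."""

    def __init__(self):
        self.speed = 0.0
        self.lane_offset = 0.0

    def reset(self):
        self.speed = 0.0
        self.lane_offset = 0.0
        return (self.speed, self.lane_offset)

    def step(self, action):
        throttle, steer = action
        self.speed += throttle
        self.lane_offset += steer
        reward = speed_term(self) + lane_term(self)
        return (self.speed, self.lane_offset), reward, False, {}


class ReweightedReward:
    """Wrapper that recomputes the reward from the wrapped env's state."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # discard built-in reward
        # r_new = f(env) + 2 * g(env) + j(env)
        reward = (speed_term(self.env)
                  + 2 * lane_term(self.env)
                  + overspeed_term(self.env))
        return obs, reward, done, info
```

This only works if the quantities the terms need are readable from the environment object, which is an assumption about the simulator.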
Given the way the competitions are structured, it would be interesting to allow participants to set the reward function themselves easily. This could be as simple as moving the computation of the reward into its own file. Not very high priority at the moment though.
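One possible shape for this, as a sketch only: the reward lives in a standalone function (which could sit in its own file) and the environment takes it as a parameter. `ConfigurableSim`, `default_reward`, `my_reward` and the `reward_fn` argument are hypothetical names, not the simulator's actual API.

```python
def default_reward(state):
    """Default reward; this function could live in its own module."""
    return -abs(state["lane_offset"])


class ConfigurableSim:
    """Hypothetical simulator that accepts the reward function as an argument."""

    def __init__(self, reward_fn=default_reward):
        self.reward_fn = reward_fn
        self.state = {"speed": 0.0, "lane_offset": 0.0}

    def step(self, action):
        throttle, steer = action
        self.state["speed"] += throttle
        self.state["lane_offset"] += steer
        reward = self.reward_fn(self.state)
        return dict(self.state), reward, False, {}


def my_reward(state):
    """A participant-defined alternative reward."""
    return state["speed"] - 2 * abs(state["lane_offset"])


# Participants swap in their own function without touching simulator code.
sim = ConfigurableSim(reward_fn=my_reward)
```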