Hi @courseprojects,

You're correct -- `reward()` only takes `action` explicitly, but because the `reward()` function is part of the environment, it already has access to the entire state -- so hopefully this shouldn't be a problem for you if you choose to extend the reward function. If you do end up creating a generalizable hindsight reward function, we'd be happy to merge it into our repo -- please see our contributions guide if so!
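For example, here is a rough sketch of one way such an extension could look, assuming a recent robosuite version (the `goal` argument and the cube lookups below are illustrative assumptions, not part of the current API):

```python
import numpy as np
from robosuite.environments.manipulation.lift import Lift  # import path may differ by version


class GoalConditionedLift(Lift):
    """Illustrative subclass whose reward can be re-evaluated for an arbitrary goal."""

    def reward(self, action=None, goal=None):
        # `goal` is a hypothetical extra argument used for hindsight relabeling;
        # with goal=None this behaves exactly like the standard Lift reward.
        if goal is None:
            return super().reward(action)

        # The environment already holds the full state, so we can read the achieved
        # cube position directly (attribute names assume the current Lift implementation).
        cube_pos = np.array(self.sim.data.body_xpos[self.cube_body_id])
        dist = np.linalg.norm(cube_pos - np.asarray(goal))

        # Same shaping as the reaching reward: non-negative distance -> reward in (0, 1].
        return 1 - np.tanh(10.0 * dist)
```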
You're also correct that `tanh` has a range of [-1, 1]. However, our distance values are always positive, so our effective range is [0, 1].
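As a quick sanity check on the range (a standalone snippet, not from the codebase):

```python
import numpy as np

# For non-negative distances, tanh(10 * dist) lies in [0, 1),
# so the reaching reward 1 - tanh(10 * dist) lies in (0, 1].
for dist in [0.0, 0.05, 0.5, 5.0]:
    print(dist, 1 - np.tanh(10.0 * dist))
# 0.0  -> 1.0
# 0.05 -> ~0.538
# 0.5  -> ~9.1e-05
# 5.0  -> ~0.0 (never negative)
```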
Hope that helps! Closing this issue for now, but feel free to re-open it if you feel that there are still some unanswered questions.
For hindsight experience replay use cases, one needs a way to compute the reward for the "hallucinated" new goal. This calculation needs not only the action but also the current state. In some of the `reward()` implementations I have seen (e.g. Lift), `reward()` only takes `action` as input. Could we have something similar to OpenAI's `compute_reward()` function?

Another issue is the sparse reward in Lift. The `reaching_reward` is computed as `1 - np.tanh(10.0 * dist)`, but this has range [0, 2] because the range of `tanh` is [-1, 1]. Should this be updated?
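For reference, the interface I have in mind follows the Gym `GoalEnv` convention, where the reward depends only on the achieved and desired goals, so HER can recompute it for relabeled goals. A rough sketch (the shaping and goal values are just for illustration):

```python
import numpy as np


def compute_reward(achieved_goal, desired_goal, info=None):
    """Gym GoalEnv-style reward: depends only on the goals, so HER can
    recompute it for relabeled ("hallucinated") goals after the fact."""
    dist = np.linalg.norm(achieved_goal - desired_goal)
    return 1 - np.tanh(10.0 * dist)  # same shaping as the Lift reaching reward


# HER relabeling: replace the stored transition's goal with a goal that was
# actually achieved later in the episode, then recompute its reward.
achieved = np.array([0.10, 0.02, 0.85])  # e.g. cube position that was reached
hindsight_goal = achieved.copy()         # "hallucinated" goal = what was achieved
print(compute_reward(achieved, hindsight_goal))  # -> 1.0
```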