avisingh599 / reward-learning-rl

[RSS 2019] End-to-End Robotic Reinforcement Learning without Reward Engineering
https://sites.google.com/view/reward-learning-rl/

The conditions difference between training and evaluation in simulated tasks #25

Closed FudiFudi closed 4 years ago

FudiFudi commented 4 years ago

Hi. This is a question on the difference between training and evaluation.

In typical RL simulations there are separate training and evaluation phases. As I understand it, the two usually run under slightly different conditions so that the learned policy generalizes.

I could not find such a difference in the paper or the code. Are there any differences in conditions between training and evaluation in the Visual Pusher, Visual Door Opening, and Visual Picker tasks? If so, could you describe them and point me to the relevant part of the code?

Regards,

FudiFudi

avisingh599 commented 4 years ago

Hi, the only difference between training and evaluation is that while we sample from a stochastic policy during training, we only use the mean from this policy for taking actions during evaluation.

Link to relevant lines in the code: https://github.com/avisingh599/reward-learning-rl/blob/8070d93e9379204f153e9044e03079bd9a354183/softlearning/algorithms/rl_algorithm.py#L282
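
To illustrate the distinction, here is a minimal toy sketch, not the repository's code. The class `ToyGaussianPolicy`, its `deterministic` flag, the fixed mean and standard deviation, and the tanh squashing are all assumptions made for illustration. It only shows the idea of sampling actions during training versus using the distribution's mean during evaluation.

```python
import math
import random


class ToyGaussianPolicy:
    """Hypothetical stand-in for a stochastic policy (not the repo's class)."""

    def __init__(self, mean, std, seed=0):
        self.mean = mean
        self.std = std
        self._rng = random.Random(seed)

    def action(self, observation, deterministic=False):
        # A real policy would compute mean and std from the observation;
        # here they are fixed to keep the sketch small.
        if deterministic:
            raw = self.mean  # evaluation: use the mean, no sampling noise
        else:
            raw = self._rng.gauss(self.mean, self.std)  # training: sample
        # Assumed squashing to keep the action inside (-1, 1).
        return math.tanh(raw)


policy = ToyGaussianPolicy(mean=0.5, std=0.3)

# Training-style rollouts: actions vary from call to call.
train_actions = [policy.action(None) for _ in range(5)]

# Evaluation-style rollouts: the same action for the same input every time.
eval_actions = [policy.action(None, deterministic=True) for _ in range(5)]
```

In this sketch the environment itself is identical in both modes; only the way an action is drawn from the policy changes.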

FudiFudi commented 4 years ago

Thank you!