dgriff777 / rl_a3c_pytorch

A3C LSTM Atari with Pytorch plus A3G design
Apache License 2.0
562 stars 119 forks

Reward Smoothing #32

Closed: WangChen100 closed this issue 5 years ago

WangChen100 commented 5 years ago

Hi, what do you think about reward smoothing? The collected rewards have high variance. To show the trend of the reward curve, should we apply a smoothing operation similar to TensorBoard's smoothing? If so, which method should I choose: exponential smoothing or average smoothing?
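For reference, the two options named in the question can be sketched in a few lines of plain Python. This is only an illustration, not code from this repository: the function names, the `weight` and `window` parameters, and the sample `episode_returns` values are all made up for the example. TensorBoard's smoothing slider is, as far as I know, a form of exponential moving average, but this sketch does not try to reproduce its exact implementation.

```python
def exponential_smoothing(values, weight=0.6):
    """Exponential moving average (hypothetical helper).

    smoothed[t] = weight * smoothed[t-1] + (1 - weight) * values[t]
    A larger `weight` gives a smoother, slower-reacting curve.
    """
    smoothed = []
    last = None
    for v in values:
        # Seed the average with the first value, then blend in new values.
        last = v if last is None else weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed


def moving_average(values, window=3):
    """Trailing mean over the most recent `window` values (hypothetical helper).

    Early entries average over however many values are available so far.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up per-episode returns, only for illustration.
episode_returns = [10.0, 30.0, 5.0, 40.0, 20.0]
ema = exponential_smoothing(episode_returns, weight=0.5)
sma = moving_average(episode_returns, window=2)
```

Both only change how the curve is plotted; the raw episode returns should still be logged unchanged so the underlying variance stays visible.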

dgriff777 commented 5 years ago

Hi. Are you referring to the rewards collected by the model during training? Those are clipped to between -1 and 1, which reduces the variance. Or are you referring to the actual final reward numbers from testing?
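To make the distinction concrete, here is a minimal sketch of clipping per-step rewards to [-1, 1]. It is an assumption-laden illustration, not this repository's actual code: `clip_reward` and the sample `raw_rewards` are hypothetical.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a single per-step reward into [low, high] (hypothetical helper)."""
    return max(low, min(high, reward))


# Made-up per-step rewards from one episode, only for illustration.
raw_rewards = [0.0, 5.0, -3.0, 0.5, 100.0]
clipped = [clip_reward(r) for r in raw_rewards]

# The clipped sum is what a learner trained on clipped rewards would see;
# the raw sum is the score that would be reported at test time.
training_return = sum(clipped)
raw_return = sum(raw_rewards)
```

The clipped and raw episode sums can differ a lot, which is why it matters which of the two is being plotted.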

WangChen100 commented 5 years ago

Hi, thank you for your reply. The agent receives a sum of rewards in every episode. However, this sum has high variance, as shown in the following figure. [Figure: reward_total] So is a smoothing operation like the one shown in the following figure reasonable?