WangChen100 closed this issue 5 years ago.
Hi, are you referring to the rewards the model collects while learning? In that case they are clipped to the range [-1, 1], which reduces the variance. Or are you referring to the actual final reward numbers from testing?
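A minimal sketch of the training-time clipping mentioned above, assuming each step's reward is a plain Python float. The helper name `clip_reward` is hypothetical and not taken from this project's code. Bounding each reward keeps the per-step learning signal within a fixed range, but the clipped values no longer match the raw game score, so they are not directly comparable with unclipped test-time returns.

```python
def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clip a raw per-step reward into [low, high] (hypothetical helper, training only)."""
    return max(low, min(high, reward))


raw_rewards = [0.0, 5.0, -3.0, 0.5]
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)  # [0.0, 1.0, -1.0, 0.5]
```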
Hi, thank you for your reply. The agent receives a sum of rewards in every episode. However, this sum has high variance, as shown in the following figure. Is a smoothing operation like the one shown in the following figure reasonable?
Hi, what do you think about reward smoothing? The collected rewards have high variance. To show the trend of the reward curve, should we apply a smoothing operation like TensorBoard's smoothing? If so, which method should I choose: exponential smoothing or moving-average smoothing?
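For comparison, here is a rough sketch of both options applied to a list of per-episode returns. It assumes the returns are plain floats; the function names and default parameters (`weight=0.6`, `window=10`) are illustrative choices, not values from this project or from TensorBoard. The exponential version is similar in spirit to TensorBoard's smoothing slider, but TensorBoard's exact implementation (for example, how the average is seeded or bias-corrected) may differ.

```python
from typing import List, Sequence


def exponential_smoothing(values: Sequence[float], weight: float = 0.6) -> List[float]:
    """Exponential moving average; a higher weight gives a smoother curve.

    Assumption: the average is seeded with the first value. TensorBoard's own
    smoothing may seed or bias-correct differently.
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed: List[float] = []
    last = values[0] if values else 0.0
    for v in values:
        # Blend the previous smoothed value with the new episode return.
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def moving_average(values: Sequence[float], window: int = 10) -> List[float]:
    """Trailing moving average over at most `window` recent episodes.

    Early points average over fewer episodes, so the output keeps the input length.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the episode that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


returns = [10.0, 30.0, 20.0, 40.0]
print(exponential_smoothing(returns, weight=0.5))  # [10.0, 20.0, 20.0, 30.0]
print(moving_average(returns, window=2))           # [10.0, 20.0, 25.0, 30.0]
```

The trade-off is mainly one of weighting: the exponential average gives geometrically decaying weight to every past episode and has no hard cutoff, while the moving average weights the last `window` episodes equally and ignores older ones. Both lag behind the raw curve, and both only change how the curve is displayed, not what the agent learned, so it can help to plot the raw returns faintly behind the smoothed line.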