Closed khoderj closed 5 years ago
Thanks for your comment. I regret to say that there was a typo in the paper. The minus sign multiplying the Gaussian part should be removed. We want to give a reward of 1.0 when the current zone temperature is exactly the same as the target center temperature, as described in the paper.
The former gives a maximum reward of 1.0 at the temperature
center TiC, and the reward decreases quickly toward zero as the difference from
the center temperature increases.
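To make the sign issue concrete, here is a minimal sketch of the two variants. It assumes the Gaussian part has the form exp(-λ (T − T_C)²); the function names, the `sharpness` parameter (standing in for λ), and its default value are hypothetical and are not taken from the paper or the repository code.

```python
import math


def gaussian_temperature_reward(zone_temp, center_temp, sharpness=0.5):
    """Positive Gaussian term: peaks at 1.0 when zone_temp == center_temp.

    `sharpness` is an assumed scaling factor, not a value from the paper.
    """
    return math.exp(-sharpness * (zone_temp - center_temp) ** 2)


def negated_gaussian_temperature_reward(zone_temp, center_temp, sharpness=0.5):
    """Same term multiplied by -1, as the formula with the typo would read.

    This variant is at its minimum (-1.0) exactly at the target temperature.
    """
    return -gaussian_temperature_reward(zone_temp, center_temp, sharpness)


if __name__ == "__main__":
    center = 23.0  # hypothetical target center temperature in deg C
    for temp in (23.0, 24.0, 26.0):
        print(
            temp,
            gaussian_temperature_reward(temp, center),
            negated_gaussian_temperature_reward(temp, center),
        )
```

With the positive sign, the reward is largest at the center temperature and falls toward zero as the deviation grows, which matches the quoted description. With the extra minus sign, the agent would be penalized most when it hits the target exactly.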
The reward function written in the code is different from the one in the paper. In the code, the Gaussian part of the temperature reward is not multiplied by a minus sign, whereas in the paper's formula it is. Based on the explanation in the paper, I think the Gaussian part of the reward function should be positive. Thanks in advance.