IBM / rl-testbed-for-energyplus

Reinforcement Learning Testbed for Power Consumption Optimization using EnergyPlus
MIT License

Reward Function #15

Closed khoderj closed 5 years ago

khoderj commented 5 years ago

The reward function in the code differs from the one in the paper. In the code, the Gaussian part of the temperature reward is not multiplied by a minus sign, whereas in the paper it is. Based on the explanation in the paper, I think the Gaussian part of the reward function should be positive. Thanks in advance.

takaomoriyama commented 5 years ago

Thanks for your comment. I regret to say that there was a typo in the paper. The minus sign in front of the Gaussian part should be removed. We'd like to give a reward of 1.0 if the current zone temperature is exactly the same as the target center temperature, as described in the paper:

The former gives a maximum reward of 1.0 at the temperature center TiC, and the reward decreases quickly toward zero as the difference from the center temperature increases.
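
For anyone checking the sign, here is a minimal sketch of a Gaussian-shaped temperature term with the behavior described above. It is not the code from this repository: the function name, the `decay` parameter, and its default value are assumptions chosen only for illustration.

```python
import math


def gaussian_temperature_reward(zone_temp, center_temp, decay=0.5):
    """Gaussian-shaped reward term (illustrative sketch, not the repo's code).

    zone_temp   -- current zone temperature
    center_temp -- target center temperature
    decay       -- hypothetical width parameter; larger means a narrower peak
    """
    # No leading minus sign: the term is positive, peaks at 1.0 when
    # zone_temp == center_temp, and falls toward 0 as the gap grows.
    return math.exp(-decay * (zone_temp - center_temp) ** 2)


if __name__ == "__main__":
    for temp in (23.0, 24.0, 25.0, 27.0):
        print(temp, gaussian_temperature_reward(temp, center_temp=23.0))
```

With a leading minus sign the same term would be most negative at the center temperature, which would penalize the agent for hitting the target. That is why the sign matters here.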