IBM / rl-testbed-for-energyplus

Reinforcement Learning Testbed for Power Consumption Optimization using EnergyPlus
MIT License

Question about the Total Power Consumption #46

Closed RuihangWang closed 3 years ago

RuihangWang commented 3 years ago

Hi, I found that the comparisons are based on total power consumption. The total power consumption consists of two parts, i.e., the IT power and the cooling power. However, the IT power is not fixed in the simulation trace, which renders the total power comparison less meaningful. I am wondering if we could just use PUE ((IT + cooling) / IT) as the metric to evaluate the performance of different controllers.
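
For clarity, here is a small sketch of the kind of metric I have in mind (the function and variable names are just for illustration, they are not taken from the testbed code):

import numpy as np

def pue(it_power, cooling_power):
    # Power Usage Effectiveness: total facility power divided by IT power.
    it_power = np.asarray(it_power, dtype=float)
    cooling_power = np.asarray(cooling_power, dtype=float)
    return (it_power + cooling_power) / it_power

# Two controllers can draw the same total power while differing in PUE:
print(pue(100.0, 40.0))   # 1.40  (total 140)
print(pue(120.0, 20.0))   # ~1.17 (total 140 as well)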

Another small issue: in gym_energyplus/envs/energyplus_model.py, line 99:

def set_action(self, normalized_action):
        # In TPRO/POP1/POP2 in baseline, action seems to be normalized to [-1.0, 1.0].
        # So it must be scaled back into action_space by the environment.
        self.action_prev = self.action
        self.action = self.action_space.low + (normalized_action + 1.) * 0.5 * (self.action_space.high - self.action_space.low)
        self.action = np.clip(self.action, self.action_space.low, self.action_space.high)

In fact, the action in TRPO is not normalised to [-1, 1]. It just comes from a Gaussian distribution (initially centered at 0 with std 1), so the sampled action can exceed the bounds [-1, 1]. The best implementation would clip the action to [-1, 1] first and then denormalize it to the actual value.

antoine-galataud commented 3 years ago

Hi, I agree with your comment about the action distribution with policy gradient methods like TRPO. However, there is no practical difference in outcome between clipping first and then denormalizing, and the current implementation, which denormalizes first and then clips: the mapping from [-1, 1] to the action space is monotone increasing and sends the bounds exactly onto action_space.low and action_space.high, so clipping before or after gives the same result.

Here is an example:

import numpy as np

a_low = 2.0
a_high = 5.0

def set_action(normalized_action):
    # Current implementation: denormalize first, then clip to the action space.
    denorm = a_low + (normalized_action + 1.) * 0.5 * (a_high - a_low)
    return np.clip(denorm, a_low, a_high)

def set_action2(normalized_action):
    # Proposed variant: clip to [-1, 1] first, then denormalize.
    normalized_action = np.clip(normalized_action, -1., 1.)
    return a_low + (normalized_action + 1.) * 0.5 * (a_high - a_low)

# Sample unbounded actions as a Gaussian policy would, and check that both
# variants map them to exactly the same values in [a_low, a_high].
norm = np.random.normal(0., 1., int(1e5))
r1 = [set_action(v) for v in norm]
r2 = [set_action2(v) for v in norm]
assert r1 == r2

Maybe a comment change would be sufficient then?
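
For example, something along these lines (only the wording changes, the behavior stays the same; I am assuming TPRO/POP1/POP2 were meant to read TRPO/PPO1/PPO2):

def set_action(self, normalized_action):
    # TRPO/PPO1/PPO2 policies in baselines sample actions from a Gaussian
    # (initially centered at 0 with std 1), so values are not guaranteed to
    # lie in [-1.0, 1.0]. Treat [-1.0, 1.0] as the nominal range, scale it
    # into action_space, and clip to the action_space bounds.
    self.action_prev = self.action
    self.action = self.action_space.low + (normalized_action + 1.) * 0.5 * (self.action_space.high - self.action_space.low)
    self.action = np.clip(self.action, self.action_space.low, self.action_space.high)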

RuihangWang commented 3 years ago

Thanks for your response.

Yes, it's true. The results are the same. My minor concern is that the comment "In TPRO/POP1/POP2 in baseline, action seems to be normalized to [-1.0, 1.0]." is a little misleading. People might think that in TRPO the action has been normalised, but actually it has not: actions are just sampled from a distribution centered at 0 with std 1 and then clipped. The implementation itself is correct.

RuihangWang commented 3 years ago

What actually concerns me is the calculation of power consumption. Since the implementation only controls variables of the HVAC cooling system, i.e., the setpoint and the air flow rate, I am wondering whether it is fair to compare the total power consumption, since it contains both IT power and cooling power.

When you check eplusout.csv, you will find that the IT load is not fixed; it varies with temperature. Therefore, I think it would be fairer to fix the IT power and compare only the cooling power, or to compare the PUE.
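
As a rough sketch of what I mean (the column names below are placeholders, the actual ones depend on the Output:Variable entries in the model):

import pandas as pd

df = pd.read_csv("eplusout.csv")

# Placeholder column names, not the real EnergyPlus output variable names.
it_power = df["ITE Total Power [W]"]
cooling_power = df["HVAC Cooling Power [W]"]

# Trace-averaged PUE: total (IT + cooling) energy divided by IT energy.
print("PUE:", (it_power + cooling_power).sum() / it_power.sum())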

antoine-galataud commented 3 years ago

IT power is impacted by how much cooling is used. Basically, the colder the air, the less fan power is used to cool the CPUs and other equipment. So taking IT power into account is actually key.
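
As a rough back-of-the-envelope illustration (this is not the EnergyPlus ITE fan model, just the usual fan affinity reasoning): for a fixed heat load, the required airflow scales inversely with the inlet-to-outlet temperature difference, and fan power scales roughly with the cube of the airflow, so warmer supply air quickly drives IT fan power up.

# Toy illustration, not the EnergyPlus ITE model.
def relative_fan_power(inlet_temp_c, outlet_temp_c=35.0, ref_delta_t=15.0):
    delta_t = outlet_temp_c - inlet_temp_c   # temperature rise across the server
    airflow_ratio = ref_delta_t / delta_t    # airflow needed vs. reference, for a fixed heat load
    return airflow_ratio ** 3                # fan affinity laws: power ~ flow^3

for t in (20.0, 25.0, 30.0):
    print(t, "degC inlet ->", round(relative_fan_power(t), 2), "x reference fan power")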

RuihangWang commented 3 years ago

Thanks, I just realized that the IT fan power can be affected by the inlet temperature. If the inlet temperature increases, the fans will run at a higher speed, leading to higher IT power consumption. I agree with that.