IBM / rl-testbed-for-energyplus

Reinforcement Learning Testbed for Power Consumption Optimization using EnergyPlus
MIT License

Analyzing the reason behind the decrease in the energy consumption #14

Open khoderj opened 5 years ago

khoderj commented 5 years ago

Hi, The results in the paper report overall power consumption and thermal comfort, but they do not break down which components account for the decrease in energy consumption. Since every action in the action space adjusts an HVAC temperature setpoint, I would expect the reinforcement learning model to reduce HVAC power consumption, which would in turn reduce total electric demand. However, when I interpreted the results, I found that the decrease comes from the IT equipment (ITE) electric power consumption and not from the HVAC power consumption, which actually increases slightly.

Do you have any figures comparing HVAC power consumption and ITE power consumption between the first and last episodes? I also could not work out why the ITE power consumption decreases, since I would expect it to match the baseline in every episode. I additionally analyzed the ITE components ITE_CPU, ITE_FAN, and ITE_UPS, and all of them show a decrease in power consumption. The idf file contains an object called ElectricEquipment:ITE:AirCooled. Does this mean that the ITE is coupled to the HVAC, so that the actions taken also affect ITE power consumption? If so, how could we prevent this? Finally, it would be good to have a figure comparing the controlled setpoint values between the baseline and the reinforcement learning model.
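To make the requested comparison concrete, here is a minimal sketch of a per-component breakdown between two episodes. The helper name `energy_delta` and the dictionary layout are assumptions for illustration, and the numbers are placeholders, not values taken from the testbed or the paper.

```python
from typing import Dict


def energy_delta(first: Dict[str, float], last: Dict[str, float]) -> Dict[str, float]:
    """Return last-minus-first energy for each component present in both episodes."""
    return {name: last[name] - first[name] for name in first if name in last}


# Placeholder numbers only, NOT results from the testbed.
episode_0 = {"HVAC": 100.0, "ITE_CPU": 200.0, "ITE_FAN": 50.0, "ITE_UPS": 30.0}
episode_44 = {"HVAC": 104.0, "ITE_CPU": 185.0, "ITE_FAN": 45.0, "ITE_UPS": 28.0}

# A positive value means the component used more energy in the last episode.
for name, delta in energy_delta(episode_0, episode_44).items():
    print(f"{name}: {delta:+.1f}")
```

Summing the ITE_* entries separately from HVAC would show directly which side drives the change in total demand.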

Below are some figures illustrating the problem (the model used is the temperature model with the CA weather file). 0 is the first episode and 44 is the last episode.

image

image

image

Thank you in advance for your help.

Best Regards, Khoder Jneid

myndtrust commented 5 years ago

The ITE has a CPU loading schedule that is inlet temperature dependent. Could your agent have learned to supply cooler air? You could adjust the CPU loading schedules (or omit them). EnergyPlus only looks at the CPU loading, whereas modern data centers contain many other heat-generating components (I/O, memory, SSDs, etc.) that are not characterized in the model, so the CPU loading schedule approach seems to give a false sense of accuracy in this context.
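As a rough illustration of how cooler supply air could lower ITE power: as far as I know, ElectricEquipment:ITE:AirCooled scales CPU power by a curve of CPU loading and inlet air temperature, typically a biquadratic. The sketch below evaluates the standard biquadratic form. The function name and the coefficients are hypothetical and are not taken from the idf file in this repository.

```python
def biquadratic(x: float, y: float, c: tuple) -> float:
    """Biquadratic form: c1 + c2*x + c3*x^2 + c4*y + c5*y^2 + c6*x*y."""
    c1, c2, c3, c4, c5, c6 = c
    return c1 + c2 * x + c3 * x**2 + c4 * y + c5 * y**2 + c6 * x * y


# Hypothetical coefficients, chosen only so the multiplier rises with inlet temperature.
COEFFS = (-1.0, 1.0, 0.0, 0.06667, 0.0, 0.0)

cpu_loading = 0.5  # same loading in both cases
cool = biquadratic(cpu_loading, 18.0, COEFFS)  # inlet air at 18 C
warm = biquadratic(cpu_loading, 27.0, COEFFS)  # inlet air at 27 C

print(f"power multiplier at 18 C: {cool:.3f}")
print(f"power multiplier at 27 C: {warm:.3f}")
```

With a curve of this shape, an agent that lowers the inlet temperature reduces CPU power at unchanged loading, which would be consistent with the ITE decrease reported above.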

Question: How do you control the number of episodes? I am struggling to translate the number of timesteps_per_batch to episodes.

khoderj commented 5 years ago

Thanks for your reply. A hint toward the answer can be found in the issue "Could training at system timestep frequency be avoided". Anyway, here is my interpretation. EnergyPlus has two types of timestep: the zone timestep (defined in the EnergyPlus model file; 4 per hour = 15 min) and the system timestep. The system timestep frequency ranges from 1 per minute to 1 per zone timestep (1 per 15 min), and so far it is not controlled in the IBM code. The agent communicates with the environment at every system timestep, not every zone timestep. Thus, in the worst case the agent communicates with the environment 60 times per hour, i.e. 60 * 24 * 365 = 525,600 times per year. To get the number of iterations per episode, divide the number of communications per year by the number of timesteps per batch (16 * 1024 = 16,384; defined in the run_energyplus code). In the worst case this gives 525,600 / 16,384, or about 32 iterations, so 1 episode = (60 * 24 * 365) timesteps = about 32 iterations. As I mentioned, the system timestep is not controlled, so the number of iterations per episode is not fixed: it ranges from (4 * 24 * 365) / (16 * 1024), about 2, to (60 * 24 * 365) / (16 * 1024), about 32.
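The arithmetic above can be checked with a short sketch. It assumes one episode is one simulated year and one iteration consumes one batch of 16 * 1024 timesteps; the function name is made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
TIMESTEPS_PER_BATCH = 16 * 1024  # value quoted above from the run_energyplus code


def iterations_per_episode(steps_per_hour: int) -> float:
    """Batches needed to cover one simulated year at the given step rate."""
    steps_per_year = steps_per_hour * HOURS_PER_DAY * DAYS_PER_YEAR
    return steps_per_year / TIMESTEPS_PER_BATCH


low = iterations_per_episode(4)    # system timestep equal to the 15 min zone timestep
high = iterations_per_episode(60)  # system timestep of 1 minute

print(f"{low:.2f} to {high:.2f} iterations per episode")
# prints "2.14 to 32.08 iterations per episode"
```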

myndtrust commented 5 years ago

Sorry to diverge a bit from the original topic.

Your summary makes sense and I now understand that an episode is the number of batches of communication points per year.

This would mean that the range of iterations is from 2 to 32 per episode. When I run the model with default settings, I see several hundred iterations per episode (400 to 600).

Also, several episodes into the run, the program exits with the errors noted below:

Program Version,EnergyPlus, Version 8.8.0-7c3bbe4830, YMD=2019.05.09 07:17,IDD_Version 8.8.0
***** Beginning Zone Sizing Calculations
Warning Weather file location will be used rather than entered (IDF) Location object.
~~~ ..Location object=CHICAGO_IL_USA TMY2-94846
~~~ ..Weather File Location=San Francisco Intl Ap CA USA TMY3 WMO#=724940
~~~ ..due to location differences, Latitude difference=[4.16] degrees, Longitude difference=[34.65] degrees.
~~~ ..Time Zone difference=[2.0] hour(s), Elevation difference=[98.95] percent, [188.00] meters.