Zone air temperature fluctuates a lot after 200 episodes

WynnCJF commented 2 years ago

Hello! I used the original code in the baseline Docker image to train the RL model with the command: python3 -m baselines_energyplus.trpo_mpi.run_energyplus --num-timesteps 10000000. The model was trained for 200 episodes, saved, and then used for inference with the same IDF file and weather file. During inference, the west zone temperature fluctuates a lot (the annual mean temperature is about 22.4 °C), as shown in the graph below:

Also, the setpoint temperature chosen by the action is almost always at the lowest possible value:

Is this the expected behavior for the episode?

WynnCJF commented 2 years ago

Here's the statistical information of episode 215:

Reward                    ave=-0.48, min=-3.77, max= 1.36, std= 0.96
westzone_temp             ave=22.02, min=15.26, max=35.89, std= 2.65
eastzone_temp             ave=22.15, min=16.41, max=34.43, std= 2.35
Power consumption         ave=102,383.14, min=40,710.74, max=175,918.97, std=27,053.27
pue                       ave= 1.31, min= 1.02, max= 2.06, std= 0.16
westzone_temp distribution
    degree 0.0-0.9 0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
    -------------------------------------------------------------------------
    17.0C  1.6%    0.2%  0.2%  0.2%  0.1%  0.1%  0.1%  0.1%  0.1%  0.2%  0.2%
    18.0C  3.4%    0.2%  0.2%  0.3%  0.3%  0.3%  0.4%  0.4%  0.4%  0.4%  0.5%
    19.0C  5.4%    0.5%  0.4%  0.5%  0.5%  0.4%  0.5%  0.6%  0.6%  0.7%  0.8%
    20.0C 15.3%    1.0%  1.1%  1.3%  1.5%  1.6%  1.7%  1.8%  1.7%  1.7%  1.8%
    21.0C 12.7%    1.7%  1.6%  1.5%  1.5%  1.3%  1.1%  1.0%  1.0%  1.0%  1.0%
    22.0C 12.5%    0.9%  0.9%  1.0%  1.0%  1.0%  1.2%  1.3%  1.6%  1.7%  1.9%
    23.0C 22.5%    1.9%  2.1%  2.4%  2.4%  2.4%  2.3%  2.3%  2.4%  2.2%  2.0%
    24.0C 13.2%    2.1%  1.9%  1.7%  1.6%  1.4%  1.1%  1.0%  1.0%  0.9%  0.7%
    25.0C  3.1%    0.5%  0.5%  0.4%  0.4%  0.3%  0.3%  0.2%  0.2%  0.2%  0.1%
    26.0C  0.9%    0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.0%  0.1%  0.1%  0.1%
    27.0C  0.6%    0.1%  0.1%  0.1%  0.1%  0.0%  0.1%  0.0%  0.1%  0.1%  0.1%

antoine-galataud commented 2 years ago

Hi @WynnCJF, do you get the same during training or is it only during inference? Training and inference should show similar results if inference was performed on a reloaded policy, assuming that reload works as expected.

Also note: depending on how you perform inference, you may want to make predictions deterministic rather than stochastic, i.e. keep the mode of the output action probability distribution. For the continuous Gaussian case used here, that means using the mean and ignoring the standard deviation. I'm not sure how baselines handles that, though.
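
To make the distinction concrete, here is a minimal sketch of the two inference modes for a diagonal Gaussian policy. The helper name `select_action` and the example mean/std values are made up for illustration. This is not how baselines exposes its policy, only the underlying idea.

```python
import random
from typing import List, Optional, Sequence


def select_action(
    mean: Sequence[float],
    std: Sequence[float],
    deterministic: bool,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Pick an action from a diagonal Gaussian policy output (hypothetical helper)."""
    if deterministic:
        # The mode of a Gaussian is its mean, so the std is simply ignored.
        return list(mean)
    rng = rng or random.Random()
    # Stochastic mode: draw each action dimension independently.
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


# Illustrative values only (for example a setpoint and an air flow rate).
mean, std = [22.0, 3.5], [1.0, 0.5]
greedy = select_action(mean, std, deterministic=True)
sampled = select_action(mean, std, deterministic=False, rng=random.Random(0))
```

With `deterministic=True` the same observation always yields the same action, which makes inference runs easier to compare with each other.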

WynnCJF commented 2 years ago

Hi @antoine-galataud. I saved the model checkpoints and reloaded them using tf.train.Checkpoint, and I compared the eplusout.csv files generated by EnergyPlus during training and inference. Yes, the results are quite similar. The statistics in your README file are for episode 362, which is considerably later than the 200 episodes I ran. Do you think my issue results from insufficient training? Thanks for the note on deterministic predictions! We'll look into that.

antoine-galataud commented 2 years ago

Do you see training stats improving even at episode 200? If yes then training longer could help.

You can check whether the episode reward mean is still improving, as well as temperature and power demand. Other things worth tracking (but not easily accessible with baselines) are the policy and value function losses, as well as the policy entropy, which drops as the policy becomes more deterministic.
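
As a rough sketch of those two checks, the snippet below compares the mean reward of the most recent episodes with the window before it, and computes the entropy of a diagonal Gaussian policy from its standard deviations. Both function names and the default window size are assumptions for illustration. Neither function is part of baselines or of this testbed.

```python
import math
from statistics import fmean
from typing import Sequence


def reward_gain(episode_rewards: Sequence[float], window: int = 20) -> float:
    """Mean reward of the last `window` episodes minus that of the window before it."""
    if len(episode_rewards) < 2 * window:
        raise ValueError("need at least 2 * window episodes")
    recent = episode_rewards[-window:]
    previous = episode_rewards[-2 * window:-window]
    return fmean(recent) - fmean(previous)


def gaussian_entropy(stds: Sequence[float]) -> float:
    """Differential entropy of a diagonal Gaussian with the given standard deviations."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)
```

A gain that stays close to zero over several windows suggests training has plateaued. For a Gaussian policy the differential entropy keeps falling (and can go negative) as the standard deviations shrink.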

xingjian-zhang commented 2 years ago

Hi @antoine-galataud. Thanks for your help! However, the training results at episode 362 are still unstable even if I use the original docker image without changing any code:

Reward                    ave=-0.44, min=-3.95, max= 1.42, std= 0.96
westzone_temp             ave=21.96, min=15.29, max=36.78, std= 2.56
eastzone_temp             ave=22.15, min=16.45, max=33.80, std= 2.26
Power consumption         ave=102,154.61, min=40,686.45, max=178,973.00, std=26,960.15
pue                       ave= 1.31, min= 1.02, max= 2.05, std= 0.15
westzone_temp distribution
    degree 0.0-0.9 0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
    -------------------------------------------------------------------------
    17.0C  1.5%    0.2%  0.2%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.2%  0.2%
    18.0C  3.1%    0.2%  0.2%  0.2%  0.3%  0.3%  0.3%  0.3%  0.4%  0.4%  0.5%
    19.0C  5.2%    0.5%  0.5%  0.5%  0.5%  0.4%  0.5%  0.5%  0.5%  0.6%  0.7%
    20.0C 16.6%    0.9%  1.0%  1.3%  1.5%  1.7%  1.9%  2.1%  2.1%  2.1%  2.0%
    21.0C 12.3%    1.9%  1.8%  1.6%  1.3%  1.2%  1.1%  0.9%  0.9%  0.7%  0.8%
    22.0C 11.4%    0.8%  0.8%  0.8%  0.9%  1.0%  1.1%  1.2%  1.5%  1.5%  1.8%
    23.0C 24.5%    2.0%  2.2%  2.4%  2.3%  2.6%  2.7%  2.6%  2.7%  2.6%  2.4%
    24.0C 13.4%    2.4%  2.0%  1.9%  1.6%  1.5%  1.1%  0.9%  0.8%  0.6%  0.5%
    25.0C  2.1%    0.4%  0.3%  0.3%  0.2%  0.2%  0.2%  0.2%  0.1%  0.1%  0.1%
    26.0C  0.7%    0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.0%  0.0%  0.0%  0.1%
    27.0C  0.5%    0.1%  0.0%  0.1%  0.0%  0.0%  0.0%  0.1%  0.0%  0.0%  0.1%

Not sure why I cannot reproduce the results in README. 😕

antoine-galataud commented 2 years ago

@xingjian-zhang I've checked the original paper, and these indoor temperature fluctuations are expected when training only on the USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw weather file. See Simulation Results (Section 5.2), Figure 4.

If you want a better chance of reproducing the paper's results, you can edit ~/.bashrc_eplus and uncomment the relevant export ENERGYPLUS_WEATHER line (the last one of the group). Then don't forget to run source ~/.bashrc_eplus before launching a new training run.

xingjian-zhang commented 2 years ago

Thanks for the suggestion! I will check on that. 🚀

WynnCJF commented 2 years ago

Hi @antoine-galataud! We followed your advice and changed ENERGYPLUS_WEATHER to more weather files:

root@0366a70a6b20:~/rl-testbed-for-energyplus# echo $ENERGYPLUS_WEATHER 
/usr/local/EnergyPlus-22-1-0/WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw,/usr/local/EnergyPlus-22-1-0/WeatherData/USA_CO_Golden-NREL.724666_TMY3.epw,/usr/local/EnergyPlus-22-1-0/WeatherData/USA_FL_Tampa.Intl.AP.722110_TMY3.epw
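
For anyone repeating this, a quick sanity check of the variable before starting a long training run might look like the sketch below. It assumes ENERGYPLUS_WEATHER is a comma-separated list of paths, as in the output above. The helper name `split_weather_files` is hypothetical and not part of the testbed.

```python
import os
from typing import List


def split_weather_files(value: str) -> List[str]:
    """Split a comma-separated ENERGYPLUS_WEATHER value into individual paths."""
    return [part.strip() for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    # Report which of the configured weather files actually exist on disk.
    for path in split_weather_files(os.environ.get("ENERGYPLUS_WEATHER", "")):
        status = "ok" if os.path.isfile(path) else "MISSING"
        print(f"{status:8s}{path}")
```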

We left the other code unchanged in the original docker image and trained for around 500 episodes. Yet the room temperature still seems to fluctuate:

energyplus_model.plot csv=/root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
read_episode: file=/root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
episode /root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
read_episode: file=/root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
Reward                    ave=-0.76, min=-2.25, max= 1.03, std= 0.89
westzone_temp             ave=21.09, min=15.51, max=28.85, std= 2.78
eastzone_temp             ave=21.79, min=16.87, max=30.25, std= 2.46
Power consumption         ave=109,730.73, min=42,398.08, max=147,799.32, std=28,855.72
pue                       ave= 1.50, min= 1.03, max= 1.96, std= 0.16
westzone_temp distribution
    degree 0.0-0.9 0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
    -------------------------------------------------------------------------
    17.0C 10.4%    2.2%  2.2%  1.7%  1.3%  0.9%  0.6%  0.5%  0.4%  0.4%  0.3%
    18.0C  2.9%    0.4%  0.3%  0.2%  0.2%  0.2%  0.2%  0.2%  0.4%  0.3%  0.4%
    19.0C  4.8%    0.4%  0.3%  0.4%  0.4%  0.5%  0.4%  0.6%  0.6%  0.6%  0.7%
    20.0C  9.9%    0.6%  0.6%  0.7%  0.8%  0.9%  1.1%  1.2%  1.3%  1.3%  1.5%
    21.0C 16.4%    1.8%  2.0%  2.3%  2.5%  2.4%  2.1%  1.3%  1.0%  0.6%  0.4%
    22.0C  5.0%    0.4%  0.4%  0.4%  0.4%  0.4%  0.5%  0.5%  0.6%  0.7%  0.8%
    23.0C 20.2%    1.0%  1.4%  1.8%  2.0%  2.1%  2.1%  2.1%  2.3%  2.6%  2.8%
    24.0C 17.2%    2.7%  2.9%  3.0%  2.9%  2.5%  2.0%  1.0%  0.3%  0.1%  0.0%
    25.0C  0.0%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
    26.0C  0.0%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
    27.0C  0.0%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%

Moreover, we observed that the setpoint temperature in the action during inference is always at its minimum (10 °C), and the air mass flow rate always stays close to 3.5. This means the actual action values are almost fixed. Is there any way to fix this problem? By the way, I noticed that the EnergyPlus version in the ~/.bashrc shown in the README is 8-8-0, while the version in the Docker image is 22-1-0. Could that have anything to do with our problem? Thanks!

antoine-galataud commented 2 years ago

That requires more debugging indeed. The EnergyPlus version could be one factor, though that's not very likely. Another thing that changed is the openai baselines version: the current master branch uses one that is compatible with TensorFlow 2, while the original version used TF1. You could try to build an environment with the original dependency versions to test; that shouldn't be too hard. You can also try to tweak the reward to put more weight on the temperature constraint, but that would deviate from the original paper.
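
To illustrate what "more weight on the temperature constraint" could mean, here is a purely hypothetical reward shape with explicit weights. It is not the reward used by the testbed or the paper. The comfort band, the weights, and the function name are all assumptions chosen only to show the trade-off.

```python
from typing import Sequence


def weighted_reward(
    zone_temps: Sequence[float],
    power_w: float,
    temp_low: float = 22.0,      # assumed lower comfort bound, in Celsius
    temp_high: float = 24.0,     # assumed upper comfort bound, in Celsius
    temp_weight: float = 1.0,    # raise this to penalize temperature violations more
    power_weight: float = 1e-5,  # assumed scaling for power in watts
) -> float:
    """Hypothetical reward: penalize out-of-band zone temperatures and power use."""
    temp_penalty = 0.0
    for t in zone_temps:
        if t < temp_low:
            temp_penalty += temp_low - t
        elif t > temp_high:
            temp_penalty += t - temp_high
    return -temp_weight * temp_penalty - power_weight * power_w
```

Increasing `temp_weight` relative to `power_weight` makes temperature excursions cost more than the energy saved by allowing them.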

antoine-galataud commented 2 years ago

@WynnCJF @xingjian-zhang I managed to get a stable zone air temperature in the west zone using the model 2ZoneDataCenterHVAC_wEconomizer_Temp.idf instead of 2ZoneDataCenterHVAC_wEconomizer_Temp_Fan.idf. This can be set in ~/.bashrc_eplus too (last lines). Note that the original paper doesn't indicate which model was used to obtain the presented results.

Here is what I obtain on first episode:

>>> df = pd.read_csv("/root/eplog/openai-2022-08-01-19-23-21-347401/output/episode-00000000-00481/eplusout.csv.gz")
>>> df["WEST ZONE:Zone Air Temperature [C](TimeStep)"].describe()
count    35040.000000
mean        36.016728
std          8.386905
min         21.128363
25%         29.051563
50%         35.561136
75%         42.257246
max         58.404368
Name: WEST ZONE:Zone Air Temperature [C](TimeStep), dtype: float64

Then on episode 20:

>>> df = pd.read_csv("/root/eplog/openai-2022-08-01-19-23-21-347401/output/episode-00000020-00481/eplusout.csv.gz")
>>> df["WEST ZONE:Zone Air Temperature [C](TimeStep)"].describe()
count    35040.000000
mean        22.677554
std          1.066780
min         18.919097
25%         22.978730
50%         22.999801
75%         23.000150
max         36.485794
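
To run the same comparison over many episodes, a dependency-free sketch along these lines could be used. It reads a gzipped eplusout.csv and computes summary statistics similar to the describe() output above. The function name `summarize_column` is made up for illustration, and the column name is the one used in the pandas session.

```python
import csv
import gzip
import statistics
from typing import Dict

WEST_ZONE_TEMP = "WEST ZONE:Zone Air Temperature [C](TimeStep)"


def summarize_column(path: str, column: str = WEST_ZONE_TEMP) -> Dict[str, float]:
    """Return count, mean, sample std, min and max of one column in a .csv.gz file."""
    with gzip.open(path, "rt", newline="") as f:
        # Skip rows where the column is missing or empty.
        values = [float(row[column]) for row in csv.DictReader(f) if row.get(column)]
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample std; needs at least two values
        "min": min(values),
        "max": max(values),
    }
```

Calling it on the eplusout.csv.gz of successive episodes and printing the "std" entry gives a quick view of whether the zone temperature is stabilizing.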

Using python3 -m common.plot_energyplus:

read_episode: file=/root/eplog/openai-2022-08-01-19-23-21-347401/output/episode-00000021-00481/eplusout.csv.gz
Reward                    ave=-0.19, min=-2.56, max= 1.49, std= 0.57
westzone_temp             ave=22.47, min=18.90, max=35.12, std= 1.21
eastzone_temp             ave=22.29, min=19.48, max=39.18, std= 1.55
Power consumption         ave=103,788.51, min=46,255.23, max=170,448.48, std=27,558.03
pue                       ave= 1.30, min= 1.02, max= 1.77, std= 0.14
westzone_temp distribution
    degree 0.0-0.9 0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
    -------------------------------------------------------------------------
    17.0C  0.0%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
    18.0C  0.0%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
    19.0C  6.1%    0.1%  0.3%  0.4%  0.5%  0.6%  0.7%  0.7%  0.9%  1.0%  1.0%
    20.0C 11.9%    1.0%  1.2%  1.3%  1.4%  1.5%  1.3%  1.2%  1.1%  1.0%  0.8%
    21.0C  4.1%    0.7%  0.6%  0.5%  0.5%  0.4%  0.3%  0.2%  0.3%  0.2%  0.2%
    22.0C 40.3%    0.3%  0.3%  0.3%  0.3%  0.3%  0.3%  0.4%  0.6%  0.8% 36.8%
    23.0C 35.6%   32.2%  0.9%  0.6%  0.5%  0.3%  0.3%  0.2%  0.2%  0.2%  0.2%
    24.0C  0.9%    0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%
    25.0C  0.6%    0.1%  0.1%  0.1%  0.1%  0.0%  0.1%  0.1%  0.0%  0.1%  0.0%
    26.0C  0.3%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
    27.0C  0.2%    0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%

Note also that the temperature fluctuation depends a lot on which weather file was used for a given episode (San Francisco gives more stable results).

IBM / rl-testbed-for-energyplus

Zone air temperature fluctuates a lot after 200 episodes #109