WynnCJF opened 2 years ago
Here's the statistical information of episode 215:
Reward ave=-0.48, min=-3.77, max= 1.36, std= 0.96
westzone_temp ave=22.02, min=15.26, max=35.89, std= 2.65
eastzone_temp ave=22.15, min=16.41, max=34.43, std= 2.35
Power consumption ave=102,383.14, min=40,710.74, max=175,918.97, std=27,053.27
pue ave= 1.31, min= 1.02, max= 2.06, std= 0.16
westzone_temp distribution
degree 0.0-0.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
-------------------------------------------------------------------------
17.0C 1.6% 0.2% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.2% 0.2%
18.0C 3.4% 0.2% 0.2% 0.3% 0.3% 0.3% 0.4% 0.4% 0.4% 0.4% 0.5%
19.0C 5.4% 0.5% 0.4% 0.5% 0.5% 0.4% 0.5% 0.6% 0.6% 0.7% 0.8%
20.0C 15.3% 1.0% 1.1% 1.3% 1.5% 1.6% 1.7% 1.8% 1.7% 1.7% 1.8%
21.0C 12.7% 1.7% 1.6% 1.5% 1.5% 1.3% 1.1% 1.0% 1.0% 1.0% 1.0%
22.0C 12.5% 0.9% 0.9% 1.0% 1.0% 1.0% 1.2% 1.3% 1.6% 1.7% 1.9%
23.0C 22.5% 1.9% 2.1% 2.4% 2.4% 2.4% 2.3% 2.3% 2.4% 2.2% 2.0%
24.0C 13.2% 2.1% 1.9% 1.7% 1.6% 1.4% 1.1% 1.0% 1.0% 0.9% 0.7%
25.0C 3.1% 0.5% 0.5% 0.4% 0.4% 0.3% 0.3% 0.2% 0.2% 0.2% 0.1%
26.0C 0.9% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.0% 0.1% 0.1% 0.1%
27.0C 0.6% 0.1% 0.1% 0.1% 0.1% 0.0% 0.1% 0.0% 0.1% 0.1% 0.1%
Hi @WynnCJF, do you see the same during training, or only during inference? Training and inference should show similar results if inference was performed on a reloaded policy, assuming the reload works as expected.
Also note: depending on how you perform inference, you may want to make predictions deterministic rather than stochastic, i.e., keep the mode of the output action probability distribution. For the present continuous case with a Gaussian policy, that means keeping the mean and discarding the standard deviation. I'm not sure how baselines handles that, though.
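The stochastic-vs-deterministic distinction can be illustrated with a toy diagonal-Gaussian policy head (plain NumPy; this is a generic sketch, not baselines' actual API):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_action(mean, log_std):
    # Training-style behavior: sample from the action distribution
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

def deterministic_action(mean, log_std):
    # Inference-style behavior: keep the distribution mode,
    # which for a Gaussian is simply the mean
    return mean

mean = np.array([0.2, -1.3])
log_std = np.array([-0.5, -0.5])

a_det = deterministic_action(mean, log_std)  # equals mean every call
a_sto = stochastic_action(mean, log_std)     # varies run to run (seeded here)
```

With a deterministic policy the controller outputs the same action for the same observation, which usually smooths out the kind of temperature jitter you see with sampled actions.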
Hi @antoine-galataud. I saved the model checkpoints and reloaded them using tf.train.Checkpoint, then compared the eplusout.csv files generated by EnergyPlus during training and inference. Yes, the results are quite similar. The statistical information in your README shows the result of episode 362, which is considerably more training than the 200 episodes I ran. Do you think my issue results from insufficient training? Thanks for the note on deterministic predictions! We'll look into that.
Do you see training stats still improving at episode 200? If so, training longer could help.
You can check whether the episode reward mean is still improving, as well as temperature and power demand. Also worth tracking (though not easily accessible with baselines) are the policy and value function losses, as well as the entropy (near 0 if the policy becomes deterministic).
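Entropy isn't surfaced by the default logging here, but for a diagonal Gaussian policy it is cheap to compute from the log standard deviations if you can get at them. A minimal sketch (generic formula, not tied to baselines internals):

```python
import numpy as np

def diag_gaussian_entropy(log_std):
    # H = k/2 * ln(2*pi*e) + sum(log_std) for a k-dimensional
    # diagonal Gaussian with per-dimension log standard deviations
    k = log_std.size
    return 0.5 * k * np.log(2.0 * np.pi * np.e) + np.sum(log_std)

# A 1-D unit Gaussian has entropy of about 1.419 nats; as the policy
# collapses toward deterministic (log_std very negative), entropy keeps
# dropping (differential entropy can go below 0)
print(diag_gaussian_entropy(np.array([0.0])))
```

Watching this value trend downward over episodes is a reasonable proxy for the policy converging.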
Hi @antoine-galataud. Thanks for your help! However, the training results at episode 362 are still unstable, even though I use the original docker image without changing any code:
Reward ave=-0.44, min=-3.95, max= 1.42, std= 0.96
westzone_temp ave=21.96, min=15.29, max=36.78, std= 2.56
eastzone_temp ave=22.15, min=16.45, max=33.80, std= 2.26
Power consumption ave=102,154.61, min=40,686.45, max=178,973.00, std=26,960.15
pue ave= 1.31, min= 1.02, max= 2.05, std= 0.15
westzone_temp distribution
degree 0.0-0.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
-------------------------------------------------------------------------
17.0C 1.5% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.2% 0.2%
18.0C 3.1% 0.2% 0.2% 0.2% 0.3% 0.3% 0.3% 0.3% 0.4% 0.4% 0.5%
19.0C 5.2% 0.5% 0.5% 0.5% 0.5% 0.4% 0.5% 0.5% 0.5% 0.6% 0.7%
20.0C 16.6% 0.9% 1.0% 1.3% 1.5% 1.7% 1.9% 2.1% 2.1% 2.1% 2.0%
21.0C 12.3% 1.9% 1.8% 1.6% 1.3% 1.2% 1.1% 0.9% 0.9% 0.7% 0.8%
22.0C 11.4% 0.8% 0.8% 0.8% 0.9% 1.0% 1.1% 1.2% 1.5% 1.5% 1.8%
23.0C 24.5% 2.0% 2.2% 2.4% 2.3% 2.6% 2.7% 2.6% 2.7% 2.6% 2.4%
24.0C 13.4% 2.4% 2.0% 1.9% 1.6% 1.5% 1.1% 0.9% 0.8% 0.6% 0.5%
25.0C 2.1% 0.4% 0.3% 0.3% 0.2% 0.2% 0.2% 0.2% 0.1% 0.1% 0.1%
26.0C 0.7% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.1%
27.0C 0.5% 0.1% 0.0% 0.1% 0.0% 0.0% 0.0% 0.1% 0.0% 0.0% 0.1%
Not sure why I cannot reproduce the results in README. 😕
@xingjian-zhang I've checked the original paper, and these indoor temperature fluctuations are expected when training only on the USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw weather file. See Simulation Results (Section 5.2), Figure 4.
If you want a better chance at reproducing the paper's results, you can edit ~/.bashrc_eplus and uncomment the relevant export ENERGYPLUS_WEATHER line (the last one of the group). Then don't forget to source ~/.bashrc_eplus before launching a new training.
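For reference, the change amounts to something like this; the paths below are the multi-weather list shown later in this thread, and may differ in other image versions:

```shell
# In ~/.bashrc_eplus, uncomment the last ENERGYPLUS_WEATHER export of the
# group, e.g. a multi-file (comma-separated) variant:
export ENERGYPLUS_WEATHER="/usr/local/EnergyPlus-22-1-0/WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw,/usr/local/EnergyPlus-22-1-0/WeatherData/USA_CO_Golden-NREL.724666_TMY3.epw,/usr/local/EnergyPlus-22-1-0/WeatherData/USA_FL_Tampa.Intl.AP.722110_TMY3.epw"

# Pick up the change in the current shell before launching training
if [ -f ~/.bashrc_eplus ]; then source ~/.bashrc_eplus; fi
```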
Thanks for the suggestion! I will check on that. 🚀
Hi @antoine-galataud! We followed your advice and changed ENERGYPLUS_WEATHER to include more weather files:
root@0366a70a6b20:~/rl-testbed-for-energyplus# echo $ENERGYPLUS_WEATHER
/usr/local/EnergyPlus-22-1-0/WeatherData/USA_CA_San.Francisco.Intl.AP.724940_TMY3.epw,/usr/local/EnergyPlus-22-1-0/WeatherData/USA_CO_Golden-NREL.724666_TMY3.epw,/usr/local/EnergyPlus-22-1-0/WeatherData/USA_FL_Tampa.Intl.AP.722110_TMY3.epw
We left the other code unchanged in the original docker image and trained for around 500 episodes. Yet the room temperature still seems to fluctuate:
energyplus_model.plot csv=/root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
read_episode: file=/root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
episode /root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
read_episode: file=/root/eplog/openai-2022-07-30-04-36-07-288085/output/episode-00000500-08436/eplusout.csv.gz
Reward ave=-0.76, min=-2.25, max= 1.03, std= 0.89
westzone_temp ave=21.09, min=15.51, max=28.85, std= 2.78
eastzone_temp ave=21.79, min=16.87, max=30.25, std= 2.46
Power consumption ave=109,730.73, min=42,398.08, max=147,799.32, std=28,855.72
pue ave= 1.50, min= 1.03, max= 1.96, std= 0.16
westzone_temp distribution
degree 0.0-0.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
-------------------------------------------------------------------------
17.0C 10.4% 2.2% 2.2% 1.7% 1.3% 0.9% 0.6% 0.5% 0.4% 0.4% 0.3%
18.0C 2.9% 0.4% 0.3% 0.2% 0.2% 0.2% 0.2% 0.2% 0.4% 0.3% 0.4%
19.0C 4.8% 0.4% 0.3% 0.4% 0.4% 0.5% 0.4% 0.6% 0.6% 0.6% 0.7%
20.0C 9.9% 0.6% 0.6% 0.7% 0.8% 0.9% 1.1% 1.2% 1.3% 1.3% 1.5%
21.0C 16.4% 1.8% 2.0% 2.3% 2.5% 2.4% 2.1% 1.3% 1.0% 0.6% 0.4%
22.0C 5.0% 0.4% 0.4% 0.4% 0.4% 0.4% 0.5% 0.5% 0.6% 0.7% 0.8%
23.0C 20.2% 1.0% 1.4% 1.8% 2.0% 2.1% 2.1% 2.1% 2.3% 2.6% 2.8%
24.0C 17.2% 2.7% 2.9% 3.0% 2.9% 2.5% 2.0% 1.0% 0.3% 0.1% 0.0%
25.0C 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
26.0C 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
27.0C 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
Moreover, we observed that during inference the setpoint temperature in the action is always at its minimum (10 °C), and the air mass flow rate is always very close to 3.5. In other words, the actual action values are almost fixed.
Is there any way to fix this? By the way, I noticed that the EnergyPlus version in the ~/.bashrc shown in the README is 8-8-0, while the version in the docker image is 22-1-0. Could that have anything to do with our problem? Thanks!
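One quick way to sanity-check whether that constant 10 °C setpoint is a saturated (clipped) policy output rather than a genuinely learned preference is to look at the normalized action before scaling. A toy sketch with made-up bounds (the testbed's actual action ranges may differ):

```python
import numpy as np

# Hypothetical action scaling: a normalized action in [-1, 1] mapped to
# a setpoint range of [10, 40] C; these bounds are illustrative only.
LOW, HIGH = 10.0, 40.0

def to_setpoint(a):
    a = np.clip(a, -1.0, 1.0)
    return LOW + (a + 1.0) * 0.5 * (HIGH - LOW)

# A policy mean pushed below the lower bound always clips to the minimum,
# which looks exactly like a setpoint pinned at 10 C
print(to_setpoint(-1.7))  # -> 10.0
```

If the raw policy means sit far outside [-1, 1], the policy has saturated, and tweaking the reward or the action normalization is probably needed.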
That requires more debugging indeed. The EnergyPlus version could be one factor, though not a very likely one. Another thing that changed is the openai baselines version: the current master branch uses one that is compatible with TensorFlow 2, whereas the original version used TF1. You could try building an env with the original dependency versions to test; that shouldn't be too hard. You can also try tweaking the reward to put more weight on the temperature constraint, but that would deviate from the original paper.
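To illustrate what reweighting could look like, here is a hedged sketch: the actual testbed reward is defined differently, and the lambda_temp / lambda_power knobs below are made-up names for this example only:

```python
# Hypothetical reward combining a temperature-band penalty with a power
# term; increasing lambda_temp puts more weight on the temperature
# constraint, at the cost of deviating from the original paper's reward.
def reward(zone_temps, power_w, target=23.0, tolerance=1.0,
           lambda_temp=1.0, lambda_power=1.0 / 100_000.0):
    temp_penalty = sum(max(abs(t - target) - tolerance, 0.0)
                       for t in zone_temps)
    return -lambda_temp * temp_penalty - lambda_power * power_w

# Same state, heavier temperature weight -> more punishing reward
r1 = reward([26.0, 22.0], 100_000.0, lambda_temp=1.0)
r2 = reward([26.0, 22.0], 100_000.0, lambda_temp=5.0)
print(r1, r2)
```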
@WynnCJF @xingjian-zhang I managed to get a stable zone air temperature in the west zone by using the model 2ZoneDataCenterHVAC_wEconomizer_Temp.idf instead of 2ZoneDataCenterHVAC_wEconomizer_Temp_Fan.idf. This can also be set in bashrc_eplus (last lines). Note that the original paper doesn't indicate which model was used to obtain the presented results.
Here is what I obtain on first episode:
>>> df = pd.read_csv("/root/eplog/openai-2022-08-01-19-23-21-347401/output/episode-00000000-00481/eplusout.csv.gz")
>>> df["WEST ZONE:Zone Air Temperature [C](TimeStep)"].describe()
count 35040.000000
mean 36.016728
std 8.386905
min 21.128363
25% 29.051563
50% 35.561136
75% 42.257246
max 58.404368
Name: WEST ZONE:Zone Air Temperature [C](TimeStep), dtype: float64
Then on episode 20:
>>> df = pd.read_csv("/root/eplog/openai-2022-08-01-19-23-21-347401/output/episode-00000020-00481/eplusout.csv.gz")
>>> df["WEST ZONE:Zone Air Temperature [C](TimeStep)"].describe()
count 35040.000000
mean 22.677554
std 1.066780
min 18.919097
25% 22.978730
50% 22.999801
75% 23.000150
max 36.485794
Using python3 -m common.plot_energyplus:
read_episode: file=/root/eplog/openai-2022-08-01-19-23-21-347401/output/episode-00000021-00481/eplusout.csv.gz
Reward ave=-0.19, min=-2.56, max= 1.49, std= 0.57
westzone_temp ave=22.47, min=18.90, max=35.12, std= 1.21
eastzone_temp ave=22.29, min=19.48, max=39.18, std= 1.55
Power consumption ave=103,788.51, min=46,255.23, max=170,448.48, std=27,558.03
pue ave= 1.30, min= 1.02, max= 1.77, std= 0.14
westzone_temp distribution
degree 0.0-0.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
-------------------------------------------------------------------------
17.0C 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
18.0C 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
19.0C 6.1% 0.1% 0.3% 0.4% 0.5% 0.6% 0.7% 0.7% 0.9% 1.0% 1.0%
20.0C 11.9% 1.0% 1.2% 1.3% 1.4% 1.5% 1.3% 1.2% 1.1% 1.0% 0.8%
21.0C 4.1% 0.7% 0.6% 0.5% 0.5% 0.4% 0.3% 0.2% 0.3% 0.2% 0.2%
22.0C 40.3% 0.3% 0.3% 0.3% 0.3% 0.3% 0.3% 0.4% 0.6% 0.8% 36.8%
23.0C 35.6% 32.2% 0.9% 0.6% 0.5% 0.3% 0.3% 0.2% 0.2% 0.2% 0.2%
24.0C 0.9% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1%
25.0C 0.6% 0.1% 0.1% 0.1% 0.1% 0.0% 0.1% 0.1% 0.0% 0.1% 0.0%
26.0C 0.3% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
27.0C 0.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
Note also that the temperature fluctuation depends a lot on the weather file used for a given episode (San Francisco gives more stable results).
Hello! I used the original code in the baselines docker to train the RL model with the command python3 -m baselines_energyplus.trpo_mpi.run_energyplus --num-timesteps 10000000. The model was trained for 200 episodes, then saved and applied for inference with the same idf file and weather file. The west zone temperature during inference fluctuates a lot (the annual mean temperature is about 22.4 °C), as shown in the graph below. Also, the setpoint temperature set by the action is almost always at the lowest possible value.
Is this the expected behavior for the episode?