Closed antoine-galataud closed 2 years ago
@takaomoriyama please let me know your thoughts on this. I have tested the solution on one of my environments, and it does fix the reported issue. Performance (episode reward mean) is marginally improved, but convergence happens faster (tested on 4 trials).
I added an example of EnergyPlus model change to fix this problem. See #65
@antoine-galataud @takaomoriyama
Hi Antoine,
I encountered a similar problem when I checked the trajectory of the ITE and HVAC powers.
First, I recorded the observations of the testbed by running the following pseudo-code (the calling managers are all AfterPredictorAfterHVACManagers).
Reset the environment and get observation o
for i in range(n), run:
    Generate action a according to o
    Execute a and obtain new observation o2
    Store (o, a)
    o ← o2
The results are shown below
I found that the power terms in the observation do not follow the CPU utilization rate: there is always a one-step latency (row 10 and row 18) every time the utilization changes.
For row 10, the CPU utilization is 0.75. According to the working process of E+, the zone load predictor predicts “ITEPower” from the scheduled utilization of 0.75. The predicted value should be around 102731, similar to row 10 to row 17. E+ then adjusts the HVAC manager accordingly. However, the value observed by the testbed for row 10 is 76788, which is close to the value when the utilization is 0.5. The actions (E10, F10) are then generated from the observed values (B10, C10, D10). Thus, the supply flow rate generated by the external controller is around 3.8, similar to the value when the utilization rate is 0.5.
This is where the problem arises. Because the ITEPower predicted by E+ is around 102731 while the supply flow rate is generated for 76788, the zone temperatures end up higher than the desired value (H10 and I10; the target is 25℃). Therefore, I guess that the ITEPower observed through “@ExtCtrlObs” is the result of the last time step.
I also tested the solution you proposed here. However, I got "0" for all power terms.
To find the reason, I read the EMS manual. At time step t_1, the program “ExtCtrlBasedSetpointManager” is called by the calling manager at the “AfterPredictorAfterHVACManagers” calling point. However, at this point, the sensor value of “Facility Total Building Electric Demand Power” has not yet been updated according to the CPU utilization rate U_schedule^1. Thus, the value observed by the testbed is still P_ITE^0, which is calculated from U_schedule^0. This is no longer in line with the control logic of E+: predict the load based on the utilization at time step t_1 and then adjust the HVAC actuators.
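To make the timing argument concrete, here is a minimal toy sketch of the same effect. It is not EnergyPlus and not the testbed code: the linear `ite_power` map, its coefficients, and the `run` helper are all hypothetical and chosen only for illustration. It reads a "sensor" either before or after the simulation updates it within a time step.

```python
# Toy model of one simulation loop; NOT EnergyPlus. All names and numbers are hypothetical.

def ite_power(utilization, idle_w=50_000.0, span_w=70_000.0):
    """Hypothetical linear map from CPU utilization to ITE power (W)."""
    return idle_w + span_w * utilization


def run(schedule, observe_after_update):
    """Return the ITE power the controller would observe at each time step."""
    power = ite_power(schedule[0])  # sensor value left over from initialization
    observed = []
    for u in schedule:
        if not observe_after_update:
            observed.append(power)  # read before this step's update: stale value
        power = ite_power(u)        # the simulation updates the sensor for this step
        if observe_after_update:
            observed.append(power)  # read at the end of the step: current value
    return observed


schedule = [0.5, 0.5, 0.75, 0.75]
early = run(schedule, observe_after_update=False)
late = run(schedule, observe_after_update=True)
print(early)
print(late)
```

In this toy, the early read still reports the power for utilization 0.5 at the step where the schedule has already moved to 0.75, which is the one-step latency described above; the late read follows the schedule.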
The results show that the ITE power will follow the utilization schedule, as indicated in K10 of the Excel sheet.
At the same time, the Markov Decision Process becomes clearer. For example, all terms in s(t) are the result of executing a(t-1) at s(t-1).
Hope I made it clear. I'm looking forward to your opinion.
@ZHANG-QINGANG Hi! Thank you for the detailed issue report.
At first sight, this looks like a slightly different problem from the original one: in the original issue there's a gap of one EnergyPlus time step between the action decided by the agent and the observations collected (observations are collected and the reward is computed on t-2 instead of t-1). In your report, the shift seems to come from within the same observation, but actions and observations seem well aligned, assuming that your Excel spreadsheet:
AfterPredictorAfterHVACManagers calling point only
See below for illustration:
Then I'd say that your solution should come as a complement.
The question I have is about the tests you did with the proposed solution for the original issue, which consists in using EndOfZoneTimestepAfterZoneReporting for observation collection, and the fact that it results in 0 W of power sent by EnergyPlus. Collecting observations near the end of the timestep should positively impact zone-related calculations, like indoor temperatures and meters. This is what I observe in my projects (they mainly use HVAC power), and it is also stated in the E+ documentation:
EndOfZoneTimestepAfterZoneReporting. This calling point happens each zone timestep just after the output reports are updated for zone-timestep variables and meters. This calling point is the last one of a timestep and is useful for making control decisions for the next zone timestep using the final meter values for the current zone timestep.
Could you please share your IDF file, or at least the EMS part?
Hi @antoine-galataud,
I attached my modified IDF file. It follows the structure of the 2-zone model provided by this testbed. Note that I generated the Excel sheet by using the following pseudo-code, not the CSV file reported by E+. The following figure is the code I used to record data. In the same row, the temperature terms are from "o2". I think this is reasonable, since the temperature in "o2" is the result of executing "a" at "o". What do you think?
I tested your proposed solution by directly running the IDF model you modified. I got "0" when I tried to record the observations. The "0" seems a little weird to me, but I have not found the reason so far.
For my IDF file, I added several sensors to collect the power of the equipment. I also wrote a program named "CalculatePredictor" to calculate the cooling load.
Reset the environment and get observation o
for i in range(n), run:
    Generate action a according to o
    Execute a and obtain new observation o2
    Store (o, a, o2)
    o ← o2
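For reference, the recording loop above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `StubEnv` is a hypothetical stand-in for the testbed environment, and its `step` returns only the observation, whereas a real Gym-style environment would also return a reward and a done flag.

```python
class StubEnv:
    """Hypothetical stand-in for the testbed environment (not the real API)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return {"t": self.t}

    def step(self, action):
        # Simplification: only the new observation is returned.
        self.t += 1
        return {"t": self.t}


def record(env, policy, n):
    """Run n steps and store one (o, a, o2) transition per step."""
    transitions = []
    o = env.reset()
    for _ in range(n):
        a = policy(o)                   # generate action a according to o
        o2 = env.step(a)                # execute a and obtain new observation o2
        transitions.append((o, a, o2))  # store (o, a, o2)
        o = o2                          # o <- o2
    return transitions


rows = record(StubEnv(), policy=lambda o: o["t"] * 10, n=3)
for row in rows:
    print(row)
```

Storing `o2` next to `(o, a)` makes it easy to check that the next-state terms in each row really come from executing `a` at `o`, because the `o2` of one row must equal the `o` of the following row.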
Thank you @ZHANG-QINGANG for sharing your IDF file. This allowed me to find a problem in the original fix I provided. The statement:
SET tmp_val1 = @ExtCtrlAct 0 7
should be in the ExtCtrlBasedSetpointManager program, and not in ExtCtrlBasedObservationCollector.
Find attached your file with a fix. Let me know if that works for you:
Thanks
2ZoneDataCenterHVAC_wEconomizer_Temp_Fan_ZhangQingang.idf.zip
Hi @antoine-galataud ,
I tested your new solution. The "0" reading issue is solved and the CPU utilization and ITEPower are now aligned.
However, it seems the zone temperature still cannot be well controlled, as illustrated in the following figure.
I guess that, even though the observation is correct, the corresponding actions are only executed in the next time step.
What do you think?
@ZHANG-QINGANG I would say that the alignment problem is now solved, but the learned policy is sub-optimal, as it is unable to anticipate a CPU utilization increase and/or a temperature increase in the zones. But this is a somewhat different problem.
Now the alignment seems correct and we observe in your last spreadsheet, for instance:
It's a partially observable MDP: nothing in the collected state at row 8 can lead to a transition towards a greater decided action.
@antoine-galataud
Yes, I agree with your opinion: it is unable to anticipate a CPU utilization increase and/or a temperature increase in the zones. Your solution is good.
My only concern is that, if the working process follows the figure in my last message, it might not follow the control logic of E+: predict the equipment load, then adjust the HVAC manager accordingly. The control will then be quite passive. Problems may arise when the utilization changes frequently and drastically. Here I attached a case where the utilization changes from 0.2 to 0.8. Of course, this can remain as future research.
It is a pleasure to discuss with you, very helpful. :-)
@ZHANG-QINGANG thank you, glad I could help. A pleasure for me too!
@takaomoriyama the problem is now fixed with the new fix I pushed in #77. I suggest we re-close this issue, unless you think otherwise.
@antoine-galataud OK. That's great. If we need some more discussion, let's open another issue.
@ZHANG-QINGANG Thank you for your contribution.
Found this problem while working on a model where the HVAC system shuts off at night and with the CurrentTime EnergyPlus built-in variable used as an observation.
Problem description
On this model, I have HVAC starting at 6:00 AM. Current time is reported as an observation, as fractional time. On a given episode, I get the following wrong trajectory reported:
What happens:
The trajectory we feed to the agent is the following:
This is a shift of 1 timestep for the observations sent to the agent and for computing the reward.
This can be confirmed by asking EnergyPlus to output results in CSV or SQL:
Here we can see that there is no such shift (indoor temperatures have increased already at 06:15).
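One way to check for such a shift programmatically is to compare the series seen by the agent with the same variable taken from the EnergyPlus output, and look for the lag that aligns them best. The sketch below is only an illustration: `best_lag` is a hypothetical helper, and the temperature values are made up, not taken from the run above.

```python
def best_lag(agent_series, reference_series, max_lag=3):
    """Return the lag k (in time steps) that minimizes the mean squared error
    between agent_series[i] and reference_series[i - k]."""
    n = min(len(agent_series), len(reference_series))
    best_k, best_err = 0, float("inf")
    for k in range(max_lag + 1):
        pairs = [(agent_series[i], reference_series[i - k]) for i in range(k, n)]
        if not pairs:
            continue  # series too short to evaluate this lag
        err = sum((a - r) ** 2 for a, r in pairs) / len(pairs)
        if err < best_err:  # strict comparison: the smallest lag wins ties
            best_k, best_err = k, err
    return best_k


# Made-up indoor temperatures (°C), one value per time step.
reference = [18.0, 18.0, 19.5, 21.0, 22.0, 22.5]  # e.g. read from the EnergyPlus CSV
agent = [18.0, 18.0, 18.0, 19.5, 21.0, 22.0]      # what the agent was fed

print(best_lag(agent, reference))
```

A result of 0 would mean the two series are already aligned; a positive value means the agent's observations trail the EnergyPlus output by that many time steps.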
The correct trajectory should be:
Reason
I believe an incorrect calling point is used for the EMS program. The one currently used is AfterPredictorAfterHVACManagers. As stated in https://bigladdersoftware.com/epx/docs/8-2/input-output-reference/group-energy-management-system-ems.html#energymanagementsystemprogramcallingmanager and https://bigladdersoftware.com/epx/docs/8-2/ems-application-guide/ems-calling-points.html#ems-calling-points, this calling point happens before HVAC calculations, so variable and meter values are still the ones from the previous time step.
Solution
Apply action(s) at the AfterPredictorAfterHVACManagers calling point, but collect observations after the zone time step is done, e.g. at the EndOfZoneTimestepAfterZoneReporting calling point.