NREL / EnergyPlus

EnergyPlus™ is a whole building energy simulation program that engineers, architects, and researchers use to model both energy consumption and water use in buildings.
https://energyplus.net

[pyenergyplus] notifications for warmup and simulation ends #10068

Closed antoine-galataud closed 1 year ago

antoine-galataud commented 1 year ago

Issue overview

To integrate the EnergyPlus Python API into other programs, it is sometimes necessary to consume variables and send actuator values asynchronously. One example of such a use case is https://github.com/airboxlab/rllib-energyplus/blob/main/run.py, which runs EnergyPlus simulations in an OpenAI Gym environment as part of a reinforcement learning experiment, backed by a distributed training framework (Ray). Because the Python API is mainly based on callbacks, queues are used as the exchange mechanism.
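To illustrate the queue-based exchange described above, here is a minimal sketch that uses only the standard library. `fake_simulation`, `on_timestep`, and the "+1.0" policy are hypothetical stand-ins for illustration; they are not part of pyenergyplus or of the linked project.

```python
import threading
from queue import Queue


def fake_simulation(n_timesteps, on_timestep):
    """Hypothetical stand-in for EnergyPlus: one callback call per timestep."""
    for step in range(n_timesteps):
        on_timestep(step)


def run_episode(n_timesteps=3):
    obs_queue = Queue(maxsize=1)  # simulation thread -> agent
    act_queue = Queue(maxsize=1)  # agent -> simulation thread
    applied = []                  # actions the "simulation" received

    def on_timestep(step):
        # Runs in the simulation thread: publish an observation, then
        # block until the agent has answered with an action.
        obs_queue.put({"step": step, "temperature": 20.0 + step})
        applied.append(act_queue.get())

    sim = threading.Thread(target=fake_simulation, args=(n_timesteps, on_timestep))
    sim.start()

    # Agent side: consume one observation and send one action per timestep.
    for _ in range(n_timesteps):
        obs = obs_queue.get()
        act_queue.put(obs["temperature"] + 1.0)  # hypothetical policy

    sim.join()
    return applied
```

With `maxsize=1` on both queues, the simulation thread and the agent advance in lockstep, one timestep at a time.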

Some control flow mechanisms that are used and associated limitations:

A potential enhancement would be to introduce additional callbacks that report the status of the warmup and simulation phases, most importantly when each one finishes. With those, it becomes possible to wait for warmup to complete, to stop cleanly when the simulation is finished, and to reduce the timeout on variable reception to a minimum that corresponds to the estimated duration of a timestep. This still requires configuration, but it is more predictable and deterministic than the duration of the warmup period.
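The idea could look roughly like the sketch below: a generous timeout covers the warmup phase, and a short one covers each timestep afterwards. `PhaseSignals`, `fake_simulation`, and `first_observation` are hypothetical names, and the simulation is faked so the example runs without EnergyPlus.

```python
import threading
from queue import Queue, Empty


class PhaseSignals:
    """Flags that a simulation thread could set from phase callbacks."""

    def __init__(self):
        self.warmup_done = threading.Event()
        self.obs_queue = Queue(maxsize=1)


def fake_simulation(signals, n_timesteps):
    # Hypothetical stand-in for EnergyPlus: signal the end of warmup,
    # then emit one value per timestep.
    signals.warmup_done.set()
    for step in range(n_timesteps):
        signals.obs_queue.put(step)


def first_observation(signals, warmup_timeout, timestep_timeout):
    # Warmup duration is hard to predict, so it gets its own timeout.
    if not signals.warmup_done.wait(timeout=warmup_timeout):
        raise TimeoutError("warmup did not complete in time")
    # After warmup, a value should arrive within roughly one timestep.
    try:
        return signals.obs_queue.get(timeout=timestep_timeout)
    except Empty:
        raise TimeoutError("no observation within one timestep") from None
```

Separating the two timeouts means only the per-timestep value has to be tuned tightly; the warmup timeout can stay large without slowing down failure detection during the run.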

jmarrec commented 1 year ago

There is callback_after_new_environment_warmup_complete in the runtime API.

This function allows a client to register a function to be called back by EnergyPlus at the warmup of each environment.

I'm curious about whether callback_progress may or may not send 100. Is that true? Do you have a reproducer?

jmarrec commented 1 year ago

I can reproduce the progress callback not sending 100%:

progress = []

def callback_progress(s):
    # s is the progress percentage reported by EnergyPlus
    progress.append(s)

api.runtime.callback_progress(state, callback_progress)

I get [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94]

antoine-galataud commented 1 year ago

Thanks @jmarrec! I missed callback_after_new_environment_warmup_complete; it may solve 99% of the problem here.

jmarrec commented 1 year ago

run_energyplus should return an int when it's done, so you can capture this. Otherwise, check the messages?

def callback_msg(s):
    s = s.decode()
    if 'EnergyPlus Run Time=' in s:
        print("SIMULATION IS DONE")

api.runtime.callback_message(state, callback_msg)

You can also monitor stderr and intercept "EnergyPlus Completed Successfully" there.
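A hedged sketch of turning that message check into a signal another thread can wait on. `make_message_callback` and `DONE_MARKER` are hypothetical names; the marker string is the one from the snippet above, and the callback accepts both bytes and str in case the message type differs between versions.

```python
import threading

# Substring taken from the snippet above.
DONE_MARKER = "EnergyPlus Run Time="


def make_message_callback(done_event):
    """Build a message callback that sets done_event on the end-of-run message."""

    def on_message(message):
        # Messages may arrive as bytes (as in the snippet above) or as str.
        if isinstance(message, bytes):
            text = message.decode(errors="replace")
        else:
            text = message
        if DONE_MARKER in text:
            done_event.set()

    return on_message
```

It would be registered the same way as above, for example `api.runtime.callback_message(state, make_message_callback(done))`, after which another thread can call `done.wait()`.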

I think we should make sure to send the progress callback at 100 when it succeeds though.

Myoldmopar commented 1 year ago

But even if we emit a 100% callback, the user will still need to return control to EnergyPlus so it can do final cleanups and end gracefully. So the 100 is not really "done". The call to run_energyplus will return when EnergyPlus is done and that should be the ultimate way to find out if EnergyPlus is done.
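One way to follow this advice when the simulation runs in a background thread is to capture the return value there and treat the run as finished only after `join()`. This is a sketch: `SimulationThread` is a hypothetical helper, and `run_simulation` stands for whatever callable wraps `run_energyplus` in the real program.

```python
import threading


class SimulationThread(threading.Thread):
    """Runs a simulation callable and keeps its integer exit code."""

    def __init__(self, run_simulation):
        super().__init__()
        self._run_simulation = run_simulation
        self.exit_code = None  # stays None until the run has returned

    def run(self):
        # In real use, run_simulation would wrap the call to run_energyplus.
        self.exit_code = self._run_simulation()
```

After `sim.start()` and `sim.join()`, `sim.exit_code` holds the value the run returned, so no progress or message callback is needed to decide that the simulation is over.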

jmarrec commented 1 year ago

I'd say we should insert the following at https://github.com/NREL/EnergyPlus/blob/b03805c97d6511a3232300e65b0d30dc0d627816/src/EnergyPlus/UtilityRoutines.cc#L777-L779:

    if (state.dataGlobal->fProgressPtr) {
        state.dataGlobal->fProgressPtr(100);
    }
    if (state.dataGlobal->progressCallback) {
        state.dataGlobal->progressCallback(100);
    }

Myoldmopar commented 1 year ago

That's acceptable, but again, the user shouldn't kill the thread just because they receive a 100 value. EnergyPlus should have a chance to clean up gracefully, and the user should only kill the thread once run_energyplus has returned.

jmarrec commented 1 year ago

> That's acceptable, but again, the user shouldn't kill the thread just because they receive a 100 value. EnergyPlus should have a chance to clean up gracefully, and the user should only kill the thread once run_energyplus has returned.

I get it. I'm just saying as someone who writes a GUI such as OpenStudioApplication, I kinda expect the progress bar to receive 100% at some point.

Myoldmopar commented 1 year ago

Yeah, it's silly that it doesn't. This would be a good addition.
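Until such a fix lands, a GUI could top up the progress bar itself once the run has returned successfully. The sketch below assumes an exit code of zero means success; `ProgressTracker` is a hypothetical helper, not part of pyenergyplus.

```python
class ProgressTracker:
    """Collects progress values and completes them to 100 on success."""

    def __init__(self):
        self.values = []

    def on_progress(self, percent):
        # Suitable as the function registered through callback_progress.
        self.values.append(percent)

    def finish(self, exit_code):
        # Call only after the run function has returned, never from a callback.
        # Assumption: an exit code of 0 means the simulation succeeded.
        if exit_code == 0 and (not self.values or self.values[-1] != 100):
            self.values.append(100)
```

Because `finish` runs after the run function has returned, it does not conflict with the point above about letting EnergyPlus clean up first.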

antoine-galataud commented 1 year ago

After a quick test, I can confirm that callback_after_new_environment_warmup_complete works. Here is an excerpt:

def _warmup_complete(state: Any) -> None:
    self.warmup_complete = True
    self.warmup_queue.put(True)

# register callback used to signal warmup complete
runtime.callback_after_new_environment_warmup_complete(self.energyplus_state, _warmup_complete)

...

self.obs_queue = Queue(maxsize=1)
self.act_queue = Queue(maxsize=1)

self.energyplus_runner = EnergyPlusRunner(
    episode=self.episode,
    env_config=self.env_config,
    obs_queue=self.obs_queue,
    act_queue=self.act_queue
)
self.energyplus_runner.start()

# wait for E+ warmup to complete
if not self.energyplus_runner.warmup_complete:
    self.energyplus_runner.warmup_queue.get()
    print("-- Warmup complete")

try:
    obs = self.obs_queue.get()
    print("-- Got first observation")
except Empty:
    obs = self.last_obs

Produces something like:

(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) -- Warmup completePerforming Zone Sizing Simulation
(RolloutWorker pid=8036) 
(RolloutWorker pid=8036) ...for Sizing Period: #1 PARIS_ ORLY ANN CLG .4% CONDNS DP=>MDB
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8034) Performing Zone Sizing Simulation
(RolloutWorker pid=8034) ...for Sizing Period: #1 PARIS_ ORLY ANN CLG .4% CONDNS DP=>MDB
(RolloutWorker pid=8034) -- Warmup complete
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Performing Zone Sizing Simulation
(RolloutWorker pid=8034) ...for Sizing Period: #2 PARIS_ ORLY ANN CLG .4% CONDNS WB=>MDB
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Warming up
(RolloutWorker pid=8034) Performing Zone Sizing Simulation
(RolloutWorker pid=8034) ...for Sizing Period: #3 PARIS_ ORLY ANN HTG 99.6% CONDNS DB
(RolloutWorker pid=8034) Calculating System sizing
(RolloutWorker pid=8034) ...for Sizing Period: #1 PARIS_ ORLY ANN CLG .4% CONDNS DP=>MDB
(RolloutWorker pid=8034) Calculating System sizing
(RolloutWorker pid=8034) ...for Sizing Period: #2 PARIS_ ORLY ANN CLG .4% CONDNS WB=>MDB
(RolloutWorker pid=8034) Calculating System sizing
(RolloutWorker pid=8034) ...for Sizing Period: #3 PARIS_ ORLY ANN HTG 99.6% CONDNS DB
(RolloutWorker pid=8034) Adjusting Air System Sizing
(RolloutWorker pid=8034) Adjusting Standard 62.1 Ventilation Sizing
(RolloutWorker pid=8034) Initializing Simulation
(RolloutWorker pid=8034) Reporting Surfaces
(RolloutWorker pid=8034) Beginning Primary Simulation
(RolloutWorker pid=8034) Initializing New Environment Parameters
(RolloutWorker pid=8034) Warming up {1}
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Performing Zone Sizing Simulation
(RolloutWorker pid=8036) ...for Sizing Period: #2 PARIS_ ORLY ANN CLG .4% CONDNS WB=>MDB
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Warming up
(RolloutWorker pid=8036) Performing Zone Sizing Simulation
(RolloutWorker pid=8036) ...for Sizing Period: #3 PARIS_ ORLY ANN HTG 99.6% CONDNS DB
(RolloutWorker pid=8036) Calculating System sizing
(RolloutWorker pid=8036) ...for Sizing Period: #1 PARIS_ ORLY ANN CLG .4% CONDNS DP=>MDB
(RolloutWorker pid=8036) Calculating System sizing
(RolloutWorker pid=8036) ...for Sizing Period: #2 PARIS_ ORLY ANN CLG .4% CONDNS WB=>MDB
(RolloutWorker pid=8036) Calculating System sizing
(RolloutWorker pid=8036) ...for Sizing Period: #3 PARIS_ ORLY ANN HTG 99.6% CONDNS DB
(RolloutWorker pid=8036) Adjusting Air System Sizing
(RolloutWorker pid=8036) Adjusting Standard 62.1 Ventilation Sizing
(RolloutWorker pid=8036) Initializing Simulation
(RolloutWorker pid=8036) Reporting Surfaces
(RolloutWorker pid=8036) Beginning Primary Simulation
(RolloutWorker pid=8036) Initializing New Environment Parameters
(RolloutWorker pid=8036) Warming up {1}
(RolloutWorker pid=8034) Warming up {2}
(RolloutWorker pid=8034) Warming up {3}
(RolloutWorker pid=8036) Warming up {2}
(RolloutWorker pid=8036) Warming up {3}
(RolloutWorker pid=8034) Warming up {4}
(RolloutWorker pid=8034) Warming up {5}
(RolloutWorker pid=8036) Warming up {4}
(RolloutWorker pid=8036) Warming up {5}
(RolloutWorker pid=8034) Warming up {6}
(RolloutWorker pid=8034) Warming up {7}
(RolloutWorker pid=8036) Warming up {6}
(RolloutWorker pid=8036) Warming up {7}
(RolloutWorker pid=8034) Warming up {8}
(RolloutWorker pid=8034) Starting Simulation at 01/01/2020 for 2020
(RolloutWorker pid=8034) -- Got first observation
(RolloutWorker pid=8036) Warming up {8}
(RolloutWorker pid=8036) Starting Simulation at 01/01/2020 for 2020
(RolloutWorker pid=8036) -- Got first observation

Notice that many "Warming up" messages are still logged after "-- Warmup complete" is received, presumably because the callback fires for each environment, including the sizing periods. The "-- Got first observation" message shows that a value arrived on the queue right after the last warmup completed, which is a good sign. Still, nothing prevents a race condition in which the consumer starts waiting on the queue before the appropriate data collection callback has been called.

antoine-galataud commented 1 year ago

For the simulation end case, the solution will be less intuitive, since we may already be waiting for timestep execution results when E+ actually returns. I'll figure out a custom solution; in my opinion this doesn't require any change in EnergyPlus. If you're OK with that, @jmarrec @Myoldmopar, I'll close this issue, unless you want to tackle the progress value with it.
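One possible custom solution, sketched under assumptions and not necessarily what was eventually implemented, is to push a sentinel onto the observation queue once the run returns, so that a consumer already blocked on `get()` wakes up. `SIMULATION_DONE`, `fake_simulation`, and `consume` are hypothetical names.

```python
import threading
from queue import Queue

# Unique marker object meaning "no more observations will arrive".
SIMULATION_DONE = object()


def fake_simulation(obs_queue, n_timesteps):
    # Hypothetical stand-in for the thread that runs EnergyPlus.
    try:
        for step in range(n_timesteps):
            obs_queue.put(step)
    finally:
        # Always wake the consumer, even if the run raised an exception.
        obs_queue.put(SIMULATION_DONE)


def consume(obs_queue):
    observations = []
    while True:
        item = obs_queue.get()  # blocks until a value or the sentinel arrives
        if item is SIMULATION_DONE:
            return observations
        observations.append(item)
```

One caveat: with a bounded queue, the final `put` blocks if the consumer has already stopped reading, so a real implementation would probably add a timeout there.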

jmarrec commented 1 year ago

I'll fix the issue of the progress value not reaching 100. It will only take a minute; I'm just adding tests.