Returning when action queue is empty is wrong if env or policy action sampling is slower than an E+ timestep execution: the env may not have the time to push an action in the queue before E+ executes the callback.
This may be fine if the current E+ timestep is a system (internal) timestep, but not if it's a global timestep: in this case we should wait for the action to be available and apply it to the actuator.
Following code may lead to race conditions in synchronization between observations collection and action sending
https://github.com/airboxlab/rllib-energyplus/blob/ce3be28e1760b670963ec042c585580f8c85200f/rllibenergyplus/run.py#L225-L242
Returning when action queue is empty is wrong if env or policy action sampling is slower than an E+ timestep execution: the env may not have the time to push an action in the queue before E+ executes the callback. This may be fine if the current E+ timestep is a system (internal) timestep, but not if it's a global timestep: in this case we should wait for the action to be available and apply it to the actuator.