Skipping action sending may lead to inconsistencies

The following code may lead to race conditions in the synchronization between observation collection and action sending:

https://github.com/airboxlab/rllib-energyplus/blob/ce3be28e1760b670963ec042c585580f8c85200f/rllibenergyplus/run.py#L225-L242

Returning early when the action queue is empty is wrong if the env step or the policy's action sampling is slower than the execution of an E+ timestep: the env may not have had time to push an action onto the queue before E+ executes the callback. This may be fine if the current E+ timestep is a system (internal) timestep, but not if it is a global timestep. In that case the callback should wait for the action to become available and then apply it to the actuator.
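A minimal sketch of the proposed behavior, not the actual `run.py` code. The names `send_action`, `apply_action`, `is_system_timestep` and `timeout` are hypothetical, and how the callback tells a system timestep from a global one is left out. It assumes actions are exchanged through a standard `queue.Queue`:

```python
import queue
import threading
import time
from typing import Callable, Optional


def send_action(
    act_queue: "queue.Queue[float]",
    apply_action: Callable[[float], None],
    is_system_timestep: bool,
    timeout: Optional[float] = 5.0,
) -> bool:
    """Apply the next queued action; return True if one was applied.

    Hypothetical helper: on a system timestep an empty queue is tolerated,
    on a global timestep the call blocks until the env has pushed an action.
    """
    if is_system_timestep:
        try:
            # Non-blocking: skipping is assumed acceptable on system timesteps.
            action = act_queue.get_nowait()
        except queue.Empty:
            return False
    else:
        try:
            # Blocking: wait for the env/policy instead of skipping the action.
            action = act_queue.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no action received for a global timestep")
    apply_action(action)
    return True


# Usage: a policy that is slower than the simulation callback.
applied = []
actions: "queue.Queue[float]" = queue.Queue(maxsize=1)

# System timestep with an empty queue: the action is skipped.
skipped = not send_action(actions, applied.append, is_system_timestep=True)

# Global timestep: the callback waits for the slow producer.
producer = threading.Thread(target=lambda: (time.sleep(0.05), actions.put(21.5)))
producer.start()
send_action(actions, applied.append, is_system_timestep=False)
producer.join()
```

The timeout is a design choice in this sketch: it keeps the simulation thread from hanging forever if the env side has stopped, at the cost of having to handle `TimeoutError` in the caller.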
