SCALE-MS / scale-ms

SCALE-MS design and development

Unify tracking of RP session states. #383

Open eirrgang opened 1 year ago

eirrgang commented 1 year ago

Scalability optimizations in the RP stack introduce race conditions. In the window between successfully enqueuing a command and the responsible thread processing it, it is hard to determine whether a submitted Task will ever change state, whether a callback will ever be called, or whether a component has actually started shutting down.

We can add some facilities to scalems.radical.session.RuntimeSession to consolidate these checks with a minimal number of callbacks and extra tasks.

RP callbacks can set threading.Event attributes directly, and/or call loop.call_soon_threadsafe(event.set) for asyncio.Event attributes.
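A minimal sketch of that pattern, using only the standard library. The function `rp_style_callback` is a hypothetical stand-in for a callback that RP would invoke on one of its own threads; the real callback signature is not shown here.

```python
import asyncio
import threading


def rp_style_callback(thread_event, loop, async_event):
    """Hypothetical stand-in for a callback invoked on a non-asyncio thread."""
    # threading.Event is thread-safe, so it can be set directly.
    thread_event.set()
    # asyncio.Event is not thread-safe: schedule set() on the loop's own thread.
    loop.call_soon_threadsafe(async_event.set)


async def main():
    loop = asyncio.get_running_loop()
    thread_event = threading.Event()
    async_event = asyncio.Event()

    # Simulate the runtime calling back from a worker thread.
    worker = threading.Thread(
        target=rp_style_callback, args=(thread_event, loop, async_event)
    )
    worker.start()

    # Wait for the callback to fire, with a timeout as a safety net.
    await asyncio.wait_for(async_event.wait(), timeout=5)
    worker.join()
    return thread_event.is_set(), async_event.is_set()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `event.set()` on an asyncio.Event directly from the worker thread might appear to work, but it would not reliably wake coroutines waiting in the event loop, which is why the sketch routes it through call_soon_threadsafe.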

Proposed Event attributes

The RuntimeSession can register some Pilot callbacks and own some asyncio Tasks to maintain the state.

We may also want to update the handling of the pilot resources Future. The Task responsible for resolving it should be canceled if it has not finished by the time pilot_done is set.
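One way this cancellation could look, as a sketch only. It assumes pilot_done is an asyncio.Event; `cancel_unless_resolved` and `get_resources` are hypothetical names, not part of scalems or RP.

```python
import asyncio


async def cancel_unless_resolved(task: asyncio.Task, pilot_done: asyncio.Event) -> None:
    """Cancel `task` if `pilot_done` is set before the task finishes."""
    done_waiter = asyncio.ensure_future(pilot_done.wait())
    try:
        await asyncio.wait({task, done_waiter}, return_when=asyncio.FIRST_COMPLETED)
        if not task.done():
            # The pilot finished first, so the result can never arrive.
            task.cancel()
    finally:
        # No-op if the waiter already completed.
        done_waiter.cancel()


async def demo() -> bool:
    pilot_done = asyncio.Event()

    async def get_resources():
        await asyncio.sleep(3600)  # never resolves within this demo
        return {"cores": 4}

    resources = asyncio.create_task(get_resources())
    watcher = asyncio.create_task(cancel_unless_resolved(resources, pilot_done))
    await asyncio.sleep(0)  # let both tasks start
    pilot_done.set()
    await watcher
    try:
        await resources  # let the cancellation complete
    except asyncio.CancelledError:
        pass
    return resources.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```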

We can separate the pilot() acquisition method once these events are available. RuntimeSession will just have a pilot attribute that is None until the Pilot is successfully submitted (if at all). Clients will have to check for a non-None value, since pilot_done also needs to be set in case of failure.
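The client-side check might look roughly like the following. `SessionSketch`, `submit`, and `pilot_is_usable` are hypothetical names used for illustration; they are not the RuntimeSession API, and the placeholder `object()` stands in for a real Pilot handle.

```python
import asyncio
from typing import Optional


class SessionSketch:
    """Hypothetical stand-in for RuntimeSession; not the scalems API."""

    def __init__(self) -> None:
        self.pilot: Optional[object] = None  # None until submission succeeds
        self.pilot_done = asyncio.Event()

    def submit(self, succeed: bool) -> None:
        if succeed:
            self.pilot = object()  # placeholder for a real Pilot handle
        else:
            # Submission failed: there will never be a Pilot, so mark it done.
            self.pilot_done.set()


def pilot_is_usable(session: SessionSketch) -> bool:
    # pilot_done alone is ambiguous (finished vs. never started),
    # so clients check the attribute as well.
    return session.pilot is not None and not session.pilot_done.is_set()
```

Setting pilot_done on failure means anything waiting on the event is released even when no Pilot ever existed, at the cost of the extra None check shown in `pilot_is_usable`.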

Note that this issue will require careful testing. See also #359