Scalability optimizations in the RP stack present race conditions that make it hard to determine whether a submitted Task will ever change state, whether a callback will ever be called, or whether a component has actually started shutting down in the time between successfully enqueuing a command and the responsible thread processing the command.
We can add some facilities to scalems.radical.session.RuntimeSession to consolidate these checks with a minimal number of callbacks and extra tasks.
RP callbacks can set threading.Event attributes directly, and/or call loop.call_soon_threadsafe(event.set) for asyncio.Event attributes.
Proposed Event attributes
session_closing
session_closed
pilot_available
pilot_done
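A minimal sketch of how the proposed events could be grouped and set from a non-asyncio thread. The names `RuntimeEvents`, `make_threadsafe_setter`, and `demo` are hypothetical, not part of scalems or RP, and a plain `threading.Thread` stands in for the thread on which an RP callback would fire.

```python
import asyncio
import threading
from dataclasses import dataclass, field


@dataclass
class RuntimeEvents:
    """Hypothetical container for the proposed events (not the scalems API)."""

    session_closing: asyncio.Event = field(default_factory=asyncio.Event)
    session_closed: asyncio.Event = field(default_factory=asyncio.Event)
    pilot_available: asyncio.Event = field(default_factory=asyncio.Event)
    pilot_done: asyncio.Event = field(default_factory=asyncio.Event)


def make_threadsafe_setter(loop: asyncio.AbstractEventLoop, event: asyncio.Event):
    """Return a callable that may be invoked from any thread to set `event`."""

    def _set() -> None:
        # asyncio.Event is not thread-safe, so schedule set() on the loop thread.
        loop.call_soon_threadsafe(event.set)

    return _set


async def demo() -> bool:
    loop = asyncio.get_running_loop()
    events = RuntimeEvents()  # created inside the running loop
    # Stand-in for an RP callback firing on a foreign thread.
    worker = threading.Thread(target=make_threadsafe_setter(loop, events.pilot_available))
    worker.start()
    await asyncio.wait_for(events.pilot_available.wait(), timeout=5)
    worker.join()
    return events.pilot_available.is_set()
```

Creating the events inside the running loop avoids loop-binding surprises on older Python versions, where `asyncio.Event` captured an event loop at construction time.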
The RuntimeSession can register some Pilot callbacks and own some asyncio Tasks to maintain the state.
Periodically (with asyncio.sleep of at least 1 second between checks) check Session.closed, in case the Session is ended by something external, and set session_closed and pilot_done. Cancel this Task when closing normally.
Wait for session_closed and check that session_closing and pilot_done get set.
Use a Pilot callback to set pilot_available. Run an asyncio.Task to unregister the callback when pilot_available or pilot_done is set. Cancel the Task when session_closed is set.
Use a Pilot callback to set pilot_done when Pilot completes, fails, or is canceled.
Create an asyncio.Task to wait for the first of session_closing, session_closed, pilot_available, or pilot_done, or asyncio.sleep(10). If the sleep finishes first, check the Pilot state, in case our callback was registered too late to catch the state transition of interest, and set pilot_available or pilot_done if appropriate. Otherwise, assume the callbacks are good to go, and return.
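The last item (wait for the first event or a timeout, then fall back to polling the Pilot state) could be sketched as follows. `wait_first`, `confirm_pilot_state`, and the `get_state` callable are hypothetical names for illustration. The state strings are assumed to mirror RP's pilot state names and should be checked against `radical.pilot.states` before use.

```python
import asyncio
from typing import Callable, Iterable

# Assumed stand-ins for RP pilot state names; verify against radical.pilot.states.
ACTIVE_STATE = "PMGR_ACTIVE"
FINAL_STATES = ("DONE", "FAILED", "CANCELED")


async def wait_first(events: Iterable[asyncio.Event], timeout: float) -> bool:
    """Return True if any event was set before the timeout elapsed."""
    waiters = [asyncio.ensure_future(e.wait()) for e in events]
    try:
        done, _ = await asyncio.wait(
            waiters, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
        )
        return bool(done)
    finally:
        # Clean up the waiters that did not complete.
        for w in waiters:
            w.cancel()
        await asyncio.gather(*waiters, return_exceptions=True)


async def confirm_pilot_state(
    get_state: Callable[[], str],
    session_closing: asyncio.Event,
    session_closed: asyncio.Event,
    pilot_available: asyncio.Event,
    pilot_done: asyncio.Event,
    timeout: float = 10.0,
) -> None:
    events = (session_closing, session_closed, pilot_available, pilot_done)
    if await wait_first(events, timeout):
        # Some event fired, so assume the callbacks are working.
        return
    # Timed out: the callback may have been registered too late, so poll once.
    state = get_state()
    if state in FINAL_STATES:
        pilot_done.set()
    elif state == ACTIVE_STATE:
        pilot_available.set()
```

In a real RuntimeSession, `get_state` would presumably be something like `lambda: pilot.state`. Passing it as a callable keeps the sketch testable without an RP installation.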
We may also want to update the handling of the pilot resources Future. The Task responsible should be canceled if not resolved before pilot_done.
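One way to cancel the resources Task when pilot_done is set first is sketched below. `guard_resources` is a hypothetical helper, and `resources_task` stands in for whatever Task currently resolves the pilot resources Future.

```python
import asyncio


async def guard_resources(resources_task: asyncio.Task, pilot_done: asyncio.Event) -> bool:
    """Cancel `resources_task` if `pilot_done` is set first.

    Returns True if the task was canceled, False if it had already finished.
    """
    done_waiter = asyncio.ensure_future(pilot_done.wait())
    try:
        await asyncio.wait(
            {resources_task, done_waiter}, return_when=asyncio.FIRST_COMPLETED
        )
    finally:
        done_waiter.cancel()
    if resources_task.done():
        # Resolved (or failed) on its own; nothing to cancel.
        return False
    resources_task.cancel()
    return True
```

If both complete in the same loop iteration, the sketch prefers the resolved result over cancellation.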
We can separate the pilot() acquisition method once these events are available. RuntimeSession will just have a pilot attribute that is None until the Pilot is successfully submitted (if at all). Clients will have to check for a non-None value, because pilot_done must also be set when submission fails, in which case pilot remains None.
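From the client side, the check might look like the sketch below. `SessionSketch` and `get_pilot` are hypothetical stand-ins, not the RuntimeSession interface. Returning None once pilot_done is set is one possible policy, not a settled design.

```python
import asyncio
from typing import Any, Optional


class SessionSketch:
    """Hypothetical stand-in for the relevant RuntimeSession attributes."""

    def __init__(self) -> None:
        self.pilot: Optional[Any] = None  # None until a Pilot is submitted
        self.pilot_available = asyncio.Event()
        self.pilot_done = asyncio.Event()


async def get_pilot(session: SessionSketch) -> Optional[Any]:
    """Wait until the Pilot is usable or known to be finished; return it or None."""
    waiters = [
        asyncio.ensure_future(session.pilot_available.wait()),
        asyncio.ensure_future(session.pilot_done.wait()),
    ]
    try:
        await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
    finally:
        for w in waiters:
            w.cancel()
    # pilot_done is also set on failure, so the attribute may still be None.
    if session.pilot_done.is_set() or session.pilot is None:
        return None
    return session.pilot
```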
Note that this issue will require careful testing. See also #359