Open yadudoc opened 3 years ago
| Scenario | Task Status |
|---|---|
| Queried a completed task | `{'pending': False, 'status': 'success', 'result': 'Hello World!', 'completion_t': '1623881816.333504'}` |
| Queried a task in process | `{'pending': True, 'status': 'running'}` |
| Queried a task that has been submitted to an offline endpoint | `{'pending': True, 'status': 'waiting-for-ep'}` |
| Queried a task with exception (e.g., division by zero) | `{'pending': False, 'status': 'failed', 'exception': <parsl.app.errors.RemoteExceptionWrapper object at 0x7f7d28c0ea50>, 'completion_t': '1623882199.3872406'}` |
| Queried a task with exception (e.g., unsupported import) | `{'pending': False, 'status': 'failed', 'exception': <parsl.app.errors.RemoteExceptionWrapper object at 0x7f619936ead0>, 'completion_t': '1623882314.4383392'}` |
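
A client can branch on the `pending` and `status` fields shown in the table. The following is a minimal sketch that assumes only the dictionary shapes listed above; `describe_task` is a hypothetical helper for illustration, not part of the funcX SDK.

```python
def describe_task(task: dict) -> str:
    """Summarize a task-status dictionary shaped like the rows in the table above."""
    if task["pending"]:
        # Pending tasks carry only a status such as 'running' or 'waiting-for-ep'.
        return f"still pending ({task['status']})"
    if task["status"] == "success":
        return f"finished with result {task['result']!r} at {task['completion_t']}"
    # Assumption: any other non-pending status is treated as a failure.
    return f"failed with {task.get('exception')!r} at {task.get('completion_t')}"


print(describe_task({"pending": True, "status": "waiting-for-ep"}))
print(describe_task({"pending": False, "status": "success",
                     "result": "Hello World!", "completion_t": "1623881816.333504"}))
```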
Note:
`fxc.get_task(res)['exception']`.
(`'running'`) and an offline endpoint (`'waiting-for-ep'`). As long as the web-service/forwarder has received neither the endpoint's heartbeat nor the function result, it shows `'waiting-for-ep'`. So a task submitted to an offline endpoint would have the `'waiting-for-ep'` status. Then, if the function result is received, the status turns to `'success'`, or if a heartbeat is received, the status turns to `'running'`.

Task state diagram: https://miro.com/app/board/o9J_l-58NQg=/
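
The status rule described in the note above can be sketched as a small function. This is only an illustration: `derive_status` and its parameters are hypothetical names, and both the `failed` flag and the choice to let a received result take precedence over a heartbeat are assumptions, not something the issue specifies.

```python
def derive_status(heartbeat_received: bool, result_received: bool,
                  failed: bool = False) -> str:
    """Return the status the web-service/forwarder would show for a task."""
    if result_received:
        # Assumption: a received result (or exception) outranks heartbeat information.
        return "failed" if failed else "success"
    if heartbeat_received:
        return "running"
    # Neither a heartbeat nor a result has arrived: the endpoint looks offline.
    return "waiting-for-ep"


for args in [(False, False), (True, False), (True, True)]:
    print(args, "->", derive_status(*args))
```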
There are two kinds of states:
- States where `pending` is `False`.
  - `completed`, we need the function result and completion time
  - `failed`, we need the failure exception and completion time
- States where `pending` is `True`.
  - `submitted`
  - `waiting-for-ep`
  - `dispatched-to-ep`
  - `running`
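
The two groups of states, and the data each non-pending state needs, could be checked roughly as follows. This is a sketch under assumptions: the field names `result`, `exception`, and `completion_t` are borrowed from the table earlier in this issue, and `is_pending` and `missing_fields` are hypothetical helpers.

```python
# Fields that must accompany each non-pending state (names taken from the table above).
REQUIRED_FIELDS = {
    "completed": {"result", "completion_t"},
    "failed": {"exception", "completion_t"},
}
PENDING_STATES = {"submitted", "waiting-for-ep", "dispatched-to-ep", "running"}


def is_pending(status: str) -> bool:
    """Map a state name to the value its `pending` flag should have."""
    if status in PENDING_STATES:
        return True
    if status in REQUIRED_FIELDS:
        return False
    raise ValueError(f"unknown task state: {status!r}")


def missing_fields(record: dict) -> set:
    """Return the required fields that a task record lacks for its state."""
    status = record["status"]
    if is_pending(status):
        return set()
    return REQUIRED_FIELDS[status].difference(record)


print(missing_fields({"status": "completed", "result": "Hello World!"}))
```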
Items to discuss:
- Whether `submitted` and `waiting-for-ep` should be two different states. After receiving a function submission, the web-service/forwarder immediately connects to the given endpoint. The connection ends in one of two outcomes: a connection failure (caused by any network issue; we treat all of them as the endpoint being offline) or work being dispatched to the endpoint.

Endpoint reports endpoint status and task status by executor.
From the endpoint's point of view, task statuses include `WAITING_FOR_NODES`, `WAITING_FOR_LAUNCH`, `RUNNING`, `SUCCESS`, and `FAILED`.
From the forwarder's point of view, task statuses include `RECEIVED`, `WAITING_FOR_EP`, `WAITING_FOR_NODES`, `WAITING_FOR_LAUNCH`, `RUNNING`, `SUCCESS`, and `FAILED`.
Task status definitions:
- `RECEIVED`: Task is in this state when the web-service has received the task submission.
- `DISPATCHED_TO_EP`: Task is in this state when the forwarder has dispatched the task to the endpoint, but it has not been acknowledged.
- `WAITING_FOR_NODES`: Task is in this state when the endpoint is waiting to send the task to `funcx-manager`.
- `WAITING_FOR_LAUNCH`: Task is in this state when the endpoint is waiting for the task to be executed by `funcx-worker`.
- `RUNNING`: Task is in this state when it is being executed by `funcx-worker`.
- `SUCCESS`: Task is in this state when its execution has completed successfully and a result has been returned.
- `FAILED`: Task is in this state when its execution has failed and an exception has been returned.
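
As a starting point for the state-flow diagram, the definitions above can be encoded as an enum plus a transition table. This is a sketch only: the forward-only ordering in `ALLOWED` is an assumption inferred from the order of the definitions, the names `TaskState`, `ALLOWED`, and `advance` are hypothetical, and the real diagram may contain additional edges (for example for `WAITING_FOR_EP` or retries).

```python
from enum import Enum


class TaskState(Enum):
    RECEIVED = "RECEIVED"
    DISPATCHED_TO_EP = "DISPATCHED_TO_EP"
    WAITING_FOR_NODES = "WAITING_FOR_NODES"
    WAITING_FOR_LAUNCH = "WAITING_FOR_LAUNCH"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


# Assumed transitions: each state moves to the next one listed in the definitions,
# and RUNNING ends in either SUCCESS or FAILED.
ALLOWED = {
    TaskState.RECEIVED: {TaskState.DISPATCHED_TO_EP},
    TaskState.DISPATCHED_TO_EP: {TaskState.WAITING_FOR_NODES},
    TaskState.WAITING_FOR_NODES: {TaskState.WAITING_FOR_LAUNCH},
    TaskState.WAITING_FOR_LAUNCH: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.SUCCESS, TaskState.FAILED},
    TaskState.SUCCESS: set(),
    TaskState.FAILED: set(),
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


state = TaskState.RECEIVED
for nxt in (TaskState.DISPATCHED_TO_EP, TaskState.WAITING_FOR_NODES,
            TaskState.WAITING_FOR_LAUNCH, TaskState.RUNNING, TaskState.SUCCESS):
    state = advance(state, nxt)
print(state.name)
```

Keeping the transitions in a plain dictionary makes it easy to add or remove edges once the diagram is agreed on.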
Is your feature request related to a problem? Please describe.
This is more of a technical problem that bothers us as developers than one that affects users, but a well-defined task life-cycle would help explain to users what is going on with their functions.
Describe the solution you'd like
Create a state-flow diagram that captures the states the function goes through on the web-service, forwarder, and endpoint. As a bonus, it would be good to extend this diagram to support retries.
Additional context
This is necessary for #509