Add an additional step of 'reserving' the IPython kernel before starting the queue or executing a plan or a task. To reserve the kernel, the manager sends a request to the worker, and the worker attempts to start the execution loop in the IPython kernel. If the kernel is busy or the loop fails to start within a short period (the timeout is 0.2 s), the reserve_kernel request fails, which causes the respective operation (e.g. starting the queue) to fail. The reservation must complete before the manager returns the operation result to the client, i.e. if the API request to start the queue (queue_start) returns success=True, the IPython kernel is guaranteed to be already running the execution loop and ready to execute plans.
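The reserve-then-report ordering described above can be illustrated with a minimal sketch. This is not the code from this PR: the `Worker` class, the `release_kernel` method, the lock-based model of a busy kernel, and the shape of the returned dictionaries are all assumptions made for illustration. Only the `reserve_kernel` and `queue_start` names and the 0.2 s timeout come from the description.

```python
import threading

RESERVE_TIMEOUT = 0.2  # seconds; the timeout quoted in the description


class Worker:
    """Hypothetical stand-in for the worker that owns the IPython kernel."""

    def __init__(self):
        # Assumption: a held lock models a kernel that is busy, e.g. running
        # code started by a directly connected client.
        self._kernel_lock = threading.Lock()
        self.loop_running = False

    def reserve_kernel(self, timeout=RESERVE_TIMEOUT):
        # Try to claim the kernel; fail if it stays busy past the timeout.
        if not self._kernel_lock.acquire(timeout=timeout):
            return {"success": False, "msg": "kernel is busy"}
        self.loop_running = True  # the execution loop now runs in the kernel
        return {"success": True, "msg": ""}

    def release_kernel(self):
        # Hypothetical helper: stop the loop and free the kernel.
        self.loop_running = False
        self._kernel_lock.release()


def queue_start(worker):
    """Hypothetical manager-side handler: reserve first, then report."""
    result = worker.reserve_kernel()
    if not result["success"]:
        return {"success": False, "msg": "Failed to start the queue: " + result["msg"]}
    # success=True is returned only after the reservation has completed.
    return {"success": True, "msg": ""}
```

In this sketch a second `queue_start` call made while the kernel is still reserved fails after roughly 0.2 s instead of reporting success and failing later.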
A new API, manager_test, was added. The API is intended exclusively for unit tests. There are no plans to implement CLI access or Python API support for it.
Motivation and Context
The worker state information held by the manager process is delayed, so the manager may consider the IPython kernel idle while it is already busy running code started by a directly connected client (e.g. Jupyter Console). In some rare cases, API requests that start execution of the queue, a plan or a task could return success=True to the client, but then fail to start the execution loop in the kernel. This would cause the respective plan or task to fail and the queue to be stopped. The changes in this PR are intended to fix this issue and make the behavior more consistent.
Summary of Changes for Release Notes
Fixed
Added
Changed
Removed
How Has This Been Tested?
Unit tests were implemented