STARIONGROUP / COMET-WebServices-Community-Edition

The Concurrent Design Platform Web Services that are compliant with ECSS-E-TM-10-25 Annex A and Annex C
https://www.stariongroup.eu

Improve experience with long running processes wrt clients and predefined timeouts #268

Closed by alexatstariongroup 6 months ago

alexatstariongroup commented 1 year ago

Prerequisites

Description

The current behavior with long-running tasks is suboptimal. If a task takes more than 60 seconds to process on the server, clients will often time out with the message "A task was cancelled", with no further information for the user about the state of the task. The local cache and user interface therefore become desynchronized from the server until an automatic or manual refresh is done (provided that the task has finished in the meantime). This can lead to further downstream errors.

Common examples where this can happen:

Solution proposal:

Steps to Reproduce

System Configuration

nlmave commented 1 year ago

Can you please add the CDF label for traceability? Thanks.

alexatstariongroup commented 1 year ago

Done.

samatstariongroup commented 7 months ago

A proposed solution:

Long Running Tasks

The COMET REST API accepts POST requests. Depending on the amount of work the server needs to perform, these may be long-running tasks. Examples are:

To make sure that none of the server components (e.g. a reverse proxy) causes a timeout exception, every POST operation will translate into a Task. Once a POST request takes longer than a configurable amount of time to return (default is 5 seconds), the server will return a Task object instead. The Task object will contain the information required to poll the server for the status of the Task. The completion of a Task can also trigger a webhook, a notification via websockets, or another action.

Task route / endpoint

The COMET REST API exposes the following route where the state of the Tasks can be monitored. Only the user (Person) that initiated the task can read the specific Task. Tasks will be removed from the list of running tasks after a configurable amount of time (default is 3600 seconds).

| Route | HTTP Method | Body | Description |
| --- | --- | --- | --- |
| /Tasks | GET | empty | Returns all the tasks for the authenticated user making the request. If no tasks are available, an empty array is returned. |
| /Task/{iid} | GET | empty | Returns the specified task. Only accessible by the authenticated user that created the Task. If the Task does not exist or the Person is not authorized, the server responds with 403 Forbidden. |
| /Task/{iid} | POST | empty | Cancels the specified task and returns the cancelled task. Only accessible by the authenticated user that created the Task. If the Task does not exist or the Person is not authorized, the server responds with 403 Forbidden. |
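The access rules for these routes could look roughly like the sketch below. It is a Python illustration, not the actual server code: `get_task`, `list_tasks` and the in-memory `tasks` dictionary are hypothetical, and the 200 status for the success case is an assumption (the proposal only specifies the 403).

```python
def get_task(tasks, iid, person):
    """Return (http_status, body) for GET /Task/{iid}.

    `tasks` maps task id -> Task dict; `person` is the authenticated user's id.
    """
    task = tasks.get(iid)
    # An unknown id and a task owned by someone else both yield 403,
    # so a caller cannot tell whether a foreign task id exists.
    if task is None or task["person"] != person:
        return 403, None
    return 200, task  # 200 is an assumption, not stated in the proposal


def list_tasks(tasks, person):
    """Return (http_status, body) for GET /Tasks: only the caller's tasks,
    or an empty list when the caller has none."""
    return 200, [t for t in tasks.values() if t["person"] == person]
```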

Task Object

The returned Task object has the following form:

{
    "id": "6a307dc7-ad9b-4684-ab2b-798904dc880c",
    "person": "d504d72b-4a98-46ae-9f47-f2e77a046438",
    "status": "enqueued",
    "duration": 10000,
    "enqueuedAt": "2022-08-04T12:28:15.159167Z",
    "startedAt": "2022-08-04T12:28:15.161996Z",
    "finishedAt": "2022-08-04T12:28:15.163188Z",
    "post": {
        "_add": "",
        "_update": "",
        "_delete": "",
        "_copy": ""
    },
    "things": [],
    "error": {
        "message": "The EngineeringModel could not be created"
    }
}
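On the client side, a Task object like this could drive a simple polling loop. The sketch below is a Python illustration under stated assumptions: `poll_task` is a hypothetical helper, `fetch_task` stands in for a `GET /Task/{iid}` call, and the caller supplies `is_finished` because the proposal does not spell out here which field or status value marks completion. The interval and timeout defaults are arbitrary.

```python
import time


def poll_task(fetch_task, task_id, is_finished, interval_s=1.0, timeout_s=60.0,
              sleep=time.sleep, clock=time.monotonic):
    """Poll until `is_finished(task)` is true and return that Task.

    fetch_task(task_id) -> dict  : hypothetical wrapper around GET /Task/{iid}
    is_finished(task)   -> bool  : caller-defined completion check (assumption)
    Raises TimeoutError if the Task is not finished within `timeout_s`.
    """
    deadline = clock() + timeout_s
    while True:
        task = fetch_task(task_id)
        if is_finished(task):
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s} s")
        sleep(interval_s)
```

A caller might, for example, pass `lambda t: t.get("finishedAt") is not None` as the completion check, but whether `finishedAt` is the right signal is an assumption to be confirmed against the final Task definition.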

The status enum has the following possible values:

Server Implementation

The CDP4-COMET server will keep the running tasks in memory for a configurable amount of time. The default is 3600 seconds; the maximum allowed is 86400 seconds (24 hours). The cache will not survive a reboot of the server. The tasks will be cached using the IMemoryCache interface of ASP.NET Core. Each Task is cached with a key derived from the Task's unique identifier and the Task object as the value. The cache identifier has the following form: Task-{GUID}

The cached tasks are exposed using an injected TaskService which provides a wrapper around the IMemoryCache.
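To make the caching behavior concrete, here is a minimal Python stand-in for such a wrapper. It is not the IMemoryCache API and not the proposed TaskService: the class `TaskCache` and its methods are hypothetical, and it assumes the retention period starts when the Task is stored, which the proposal does not specify. The `Task-{GUID}` key form and the 3600 / 86400 second limits are taken from the text above.

```python
import time

DEFAULT_TTL_S = 3600   # default retention from the proposal
MAX_TTL_S = 86400      # maximum retention from the proposal (24 hours)


class TaskCache:
    """In-memory Task store with expiry; contents are lost on restart."""

    def __init__(self, ttl_s=DEFAULT_TTL_S, clock=time.monotonic):
        if not 0 < ttl_s <= MAX_TTL_S:
            raise ValueError(f"ttl_s must be in (0, {MAX_TTL_S}]")
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # cache key -> (expires_at, task)

    @staticmethod
    def key_for(task_id):
        # Cache identifier of the form Task-{GUID}
        return f"Task-{task_id}"

    def set(self, task):
        # Assumption: the retention period starts when the Task is stored.
        expires_at = self._clock() + self._ttl_s
        self._entries[self.key_for(task["id"])] = (expires_at, task)

    def get(self, task_id):
        key = self.key_for(task_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, task = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries on read
            return None
        return task
```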