Here's an idea to address the problem of very long requests that end with a timeout.
The Pub component could expose new endpoints for making asynchronous calls to jobs.
The main idea is to split a long request into 2 parts:
1. Requesting a new task - a user makes one request to start a task and gets an immediate response with the ID of the task. The task is then scheduled and processed by a job.
2. Checking on the result - the user periodically checks whether the requested task is already done. If so, they get the response.
Details:
With this approach, no changes are required on the job's side. Pub takes care of these asynchronous calls: it makes the old-fashioned synchronous call to a job on behalf of the user and waits for the result until the task is done.
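As a sketch of what Pub could do internally (the in-memory task registry and the `call_job_sync` stand-in are assumptions for illustration only; with multiple Pub replicas this state would have to live in shared storage, as discussed below in the details):

```python
import threading
import uuid

# In-memory task registry - an illustration-only assumption;
# real Pub replicas would keep this in shared storage (e.g. Postgres).
tasks: dict[str, dict] = {}

def call_job_sync(payload: dict) -> dict:
    """Stand-in for Pub's old-fashioned synchronous call to a job."""
    return {'result': sum(payload['numbers'])}

def start_task(payload: dict) -> str:
    """Handler sketch for POST /pub/async/new/job/... - returns a task ID immediately."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {'status': 'running', 'result': None}

    def worker():
        # Pub waits here on behalf of the user until the job finishes
        outcome = call_job_sync(payload)
        tasks[task_id] = {'status': 'done', 'result': outcome['result']}

    threading.Thread(target=worker, daemon=True).start()
    return task_id

def get_task(task_id: str) -> dict:
    """Handler sketch for GET /pub/async/task/{task_id}."""
    return tasks[task_id]
```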
External consumers (users) will need to change the way they make calls: they will have to use the new endpoints if they want to make an asynchronous call. Example:
```python
import time

import httpx

response = httpx.post('http://localhost:7005/pub/async/new/job/adder/latest/api/v1/perform', json={
    'numbers': [40, 7],
})
task_id = response.json()['task_id']
while True:
    response = httpx.get(f'http://localhost:7005/pub/async/task/{task_id}')
    if response.json()['status'] == 'done':
        break
    time.sleep(5)  # or use a backoff package for a smarter interval, like the Fibonacci sequence
result = response.json()['result']
```
This is an opt-in feature; old-fashioned sync calls are left intact.
Checking the result may be done simply by calling an HTTP endpoint that returns the status of a task multiple times in a loop, or over a WebSocket channel, or using HTTP long polling.
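If the simple polling loop is used, the interval need not be constant; the Fibonacci-style backoff mentioned in the example above could be sketched as (the cap value is an assumption):

```python
from typing import Iterator

def fibonacci_backoff(cap: float = 60.0) -> Iterator[float]:
    """Yield polling intervals growing like the Fibonacci sequence, capped at `cap` seconds."""
    a, b = 1.0, 1.0
    while True:
        yield min(a, cap)
        a, b = b, a + b

# Usage in the polling loop:
#   intervals = fibonacci_backoff()
#   time.sleep(next(intervals))
```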
Multiple Pub replicas have to be taken into account, so the knowledge about the tasks has to be shared by many instances of Pub. The tasks could be saved and retrieved in a Postgres database by means of the Lifecycle service. Alternatively, Postgres' LISTEN and NOTIFY may come in handy, or some other protocol for sharing data between Pub instances directly.
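For illustration, a hypothetical Postgres schema for the shared task state (all table and column names here are assumptions, not a settled design):

```sql
-- Hypothetical table for sharing task state between Pub replicas
CREATE TABLE async_task (
    task_id    UUID PRIMARY KEY,
    status     TEXT NOT NULL,                      -- e.g. 'running', 'done', 'failed'
    result     JSONB,                              -- response payload once the task is done
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- A replica finishing a task could announce it to the others:
--   NOTIFY async_task_done, '<task_id>';
-- while other replicas subscribe with:
--   LISTEN async_task_done;
```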
Response data has to be stored somewhere when a task is done so that the user can retrieve it afterwards, probably in a Postgres database or in ephemeral memory.
As tasks and results are being stored, they will probably have to be wiped out after some retention period.
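The retention cleanup could be a periodic SQL job; a sketch assuming a hypothetical `async_task` table with a `created_at` column and an arbitrary 7-day retention period:

```sql
-- Hypothetical retention job: wipe tasks older than the retention period
DELETE FROM async_task
WHERE created_at < now() - INTERVAL '7 days';
```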
The asynchronous endpoints could be POST /pub/async/new/job/{job_name}/{job_version}/{job_path} and GET /pub/async/task/{task_id}, as opposed to the regular /pub/job/{job_name}/{job_version}/{job_path}.