elixir-cloud-aai / cloud-registry

GA4GH Service Registry API implementation for the ELIXIR Cloud
Apache License 2.0

Asynchronous health checks #13

Open uniqueg opened 4 years ago

uniqueg commented 4 years ago

To give clients an idea of the stability of a given service, an (optional) daemon could be implemented in this service that periodically sends heartbeat requests to individual services (e.g., to their GET /service-info endpoints). To provide this information to clients effectively, the ExternalService schema could be extended with an object property that provides some or all of the following (and possibly more) information:

The frequency of heartbeats (and timeout!) is probably something that the admin of the cloud registry should set up in the app configuration.
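As a rough illustration of the idea, a single heartbeat check driven by admin-configurable settings might look like the sketch below. Everything named here is an assumption for illustration only: the `HeartbeatConfig` class, its option names and placeholder defaults, the `check_service` helper, and the keys of the returned dictionary are hypothetical and are not part of cloud-registry or of the GA4GH Service Registry spec.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class HeartbeatConfig:
    """Hypothetical admin-set options; names and defaults are placeholders."""
    interval_seconds: float = 1800.0  # how often each service is checked
    timeout_seconds: float = 3.0      # how long to wait for a response


def http_probe(url: str, timeout: float) -> int:
    """Send a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def check_service(
    base_url: str,
    config: HeartbeatConfig,
    probe: Callable[[str, float], int] = http_probe,
) -> dict:
    """Probe `<base_url>/service-info` once and summarize the outcome.

    The returned keys are illustrative, not a proposed schema.
    """
    url = base_url.rstrip("/") + "/service-info"
    started = time.monotonic()
    status: Optional[int]
    try:
        status = probe(url, config.timeout_seconds)
    except urllib.error.HTTPError as exc:
        # The service answered, but with a 4xx/5xx status.
        status = exc.code
    except (OSError, ValueError):
        # Timeouts, refused connections, DNS failures, malformed URLs.
        status = None
    return {
        "url": url,
        "status_code": status,
        "healthy": status is not None and 200 <= status < 300,
        "elapsed_seconds": time.monotonic() - started,
    }
```

The `probe` argument is injectable so that the check can be exercised without network access, e.g. `check_service("https://example.org/ga4gh", HeartbeatConfig(), probe=lambda url, timeout: 200)`.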

uniqueg commented 3 years ago

Eventually this is probably something that should be discussed with the GA4GH to be implemented globally in the specs. However, for now I think this can be implemented in a relatively simple way:

uniqueg commented 3 years ago

FYI, there is a related discussion at GA4GH, but nothing concrete has come of it, so I would go ahead as outlined.

uniqueg commented 3 years ago

cwl-WES has an implementation of a daemon that runs tasks asynchronously in the background, although for a very different purpose. Perhaps there is a Python API for running something like cron jobs... In any case, it's important that these background checks scale to hundreds or even thousands (but certainly not millions) of services. The heartbeat interval should therefore probably have a reasonable minimum of 30 minutes or so, with a maximum timeout of 3 seconds.
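One possible shape for such a background job, using only the Python standard library, is sketched below. This is not the cwl-WES approach, just a thread-based stand-in: `run_round` and `HeartbeatDaemon` are hypothetical names, and the callables passed in (`list_services`, `check`, `on_results`) are assumed hooks into the registry rather than existing functions. The worker pool bounds concurrency, so a round over a few thousand services with a 3-second timeout each should finish well within a 30-minute interval.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def run_round(
    service_ids: Iterable[str],
    check: Callable[[str], Any],
    max_workers: int = 32,
) -> Dict[str, Any]:
    """Check every service once, concurrently, and collect the results.

    `check` is expected to handle its own errors and timeouts.
    """
    ids = list(service_ids)
    if not ids:
        return {}
    with ThreadPoolExecutor(max_workers=min(max_workers, len(ids))) as pool:
        # pool.map preserves input order, so results line up with ids.
        return dict(zip(ids, pool.map(check, ids)))


class HeartbeatDaemon(threading.Thread):
    """Hypothetical background thread that runs one round per interval."""

    def __init__(
        self,
        list_services: Callable[[], Iterable[str]],
        check: Callable[[str], Any],
        on_results: Callable[[Dict[str, Any]], None],
        interval_seconds: float = 1800.0,  # 30 minutes, as suggested above
    ) -> None:
        super().__init__(daemon=True)
        self._list_services = list_services
        self._check = check
        self._on_results = on_results
        self._interval = interval_seconds
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            self._on_results(run_round(self._list_services(), self._check))
            # Sleep until the next round, waking early if stop() is called.
            self._stop_event.wait(self._interval)

    def stop(self) -> None:
        self._stop_event.set()
```

A thread per app process is the simplest option; if the registry runs several worker processes, a dedicated scheduler (cron-style libraries such as APScheduler or Celery beat exist for this) would avoid duplicated checks.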

uniqueg commented 3 years ago

Issue #20 is related and could be implemented in coordination with this one.