HumanCellAtlas / ingest-central

Ingest Central is the hub repository for the ingest service
Apache License 2.0

Components SHOULD implement Level 2 Health Checks #384

Open sampierson opened 5 years ago

sampierson commented 5 years ago

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

As a DCP Operator, I want to know not only that a component is up, but that it is nominally functional, so that we may take action if necessary to remedy the situation.

Most components currently have only what we will call a "Level 1" Health Check in place. They have a /health endpoint that responds if the REST API part of the service is available, but a response indicates no more than that the REST API part of the service is functioning.

A "Level 2" health check tests other parts of the component service and its internal dependencies, to give a clearer indication that the entire component is healthy.

I suspect that these checks are unlikely to complete in real time (i.e. please don't block for ages when /health is polled), so the /health endpoint will instead report the results of the most recent run of periodic health checks performed in the background.
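
A minimal Python sketch of the pattern described above, for illustration only: checks run periodically in a background thread, and the /health handler returns the cached results without blocking on the checks themselves. The class name `HealthMonitor`, the check names, the response fields, and the use of 200/503 status codes are all assumptions made for this example, not part of any existing DCP component.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class HealthMonitor:
    """Hypothetical helper: runs dependency checks periodically, caches results."""

    def __init__(self, checks: Dict[str, Callable[[], bool]],
                 interval_seconds: float = 30.0):
        self._checks = checks
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._results: Dict[str, bool] = {}
        self._last_run: Optional[float] = None
        self._stop = threading.Event()

    def run_checks_once(self) -> None:
        """Run every check and store the outcome as the latest cached result."""
        results = {}
        for name, check in self._checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                # A check that raises is treated as a failed dependency.
                results[name] = False
        with self._lock:
            self._results = results
            self._last_run = time.time()

    def start(self) -> threading.Thread:
        """Start the periodic background loop in a daemon thread."""
        def loop() -> None:
            while not self._stop.is_set():
                self.run_checks_once()
                self._stop.wait(self._interval)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()

    def health_response(self) -> Tuple[int, dict]:
        """What a /health handler could return; only reads the cached results."""
        with self._lock:
            results = dict(self._results)
            last_run = self._last_run
        if last_run is None:
            # No background run has completed yet (assumed to map to 503).
            return 503, {"status": "unknown", "checks": {}, "last_run": None}
        healthy = all(results.values())
        return (200 if healthy else 503), {
            "status": "ok" if healthy else "degraded",
            "checks": results,
            "last_run": last_run,
        }
```

A component would register one callable per internal dependency (database, queue, and so on), call `start()` at boot, and have its /health route return `health_response()`. Including `last_run` in the body lets an operator notice when the cached results have gone stale.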

This ticket was created at the request of the Tech Arch committee, 2019/03/22 meeting.

mweiden commented 5 years ago

@justincc any plans to complete this work? I'd like to close out the parent epic. Since this is a SHOULD, either implement this or leave a comment that you don't plan on doing this and close the ticket.

justincc commented 5 years ago

I want to do this, but because of other priorities I have no timeline for the work, except that at least some of it will probably happen in Q3.