Open sampierson opened 5 years ago
@kbergin @rexwangcc I'd like to close out this parent epic. Since this is a SHOULD, either implement this or leave a comment that you don't plan on doing this and close the ticket.
@jkaneria This one is your team
@jkaneria is this, essentially, a summary issue of https://github.com/HumanCellAtlas/secondary-analysis/issues/370 and https://github.com/HumanCellAtlas/secondary-analysis/issues/371?
Could you please deduplicate these tickets?
Thanks.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
As a DCP Operator, I want to know not only that a component is up, but that it is nominally functional, so that we may take action if necessary to remedy the situation.
Most components currently have only what we will call a "Level 1" health check in place. They have a `/health` endpoint that responds if the REST API part of the service is available, but it indicates nothing more than that the REST API part of the service is functioning.

A "Level 2" health check tests other parts of the component service and its internal dependencies, to give a clearer indication that the entire component is healthy.
I suspect it is unlikely that these checks can be completed in real time (i.e. please don't block for ages when `/health` is polled), and that the `/health` endpoint will instead report the results of the last time the periodic health checks ran in the background.

This ticket was created at the request of the Tech Arch committee, 2019/03/22 meeting.
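
To make the "Level 2" idea concrete, here is a minimal sketch of one way the background-check arrangement described above could look: checks run periodically on a separate thread, and the `/health` handler only returns the cached result, so polling never blocks. Everything here is an assumption for illustration, not an existing DCP API: the `HealthMonitor` class, the check names, the 60-second default interval, and the `"ok"` / `"degraded"` / `"unknown"` status values are all hypothetical.

```python
import threading
import time
from typing import Callable, Dict, Optional


class HealthMonitor:
    """Hypothetical helper: runs checks in the background, caches the latest results."""

    def __init__(self, checks: Dict[str, Callable[[], bool]],
                 interval_seconds: float = 60.0) -> None:
        self._checks = checks              # check name -> callable returning True if healthy
        self._interval = interval_seconds  # assumed polling period
        self._lock = threading.Lock()
        self._results: Dict[str, bool] = {}
        self._last_run: Optional[float] = None
        self._stop = threading.Event()

    def run_checks_once(self) -> None:
        """Run every check; a check that raises counts as unhealthy."""
        results: Dict[str, bool] = {}
        for name, check in self._checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False
        with self._lock:
            self._results = results
            self._last_run = time.time()

    def start(self) -> threading.Thread:
        """Start the periodic checks on a daemon thread."""
        def loop() -> None:
            while not self._stop.is_set():
                self.run_checks_once()
                self._stop.wait(self._interval)  # sleeps, but wakes early on stop()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()

    def report(self) -> dict:
        """Return the cached results; this is what a /health handler would serve."""
        with self._lock:
            results = dict(self._results)
            last_run = self._last_run
        if last_run is None:
            status = "unknown"      # no background run has completed yet
        elif all(results.values()):
            status = "ok"
        else:
            status = "degraded"
        return {"status": status, "checks": results, "last_run": last_run}
```

A service would construct one monitor at startup, e.g. `HealthMonitor({"database": check_db, "queue": check_queue})` with its own check functions, call `start()`, and have the `/health` route return `report()`. Including `last_run` in the response lets an operator tell a stale result from a fresh one.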