Closed: kbergin closed this 5 years ago
Clarification: This also includes a Route 53 metric that verifies the endpoint is available from multiple regions and alerts when the service is down.
@rhiananthony and @tburdett - Can this be closed in favor of "Provide uptime monitoring and downtime alerting of all user facing components" and its children?
From @hannes-ucsc in #248: The health check should return a 503 status if any component is down. The 503 response should still carry a JSON body, in the same format as a 200 response.
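To make the 503 suggestion concrete, here is a minimal sketch of how a health check could pick the status code while keeping the JSON body shape identical for both outcomes. The function name `build_health_response`, the component names, and the `ok`/`components` fields are all hypothetical; nothing in this thread specifies the actual body format.

```python
import json
from http import HTTPStatus


def build_health_response(components):
    """Return (status_code, json_body) for a health check.

    `components` maps a (hypothetical) component name to True if it is up
    and False if it is down.
    """
    all_up = all(components.values())
    # 200 when every component is up, 503 as soon as any component is down.
    status = HTTPStatus.OK if all_up else HTTPStatus.SERVICE_UNAVAILABLE
    # The body has the same fields in both cases; only the values differ.
    body = {
        "ok": all_up,
        "components": {
            name: ("up" if up else "down") for name, up in components.items()
        },
    }
    return int(status), json.dumps(body, sort_keys=True)
```

A caller (or an external monitor) can then alert on the status code alone and read the body only when it needs to know which component failed.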
Can this be closed in favor of #65?
See my comment in #65 -
This initial project was completed. This ticket is superseded by #282 which details additional components that need health checks. Closing.
A discussion needs to happen about what should be done here.
Notes from call discussion: At Broad, every service has a self-check route that runs sanity checks to confirm that everything it depends on is still up. Andrey thinks this would be useful for all of our services to have.
Pingdom was mentioned at the brainstorming session.
https://metrics.data.humancellatlas.org/d/v4-0_FWiz/dcp-health?orgId=1
Old Trello ticket: https://trello.com/c/IGNBs8t1/19-api-health-checks