CDCgov / prime-simplereport

SimpleReport is a fast, free, and easy way for COVID-19 testing facilities to report results to public health departments.
https://simplereport.gov
Creative Commons Zero v1.0 Universal

Add health endpoint to handle degraded Okta status #7806

Open mpbrown opened 3 weeks ago

mpbrown commented 3 weeks ago

Description

The Spring actuator exposes a number of health endpoints, which Azure uses to determine whether the app is healthy. We use one of these endpoints, backend-and-db-smoke-test, for the smoke test that determines whether the frontend can connect to the backend. However, that endpoint is currently also set up to check whether Okta is up. We want to move the Okta check out of that endpoint and into a separate endpoint so that we can get more nuanced information about Okta's status.

If Okta is down, our app still works for anyone who is already logged in. However, the smoke test endpoint returns an error status because Okta is down, and Azure interprets that response as the app being unhealthy, which causes the Azure App Service to spin up a new instance of the app. We don't want that to happen, but we do want to know when Okta might be down.
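
To make the coupling concrete, here is a minimal plain-Java sketch of the behavior described above. It is not taken from the SimpleReport codebase: the class name, method names, and the choice of 503 as the error status are all assumptions for illustration. The 200-299 rule is the one this issue cites for App Service health checks.

```java
// Hypothetical sketch; names and the 503 status are assumptions, not SimpleReport code.
final class CombinedSmokeTestSketch {

    /** Current behavior as described: both the DB and Okta must be up for a success status. */
    static int combinedSmokeTestStatus(boolean dbUp, boolean oktaUp) {
        return (dbUp && oktaUp) ? 200 : 503; // 503 is an assumed error status
    }

    /** Health probe rule cited in this issue: only 200-299 counts as healthy. */
    static boolean probeTreatsAsHealthy(int statusCode) {
        return statusCode >= 200 && statusCode <= 299;
    }

    public static void main(String[] args) {
        // The DB is fine and only Okta is down, yet the instance is reported unhealthy.
        int status = combinedSmokeTestStatus(true, false);
        System.out.println("status=" + status + " healthy=" + probeTreatsAsHealthy(status));
        // prints "status=503 healthy=false"
    }
}
```

In this sketch an Okta outage alone is enough to make the probe see the instance as unhealthy, which is the outcome the issue wants to avoid.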

Proposed solution

Remove the Okta status check from the backend-and-db-smoke-test endpoint.

Create a new health endpoint that checks the Okta status and returns either that Okta is up or that it has a degraded status.

The status code for the degraded state should be in the 200-299 range (inclusive); otherwise App Service will consider the instance unhealthy and remove it from the load balancer.

Decide whether we should return 200 OK with additional content in the body saying that Okta is degraded, or return 204 No Content, which we would interpret as the degraded status. The consideration here is to avoid using custom or unknown status codes.
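
The split proposed above could look roughly like the following plain-Java sketch. It deliberately avoids Spring types so that it stays self-contained. Every name in it (`OktaHealthSketch`, `HealthResponse`, `oktaHealth`, the JSON field names) is hypothetical, and it illustrates only the "200 OK with the degraded state in the body" option, not a decision between the two options.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from the SimpleReport codebase.
final class OktaHealthSketch {

    /** Minimal stand-in for an HTTP response: a status code plus a small JSON body. */
    static final class HealthResponse {
        final int statusCode;
        final String body;

        HealthResponse(int statusCode, String body) {
            this.statusCode = statusCode;
            this.body = body;
        }
    }

    /** Smoke test after the change: only backend/DB connectivity decides the result. */
    static HealthResponse backendAndDbSmokeTest(BooleanSupplier dbReachable) {
        return dbReachable.getAsBoolean()
                ? new HealthResponse(200, "{\"status\":\"UP\"}")
                : new HealthResponse(503, "{\"status\":\"DOWN\"}"); // 503 is an assumed error status
    }

    /** Separate Okta check: always answers 200 and carries the Okta state in the body. */
    static HealthResponse oktaHealth(BooleanSupplier oktaReachable) {
        boolean up;
        try {
            up = oktaReachable.getAsBoolean();
        } catch (RuntimeException e) {
            up = false; // a failed probe is reported as degraded, not as an app failure
        }
        return new HealthResponse(200, up ? "{\"okta\":\"UP\"}" : "{\"okta\":\"DEGRADED\"}");
    }

    public static void main(String[] args) {
        HealthResponse smoke = backendAndDbSmokeTest(() -> true);
        HealthResponse okta = oktaHealth(() -> false);
        System.out.println(smoke.statusCode + " " + smoke.body);
        System.out.println(okta.statusCode + " " + okta.body);
    }
}
```

With this shape, an Okta outage never pushes the response outside the 200-299 range, and anything monitoring Okta would need to read the body rather than the status code. The 204 No Content option would drop the body and rely on the status code alone to signal the degraded state.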

Additional context

June 6, 2024 engineering sync discussion of how this was discovered during smoke test work

What Azure App Service does with Health checks