department-of-veterans-affairs / notification-api

Notification API
MIT License

Implement ECS health check in api task definition #1966

coreycarvalho closed this issue 1 month ago

coreycarvalho commented 3 months ago

User Story - Business Need

In order to have reliable deploys, and only create ECS tasks when they are healthy, we should ensure that we have health checks in place for our ECS containers, specifically our API container.

User Story(ies)

As a VA Notify developer
I want container-level health checks for our API task
So that we can have reliable, healthy deployments

Additional Info and Resources

At a high level, ECS health checks work in two ways:

  1. The target group performs health checks against the task on a specified port and path.
  2. The ECS service performs container-based health checks before marking the task as HEALTHY.

We currently implement target group health checks for the API (number 1 above), but we do not implement ECS container-based health checks (number 2 above). This means that API tasks are marked as healthy prematurely. We want to implement a health check in the task definition so that the API task is marked healthy only once its container check passes.
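For illustration, here is a rough sketch of what a container-level `healthCheck` entry in the task definition could look like, built in Python and printed as JSON. The field names (`command`, `interval`, `timeout`, `retries`, `startPeriod`) come from the ECS HealthCheck API. The port `8080`, the path `/healthcheck`, the use of `curl`, and the `startPeriod` value are assumptions made for the example, not values taken from our task definition.

```python
import json


def build_health_check(port: int, path: str) -> dict:
    """Return a healthCheck fragment for one entry in containerDefinitions.

    The port and path are supplied by the caller; the values used below
    are hypothetical placeholders.
    """
    return {
        # CMD-SHELL runs the string with the container's default shell.
        # Assumes curl is installed in the image.
        "command": ["CMD-SHELL", f"curl -f http://localhost:{port}{path} || exit 1"],
        "interval": 30,     # seconds between checks (ECS default)
        "timeout": 5,       # seconds before a single check counts as failed (ECS default)
        "retries": 3,       # consecutive failures before UNHEALTHY (ECS default)
        "startPeriod": 60,  # illustrative grace period in seconds; ECS allows 0 to 300
    }


if __name__ == "__main__":
    # Hypothetical port and path, for illustration only.
    print(json.dumps({"healthCheck": build_health_check(8080, "/healthcheck")}, indent=2))
```

The printed fragment would sit inside the API container's definition, alongside its existing settings.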

https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_HealthCheck.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/healthcheck.html

We already do something like this for Celery (https://github.com/department-of-veterans-affairs/notification-api/blob/main/cd/application-deployment/prod/vaec-celery-task-definition.json#L280)

Acceptance Criteria

QA Considerations

Potential Dependencies

npmartin-oddball commented 2 months ago

Hey team! Please add your planning poker estimate with Zenhub @coreycarvalho @cris-oddball @EvanParish @k-macmillan @kalbfled @MackHalliday @mchlwellman

npmartin-oddball commented 1 month ago

Keeping. Confirming that Flask is still running.

mchlwellman commented 1 month ago

Yesterday I started looking at wait-for-it, since we already have it in our repo, but then realized it does the same thing the ECS health check pinger does. So I switched to curl, which means adding it to the container image (it's not there).
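For reference, a probe with roughly the same pass/fail behavior as `curl -f` can be sketched with only the Python standard library. This is an alternative shown for comparison, not the approach chosen above; the function name `probe` and the timeout value are hypothetical.

```python
import urllib.request


def probe(url: str, timeout: float = 2.0) -> int:
    """Return 0 if the URL answers with a 2xx/3xx status, otherwise 1.

    urlopen raises HTTPError for 4xx/5xx responses and URLError/OSError
    for connection problems, so any exception is treated as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 400 else 1
    except Exception:
        # For a health check, every failure mode maps to "unhealthy".
        return 1
```

A small wrapper script could pass the return value to `sys.exit` so the container health check sees a zero or non-zero exit code.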