GenSpectrum / LAPIS

An API, a query engine, and a database schema for genomic sequences; currently with a focus on SARS-CoV-2
https://lapis-three.vercel.app
GNU Affero General Public License v3.0

Healthcheck #813

Status: Closed (chaoran-chen closed 3 weeks ago)

chaoran-chen commented 4 months ago

Related and complementary to #812, we should have a HEALTHCHECK in Docker to ensure that Docker detects when LAPIS is non-responsive. (Thanks to @theosanderson for suggesting this.)

theosanderson commented 4 months ago

Thanks - it would be great if this health endpoint returned status 200 while LAPIS is waiting for its first data, which the other endpoints do not do at the moment (as far as I am aware).

chaoran-chen commented 4 months ago

As soon as https://github.com/GenSpectrum/LAPIS-SILO/issues/244 is implemented, waiting for the first data shouldn't be a problem anymore, though.

theosanderson commented 4 months ago

Yes, great! I wasn't sure of the timing there. In the post-implementation-of-244 case, it may be possible to just use an existing endpoint for this.

pflanze commented 4 months ago

In case curl and Bash don't work well enough for writing the check program, I could extend my api-query program[1] for the purpose. I've verified that I can produce a statically linked binary.

[1] https://github.com/pflanze/api-query/blob/master/src/main.rs

fengelniederhammer commented 4 months ago

We already have some actuator endpoints enabled. That's probably the way we want to go.

For Kubernetes, we already set this up for the Loculus backend. From application.properties:

springdoc.show-actuator=true
management.endpoints.enabled-by-default=false
management.endpoint.health.enabled=true
management.endpoints.web.exposure.include=health
management.health.livenessState.enabled=true
management.health.readinessState.enabled=true

From the Kubernetes deployment:

          livenessProbe:
            httpGet:
              path: "/actuator/health/liveness"
              port: 8079
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: "/actuator/health/readiness"
              port: 8079

And then call one of these endpoints in the healthcheck.
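A minimal sketch of what such a check could look like, not what LAPIS actually ships. It assumes that the liveness endpoint from the Loculus example above is reachable inside the container on port 8079 (the LAPIS port may differ), that a Python interpreter is available in the image, and that the endpoint behaves like a standard Spring Boot Actuator health endpoint (HTTP 200 with a JSON body containing `"status": "UP"` when healthy). The names `HEALTH_URL`, `is_up`, `check`, and `main` are made up for illustration.

```python
import http.client
import json
import urllib.error
import urllib.request

# Assumption: host, port, and path are taken from the Loculus probe above;
# the values for a LAPIS container may be different.
HEALTH_URL = "http://localhost:8079/actuator/health/liveness"


def is_up(status_code: int, body: str) -> bool:
    """Return True only for HTTP 200 with a JSON object whose "status" is "UP"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Query the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_up(response.status, response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers connection refused, timeouts, non-2xx responses (HTTPError),
        # malformed responses, and undecodable bodies.
        return False


def main() -> int:
    """Map the result to the exit codes Docker expects: 0 = healthy, 1 = unhealthy."""
    return 0 if check() else 1
```

To use this as a script, it would end with `sys.exit(main())` (after `import sys`), and a Dockerfile `HEALTHCHECK` instruction would run it. If Python is not in the image, the same logic (request the endpoint, exit non-zero on anything other than a healthy response) applies to the curl or static-binary options mentioned above.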

pflanze commented 3 months ago

For sending Slack messages at a later point, check with me: I already made a separately runnable script for that, with extracts from the servers repo, and I'm already due to integrate it back into a GenSpectrum repo in some form.

fengelniederhammer commented 3 weeks ago

Also see #941 for some more information.