Pyrrha / calcom-helm

Book your meetings using Cal.com, running with Helm ⎈

liveness/readiness checks not working and can't disable. #10

Open danjenkins opened 10 months ago

danjenkins commented 10 months ago

I was able to briefly see the admin setup screen for cal.com self hosted but then the liveness/readiness checks failed.

  Normal   Pulled     51m (x73 over 4h54m)      kubelet  Container image "calcom/cal.com:v3.4.7" already present on machine
  Warning  Unhealthy  16m (x83 over 4h54m)      kubelet  Liveness probe failed: Get "http://10.244.1.140:3000/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    6m29s (x1041 over 4h49m)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  86s (x88 over 4h54m)      kubelet  Readiness probe failed: Get "http://10.244.1.140:3000/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

But I could definitely get to the UI via the ingress before the container was killed.

I see you can't disable the liveness/readiness checks in the chart. Maybe that should be an option?

Not sure why they're failing yet.

Pyrrha commented 9 months ago

I just redeployed an instance to check this, and it runs without problems from initialisation through to creating meetings in the app, without restarting. However, I also encountered some probe failures, but only at app startup. Using calcom-stack-0.1.2:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Pulling    15m                kubelet            Pulling image "calcom/cal.com:v3.5.4"
  Normal   Pulled     14m                kubelet            Successfully pulled image "calcom/cal.com:v3.5.4" in 1m8.501s (1m8.501s including waiting)
  Normal   Created    14m                kubelet            Created container calcom
  Normal   Started    14m                kubelet            Started container calcom
  Warning  Unhealthy  13m (x3 over 14m)  kubelet            Readiness probe failed: Get "http://10.0.174.18:3000/": dial tcp 10.0.174.18:3000: connect: connection refused
  Warning  Unhealthy  13m                kubelet            Liveness probe failed: Get "http://10.0.174.18:3000/": dial tcp 10.0.174.18:3000: connect: connection refused

From my point of view, there is a problem: the app takes longer to start than the probes allow (especially on first startup, with database initialisation), and startup can take over 2 minutes according to my logs. I'll look into adding a startup probe.
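For illustration, a startup probe for this case might look like the fragment below. This is a minimal sketch, not the chart's actual template: the probe path `/` and port `3000` are taken from the events above, while `periodSeconds`, `failureThreshold`, and `timeoutSeconds` are assumed values chosen to allow roughly 5 minutes (30 × 10s) for the first boot.

```yaml
# Hypothetical fragment of the calcom container spec; not taken from the chart.
containers:
  - name: calcom
    image: calcom/cal.com:v3.5.4
    startupProbe:
      httpGet:
        path: /              # same endpoint the existing probes hit in the events above
        port: 3000
      periodSeconds: 10      # assumed: probe every 10 seconds
      failureThreshold: 30   # assumed: tolerate 30 failures, i.e. up to about 300s of startup
      timeoutSeconds: 5      # assumed: give each request 5 seconds to respond
```

Kubernetes holds back the liveness and readiness probes until the startup probe has succeeded once, so a slow first boot with database initialisation would not trigger restarts, while the regular probes keep their tighter settings afterwards.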

For your scenario, I don't understand why it takes so long to start. Can you send the pod logs for further investigation? Also, what resources did you allocate to the app?

danjenkins commented 9 months ago

I'll get back to you with some logs. I removed the probes so that things would come up, to get past the issue temporarily.

Resources: I don't think I limited what it had available. Will confirm.

Pyrrha commented 9 months ago

@danjenkins any updates about your performance issue?

danjenkins commented 8 months ago

Sorry, due to illness/Christmas I haven't gotten around to looking at this again. Will do soon!

Pyrrha commented 7 months ago

@danjenkins I'll close this issue for now. Feel free to comment on it with fresh updates; I'll be happy to re-open it and investigate.

danjenkins commented 7 months ago

No worries - I'm going to be moving my k8s cluster in the next week or so, so I'll test these changes then!

erueda1 commented 1 month ago

@Pyrrha Do you think it would be possible to add the startup probe? 30s doesn't seem to be enough for the application to finish starting, even with no resource limits applied.

Thank you.

Pyrrha commented 2 weeks ago

Hello @erueda1, as I said to @danjenkins, it seems odd to me that the app takes so long to start. Could you provide a benchmark showing the mean startup time, and a summary of your infrastructure? I'll run the same on my side too. The startup probe will be implemented soon, in the next feature release I think.