eliottness opened this issue 3 days ago
Hey, thanks for opening an issue.
I know nothing about Kubernetes, so I won't be able to do anything Kubernetes-related about this. But I'm surprised: the health endpoint times out, and the rest doesn't? They're all tied to the same code; there's no rate limiting or anything.
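One way to narrow it down would be to hit the endpoint directly from inside the cluster, bypassing the kubelet probe; a minimal sketch, assuming the service is named firefly-iii in the firefly namespace and serves on port 80:

```sh
# Throwaway curl pod that queries /health directly.
# Service name, namespace, and port are assumptions; adjust to your install.
kubectl run curl-test -n firefly --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sv --max-time 5 http://firefly-iii.firefly.svc.cluster.local/health
```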
Is this the stack chart? If you check the app pod logs, does it give more detail? For example, I think mine was having issues referencing the postgres database, which is why the app pod would never go healthy.
That could be the issue, sure. Could you share some more details?
I've been sorting through a number of issues on this chart recently and hope to help with some of the chart efforts. I think the reason this particular issue's app pod never goes healthy is that, if you look at the app svc logs, it's referencing the service firefly-db, which doesn't exist:
```
k logs -n firefly svc/firefly-iii | grep firefly-db | tail -n 1
[previous exception] [object] (PDOException(code: 0): PDO::__construct(): php_network_getaddresses: getaddrinfo for firefly-db failed: Name or service not known at /var/www/html/vendor/laravel/framework/src/Illuminate/Database/Connectors/Connector.php:65)
```
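To double-check that the name genuinely doesn't resolve in-cluster, a quick DNS lookup from a throwaway pod works (the firefly namespace is an assumption):

```sh
# Expect a "can't resolve" / NXDOMAIN error for firefly-db if no such service exists.
kubectl run dns-test -n firefly --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup firefly-db
```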
The default service name from the chart doesn't align with what the app pod is looking for; in my case the db service is actually named firefly-iii-firefly-db.
```
k get svc -n firefly-iii
NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
firefly-iii              ClusterIP   10.43.88.229   <none>        80/TCP     17m
firefly-iii-firefly-db   ClusterIP   10.43.221.9    <none>        5432/TCP   17m
firefly-iii-importer     ClusterIP   10.43.191.66   <none>        80/TCP     17m
```
The app is expecting whatever is in its env variable, but the stack chart sets a different db host, I believe. Essentially there are some inconsistencies in how things are named in different locations.
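One possible stopgap, sketched under the assumption that the app reads its database host from a DB_HOST env var and the deployment is named firefly-iii (check the chart's values for the proper override key):

```sh
# Repoint the app at the DB service name that actually exists in the cluster.
# Deployment name, namespace, and env var name are assumptions.
kubectl set env deployment/firefly-iii -n firefly \
  DB_HOST=firefly-iii-firefly-db
```

The cleaner fix is of course for the chart to template the DB hostname consistently in both the Service and the app's env.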
Support guidelines
I've found a bug and checked that ...
Description
Around 5-10 minutes after a firefly container has started, the `/health` endpoint stops responding `OK`. This leads the kubernetes scheduler to kill the pod and restart it, which slowly drives the pod into `CrashLoopBackOff`, making it unavailable.
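The loop is easy to see from the pod status; a minimal sketch, assuming the firefly namespace and the chart's default labels:

```sh
# RESTARTS climbs on every failed liveness probe until the pod hits CrashLoopBackOff.
kubectl get pods -n firefly -l app.kubernetes.io/name=firefly-iii --watch
```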
Debug information

Debug information generated at 2024-11-18 20:51:23 for Firefly III version v6.1.22.
en_GB.UTF-8: :white_check_mark:
Expected behaviour
I would expect the `/health` endpoint to continue returning `OK`.

Steps to reproduce
See the `Unhealthy` k8s events.
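The probe failures can be tailed live as they occur (the firefly namespace is an assumption):

```sh
# Stream only the failed-probe events.
kubectl get events -n firefly --field-selector reason=Unhealthy --watch
```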
Additional info

Since there are no relevant logs each time a pod gets killed, it took me a while to unearth this.
Here is the kubernetes event, even though it should be fairly useless:
I can only make wild guesses, but maybe this is a rate limiting issue...? The only current workaround is to manually edit the deployment manifest and remove all health checks.
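For reference, a sketch of that workaround without hand-editing the manifest, assuming the deployment is named firefly-iii and the app container is first in the pod spec:

```sh
# Strip the liveness and readiness probes from the first container.
# Deployment name, namespace, and container index are assumptions.
kubectl patch deployment firefly-iii -n firefly --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'
```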