Closed runningman84 closed 3 years ago
Just encountered this in our setup as well. DNS had been temporarily unavailable, so Flux failed to clone the repo. It seems that the readiness/liveness probe did not detect this as a failure, so the pod stayed up in a broken state. Once the pod had been re-created, it started working again.
possible duplicate/related to https://github.com/fluxcd/flux/issues/3014 ?
We have been having this kind of issue since upgrading to 1.19.
We've been having this issue as well, on two clusters. One has flux 1.19.0, the other 1.20.0.
Ran into this same issue, had to cycle the pods to get them to start syncing again.
I was experiencing this issue regularly (at least once a week) on my Okteto Cloud flux deployment, but I upgraded it to v1.21.2 several days ago and haven't seen it again. Before that, it would run for days and keep trying to sync; once the pod hit an intermittent DNS failure, every clone attempt failed from then on until the pod was restarted.
I don't know of any specific changes that could have fixed it, but unless someone has a current repro with the latest version of Flux v1, I will have to close this.
Describe the bug
After a few days flux got stuck and did not sync anymore. I killed the pod and everything was up and running again.
To Reproduce
I have no idea... maybe the network connection was broken for a few minutes or the file system got corrupted...
Expected behavior
flux should terminate if it cannot reach the git repo for a long time, such as 1 hour. This would allow k8s to restart the container, which gives visibility and might also solve the issue.
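To make the request concrete, here is a rough sketch in Go of the kind of watchdog I have in mind. This is not Flux's actual code: `SyncWatchdog`, `NewSyncWatchdog`, `RecordSuccess`, and `Expired` are hypothetical names, and the 1 hour limit is just the example value from above. The idea is that the sync loop records every successful git sync, and the process exits non-zero once the last success is older than the limit, so k8s restarts the container.

```go
package main

import (
	"fmt"
	"time"
)

// SyncWatchdog is a hypothetical helper (not part of Flux) that remembers
// when the last successful git sync happened.
type SyncWatchdog struct {
	maxStale    time.Duration // how long we tolerate no successful sync
	lastSuccess time.Time     // time of the most recent successful sync
}

// NewSyncWatchdog starts the clock at `now`, so a freshly started process
// gets a full maxStale window before it is considered stuck.
func NewSyncWatchdog(maxStale time.Duration, now time.Time) *SyncWatchdog {
	return &SyncWatchdog{maxStale: maxStale, lastSuccess: now}
}

// RecordSuccess should be called by the sync loop after each successful sync.
func (w *SyncWatchdog) RecordSuccess(now time.Time) {
	w.lastSuccess = now
}

// Expired reports whether more than maxStale has passed since the last success.
func (w *SyncWatchdog) Expired(now time.Time) bool {
	return now.Sub(w.lastSuccess) > w.maxStale
}

func main() {
	start := time.Now()
	w := NewSyncWatchdog(time.Hour, start)

	// Simulated clock readings instead of real waiting, to keep the demo fast.
	for _, elapsed := range []time.Duration{30 * time.Minute, 61 * time.Minute} {
		now := start.Add(elapsed)
		if w.Expired(now) {
			fmt.Printf("no successful sync for %v, giving up\n", elapsed)
			// A real daemon would call os.Exit(1) here so the container restarts.
			return
		}
		fmt.Printf("last sync %v ago, still within limit\n", elapsed)
	}
}
```

Passing the current time in as a parameter (rather than calling `time.Now()` inside the methods) keeps the check easy to test. The same `Expired` result could also back a liveness endpoint instead of a direct exit, if that fits better.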
Logs
Additional context