Open maargenton opened 1 year ago
What happens when you reconcile the GitRepository manually by running `flux reconcile source git <gitrepo-name>`?
I tried that before; it was hanging as well, on `◎ waiting for GitRepository reconciliation`.
I rebooted my router, which killed the hanging connection and generated two error messages with stack traces; maybe they can help:
```json
{
  "level": "error",
  "ts": "2023-07-05T08:16:47.392Z",
  "msg": "failed to checkout and determine revision: unable to list remote for 'ssh://git@github.com/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
  "controller": "gitrepository",
  "controllerGroup": "source.toolkit.fluxcd.io",
  "controllerKind": "GitRepository",
  "GitRepository": {
    "name": "flux-system",
    "namespace": "flux-system"
  },
  "namespace": "flux-system",
  "name": "flux-system",
  "reconcileID": "b19c779d-28aa-4163-aa59-1cb7ed4f3373",
  "error": "failed to checkout and determine revision: unable to list remote for 'ssh://git@github.com/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
  "stacktrace": "github.com/fluxcd/source-controller/internal/reconcile/summarize.logError\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/processor.go:99\ngithub.com/fluxcd/source-controller/internal/reconcile/summarize.ErrorActionHandler\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/processor.go:77\ngithub.com/fluxcd/source-controller/internal/reconcile/summarize.(*Helper).SummarizeAndPatch\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/summary.go:193\ngithub.com/fluxcd/source-controller/internal/controller.(*GitRepositoryReconciler).Reconcile.func1\n\tgithub.com/fluxcd/source-controller/internal/controller/gitrepository_controller.go:204\ngithub.com/fluxcd/source-controller/internal/controller.(*GitRepositoryReconciler).Reconcile\n\tgithub.com/fluxcd/source-controller/internal/controller/gitrepository_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"
}
{
  "level": "debug",
  "ts": "2023-07-05T08:16:47.393Z",
  "logger": "events",
  "msg": "failed to checkout and determine revision: unable to list remote for 'ssh://git@github.com/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
  "type": "Warning",
  "object": {
    "kind": "GitRepository",
    "namespace": "flux-system",
    "name": "flux-system",
    "uid": "df67c776-a9e3-4d82-8534-30823b917661",
    "apiVersion": "source.toolkit.fluxcd.io/v1",
    "resourceVersion": "293766"
  },
  "reason": "GitOperationFailed"
}
{
  "level": "error",
  "ts": "2023-07-05T08:16:47.412Z",
  "msg": "Reconciler error",
  "controller": "gitrepository",
  "controllerGroup": "source.toolkit.fluxcd.io",
  "controllerKind": "GitRepository",
  "GitRepository": {
    "name": "flux-system",
    "namespace": "flux-system"
  },
  "namespace": "flux-system",
  "name": "flux-system",
  "reconcileID": "b19c779d-28aa-4163-aa59-1cb7ed4f3373",
  "error": "failed to checkout and determine revision: unable to list remote for 'ssh://git@github.com/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"
}
```
> ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer

This error, combined with the fact that the thread essentially gets stuck, leads me to believe the issue is caused by a connection that hangs forever without completing or terminating; when the router is rebooted, that connection is finally dropped.
That sounds like a reasonable explanation. But shouldn't that be covered by the default 60s timeout?
I have the `source-controller` configured to watch a single Git repo over SSH, with an interval of 1 minute and no explicit timeout (which should default to 60s). After a little while (about 10 minutes after reboot in my latest case), the source controller stops checking the repo, stops logging anything (logging was bumped to debug to investigate), and never recovers from that state. The `kustomize-controller`, configured to reconcile every 10 minutes, keeps working and logging properly, but never sees any update after that point.

From http://...:8080/metrics:
Additional context:
vagrant up
I'll be happy to provide any further details if needed. Please let me know how I can help resolve this issue.
Thanks