fluxcd / source-controller

The GitOps Toolkit source management component
https://fluxcd.io
Apache License 2.0

Flux Source Controller Fails to List Remotes #1137

Open devopstagon opened 1 year ago

devopstagon commented 1 year ago

Describe the bug

The source controller intermittently fails to list revisions from the remote (GitLab in this case), leading to these errors:

{"level":"error","ts":"2023-06-20T12:09:39.735Z","msg":"failed to checkout and determine revision: unable to list remote for 'https://gitlab/sre/gitops/sre-flux': stream error: stream ID 3; INTERNAL_ERROR; received from peer","controller":"gitrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"GitRepository","GitRepository":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"e258ec4f-35e2-48e5-9af2-f7715f7c4cb4","error":"failed to checkout and determine revision: unable to list remote for 'https://gitlab/sre/gitops/sre-flux': stream error: stream ID 3; INTERNAL_ERROR; received from peer"}
{"level":"error","ts":"2023-06-20T12:09:39.766Z","msg":"Reconciler error","controller":"gitrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"GitRepository","GitRepository":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"e258ec4f-35e2-48e5-9af2-f7715f7c4cb4","error":"failed to checkout and determine revision: unable to list remote for 'https://gitlab/sre/gitops/sre-flux': stream error: stream ID 3; INTERNAL_ERROR; received from peer"}

The endpoint it calls is up, and we can see no connection issues during this period. We suspect a bug in net/http, based on this ticket: https://github.com/golang/go/issues/51323

Steps to reproduce

  1. Add a source.
  2. Check the logs and see the intermittent failures.

Expected behavior

The source controller should work around the bug, for example by retrying the request, instead of failing.

Screenshots and recordings

No response

OS / Distro

Kubernetes 1.24.x

Flux version

v0.38.3

Flux check

► checking prerequisites
✗ flux 0.38.3 <2.0.0-rc.5 (new version is available, please upgrade)
✔ Kubernetes 1.24.12-gke.500 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.34.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.34.1
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.28.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.0-rc.4
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0-rc.4
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.0-rc.5
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

Git provider

GitLab

Container Registry provider

Harbor

Additional context

No response

makkes commented 1 year ago

According to this comment, the internal error message you're seeing is coming from the server, so it is most likely to be an upstream issue.

savisaar2 commented 9 months ago

@devopstagon Did you manage to solve this issue? I have started seeing this error on my cluster, coming from source-controller, and I'm unsure why it's having a problem.

tomaaron commented 2 months ago

We've been experiencing the same for a couple of days, with GitHub as the source.

stefanprodan commented 2 months ago

Flux retries when the connection fails; there's not much we can do if GitHub has connectivity issues. See https://www.githubstatus.com/incidents/r3x7x31k7nn1