Kong / gateway-operator

Kubernetes Operator for Kong Gateways
Apache License 2.0

Integration & E2E tests fail due to exceeding DockerHub pull quota #452

Status: Open (opened by pmalek 3 months ago)

pmalek commented 3 months ago

Problem statement

Occasionally, E2E test runs fail without any obvious symptom in the test output, e.g. https://github.com/Kong/gateway-operator/actions/runs/10179601076/job/28155676111

    test_helm_install_upgrade.go:390: Deployment 538fc6a5-e3b1-4855-afc8-946870a96092/kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manager has no AvailableReplicas
    test_helm_install_upgrade.go:390: Failed to get logs from operator pod 538fc6a5-e3b1-4855-afc8-946870a96092/kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt: container "manager" in pod "kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt" is waiting to start: trying and failing to pull image
    test_helm_install_upgrade.go:390: 
            Error Trace:    /home/runner/work/gateway-operator/gateway-operator/test/e2e/test_helm_install_upgrade.go:390
            Error:          Received unexpected error:
                            timed out waiting for operator deployment in namespace 538fc6a5-e3b1-4855-afc8-946870a96092
            Test:           TestE2E/TestHelmUpgrade/upgrade_from_nightly_to_current
TestE2E/TestHelmUpgrade/upgrade_from_nightly_to_current 2024-07-31T12:05:46Z logger.go:66: Running command helm with args [uninstall --namespace 538fc6a5-e3b1-4855-afc8-946870a96092 kgo-nightly-to-e2e-4ca1]
TestE2E/TestHelmUpgrade/upgrade_from_nightly_to_current 2024-07-31T12:05:47Z logger.go:66: release "kgo-nightly-to-e2e-4ca1" uninstalled

Closer inspection of the diagnostics shows that the cause is the Docker Hub pull rate limit being exceeded:

- apiVersion: v1
  count: 3
  eventTime: null
  firstTimestamp: "2024-07-31T12:02:48Z"
  involvedObject:
    apiVersion: v1
    fieldPath: spec.containers{manager}
    kind: Pod
    name: kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt
    namespace: 538fc6a5-e3b1-4855-afc8-946870a96092
    resourceVersion: "2329"
    uid: d713fa65-d7cf-4ab0-b2bf-87fa039e0a57
  kind: Event
  lastTimestamp: "2024-07-31T12:03:36Z"
  message: 'Failed to pull image "docker.io/kong/gateway-operator-oss:nightly": failed
    to pull and unpack image "docker.io/kong/gateway-operator-oss:nightly": failed
    to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/kong/gateway-operator-oss/manifests/sha256:984c624b124aa5d10d5f5c2cf4915e14d5293d930203c8f00228accd736e33ed:
    429 Too Many Requests - Server message: toomanyrequests: You have reached your
    pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit'
  metadata:
    creationTimestamp: "2024-07-31T12:02:48Z"
    name: kgo-nightly-to-e2e-4ca1-gateway-operator-controller-manages77wt.17e74a90ef43e1a6
    namespace: 538fc6a5-e3b1-4855-afc8-946870a96092
    resourceVersion: "2431"
    uid: c31178bb-13d7-45fb-8e33-2f92dd60e75a
  reason: Failed
  reportingComponent: kubelet
  reportingInstance: 13bd010b-62c4-4262-80fe-ab572c485c89-control-plane
  source:
    component: kubelet
    host: 13bd010b-62c4-4262-80fe-ab572c485c89-control-plane
  type: Warning
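The test output only says "trying and failing to pull image"; the rate limit itself is visible only in the kubelet event. Below is a minimal sketch of how such failures could be recognized automatically, assuming the event message is available as a plain string (for example from the diagnostics dump). The helper `isDockerHubRateLimited` is hypothetical and not part of gateway-operator, and the substrings it matches are copied from the event above rather than from any documented Docker Hub contract.

```go
package main

import (
	"fmt"
	"strings"
)

// rateLimitMarkers are substrings copied from the kubelet events quoted in
// this issue; either one indicates an HTTP 429 response from Docker Hub.
var rateLimitMarkers = []string{
	"429 Too Many Requests",
	"toomanyrequests",
}

// isDockerHubRateLimited reports whether an image pull failure message looks
// like a Docker Hub pull rate limit error. Hypothetical helper: the matching
// is purely textual and based only on the messages observed in this issue.
func isDockerHubRateLimited(msg string) bool {
	// Only consider failures against the Docker Hub registry endpoint.
	if !strings.Contains(msg, "registry-1.docker.io") {
		return false
	}
	for _, marker := range rateLimitMarkers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

func main() {
	// Event message from the E2E run above, joined back into a single line.
	msg := `Failed to pull image "docker.io/kong/gateway-operator-oss:nightly": failed to pull and unpack image "docker.io/kong/gateway-operator-oss:nightly": failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/kong/gateway-operator-oss/manifests/sha256:984c624b124aa5d10d5f5c2cf4915e14d5293d930203c8f00228accd736e33ed: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit`
	fmt.Println(isDockerHubRateLimited(msg))
}
```

Restricting the check to `registry-1.docker.io` keeps a 429 from some other registry from being misreported as a Docker Hub quota problem.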

Proposed solution

Acceptance criteria

pmalek commented 2 months ago

Also observed in the integration test suite: https://github.com/Kong/gateway-operator/actions/runs/10400617995/job/28801529440?pr=497

Message:             Failed to pull image "kong:3.7": failed to pull and unpack image "docker.io/library/kong:3.7": failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/kong/manifests/sha256:1ab5941fbe393fd7fef0f64b346f5738334cb269fbbd47ce8142a859f93b3405: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
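Here the pod references the unqualified name `kong:3.7`, yet the runtime fetched `docker.io/library/kong:3.7`, as the message itself shows, so the pull still counts against the Docker Hub quota. Below is a sketch of that expansion, assuming a simplified version of Docker's reference normalization (a first path component counts as a registry host only if it contains "." or ":" or equals "localhost"; other edge cases are ignored). It also includes a hypothetical `dockerHubImageFromEvent` helper that extracts the image from a `Failed to pull image "..."` message. Both helpers are illustrative and not part of gateway-operator.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// failedPullRe captures the image reference from kubelet messages of the form
// `Failed to pull image "<ref>": ...`, as in the events quoted in this issue.
var failedPullRe = regexp.MustCompile(`Failed to pull image "([^"]+)"`)

// normalizeImageRef applies a simplified form of Docker's reference
// normalization: without an explicit registry host, docker.io is assumed,
// and single-component Docker Hub names get the "library/" prefix.
func normalizeImageRef(ref string) string {
	domain, rest := "docker.io", ref
	if i := strings.IndexByte(ref, '/'); i != -1 {
		first := ref[:i]
		// Treat the first component as a registry host only if it looks like one.
		if strings.ContainsAny(first, ".:") || first == "localhost" {
			domain, rest = first, ref[i+1:]
		}
	}
	if domain == "index.docker.io" {
		domain = "docker.io" // legacy alias for Docker Hub
	}
	if domain == "docker.io" && !strings.Contains(rest, "/") {
		rest = "library/" + rest // official images live under library/
	}
	return domain + "/" + rest
}

// dockerHubImageFromEvent returns the normalized image named in a failed-pull
// message and whether that image is served by Docker Hub. Hypothetical helper.
func dockerHubImageFromEvent(msg string) (string, bool) {
	m := failedPullRe.FindStringSubmatch(msg)
	if m == nil {
		return "", false
	}
	ref := normalizeImageRef(m[1])
	return ref, strings.HasPrefix(ref, "docker.io/")
}

func main() {
	// The first two references come from this issue; the third is made up.
	for _, ref := range []string{"kong:3.7", "docker.io/kong/gateway-operator-oss:nightly", "ghcr.io/example/image:tag"} {
		fmt.Println(ref, "->", normalizeImageRef(ref))
	}
}
```

Running it prints each reference next to its normalized form. `kong:3.7` maps to `docker.io/library/kong:3.7`, matching the image in the message above, while the made-up `ghcr.io/example/image:tag` keeps its own registry.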