GoogleContainerTools / skaffold

Easy and Repeatable Kubernetes Development
https://skaffold.dev/
Apache License 2.0

[BUG] Config dependency doesn't work when deployers are different #9412

Open mecampbellsoup opened 6 months ago

mecampbellsoup commented 6 months ago

I am trying to combine two skaffold files we have: one called cloud-app, which builds our app and deploys it to the target Kubernetes cluster using the helm deployer, and another called cloud-app-e2e-test-runner, a simple Playwright e2e test suite that we typically deploy using the kubectl deployer. (A sketch of the cloud-app config follows my desired config below.)

Here is my desired skaffold.yaml, in which I want to pull in the cloud-app config as a required dependency:

apiVersion: skaffold/v4beta9
kind: Config
metadata:
  name: cloud-app-e2e-test-runner
requires:
  - path: ../../../k8s-services/dev/skaffold-cloud-app.yaml
    configs: ["cloud-app"]
    activeProfiles:
      - name: production
build:
  artifacts:
    - image: cloud-app-e2e-test-runner
      context: ./
      docker:
        dockerfile: Dockerfile
        noCache: false
      sync:
        manual:
          - src: "tests/**"
            dest: .
          - src: "*.ts"
            dest: .
          - src: "package.json"
            dest: .
          - src: "playwright.config.ts"
            dest: .
deploy:
  kubectl: {}
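
For context, the required cloud-app config looks roughly like this. It is a trimmed-down sketch rather than our real file; the chart path, second dockerfile, and value templates are placeholders:

apiVersion: skaffold/v4beta9
kind: Config
metadata:
  name: cloud-app
build:
  artifacts:
    - image: cloud-app
      docker:
        dockerfile: Dockerfile
    - image: api-gateway
      docker:
        dockerfile: Dockerfile.api-gateway   # placeholder
deploy:
  helm:
    releases:
      - name: cloud-app
        chartPath: chart   # placeholder path
        setValueTemplates:
          image.tag: "{{.IMAGE_TAG_cloud_app}}"   # illustrative; our real values wiring is more involved
profiles:
  - name: production
    # production overrides elided

The point is just that this required config uses the helm deployer, while the requiring config above uses kubectl.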

The build and deploy steps in the cloud-app config succeed, but then, when it comes time to deploy the cloud-app-e2e-test-runner config, I get an error:

Deployments stabilized in 57.127 seconds
Cleaning up...
 - No resources found
release "cloud-app" uninstalled
nothing to deploy
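
Running the two configs as separate skaffold invocations sidesteps the dependency mechanism entirely. A rough sketch of that interim workflow, assuming the requires stanza is temporarily removed from the e2e config:

# deploy cloud-app on its own, with the production profile
skaffold run -f ../../../k8s-services/dev/skaffold-cloud-app.yaml -p production \
  --default-repo registry.gitlab.com/coreweave/cloud-app/dev
# then iterate on the e2e test runner from its own directory
skaffold dev --default-repo registry.gitlab.com/coreweave/cloud-app/dev

But the whole point of requires is to avoid exactly this kind of two-step choreography.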

Possibly related to https://github.com/GoogleContainerTools/skaffold/issues/8745.

Full stdout logs:

~/github/coreweave/cloud-app/cloud_app/e2e $ skaffold dev --default-repo registry.gitlab.com/coreweave/cloud-app/dev
Generating tags...
 - api-gateway -> registry.gitlab.com/coreweave/cloud-app/dev/api-gateway:2024-05-07_17-24-03.405_UTC
 - cloud-app -> registry.gitlab.com/coreweave/cloud-app/dev/cloud-app:2024-05-07_17-24-03.405_UTC
 - cloud-app-e2e-test-runner -> registry.gitlab.com/coreweave/cloud-app/dev/cloud-app-e2e-test-runner:v1.50.0-4253-g47247432-dirty
Checking cache...
 - api-gateway: Found. Tagging
 - cloud-app: Found. Tagging
 - cloud-app-e2e-test-runner: Found Remotely
Tags used in deployment:
 - api-gateway -> registry.gitlab.com/coreweave/cloud-app/dev/api-gateway:2024-05-07_17-24-03.405_UTC@sha256:2dc379cc22227cecb86afce97a28c9c5b2fb188d0b8e32a3097cb9d71b6e4fb9
 - cloud-app -> registry.gitlab.com/coreweave/cloud-app/dev/cloud-app:2024-05-07_17-24-03.405_UTC@sha256:c6afa54660939f9eac9b16ead2f050e0dcae770dd2d4fea5450f65731fa45002
 - cloud-app-e2e-test-runner -> registry.gitlab.com/coreweave/cloud-app/dev/cloud-app-e2e-test-runner:v1.50.0-4253-g47247432-dirty@sha256:fc281427f9ebefd19ed5a23ac4e8e51df18ab2abbd04ea8d046c6a2d7868af72
Starting deploy...
Helm release cloud-app not installed. Installing...
NAME: cloud-app
LAST DEPLOYED: Tue May  7 17:24:07 2024
NAMESPACE: cloud
STATUS: deployed
REVISION: 1
TEST SUITE: None
Waiting for deployments to stabilize...
I0507 17:24:13.927584 2115767 request.go:697] Waited for 1.132951171s due to client-side throttling, not priority and fairness, request: GET:https://10.100.1.2:32500/api/v1/namespaces/cloud/pods?labelSelector=app%3Dcloud-app%2Capp.kubernetes.io%2Finstance%3Dcloud-app%2Capp.kubernetes.io%2Fname%3Dcloud-app%2Cskaffold.dev%2Frun-id%3De730be1b-03ab-487a-9721-103f3556957b
 - cloud:deployment/cloud-app: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - cloud:deployment/cloud-app-alerts-notifier: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - cloud:deployment/cloud-app-kubernetes-ingress: Startup probe failed: Get "http://10.241.126.195:1042/healthz": dial tcp 10.241.126.195:1042: connect: connection refused
    - cloud:pod/cloud-app-kubernetes-ingress-89b9566d6-xtnrh: Startup probe failed: Get "http://10.241.126.195:1042/healthz": dial tcp 10.241.126.195:1042: connect: connection refused
 - cloud:deployment/cloud-app-metrics: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - cloud:deployment/cloud-app-reconciler: waiting for init container cloud-app-reconciler-pg-waiter to start
    - cloud:pod/cloud-app-reconciler-59d4b98b4d-8gvgn: waiting for init container cloud-app-reconciler-pg-waiter to start
 - cloud:deployment/cloud-app-worker-auth: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - cloud:deployment/cloud-app-worker-chargify: waiting for init container cloud-app-worker-chargify-pg-waiter to start
    - cloud:pod/cloud-app-worker-chargify-65b7d9765-b669r: waiting for init container cloud-app-worker-chargify-pg-waiter to start
 - cloud:deployment/cloud-app-worker-messaging: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - cloud:deployment/cloud-app-worker-sift: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - cloud:statefulset/cloud-app-postgresql: Waiting for 1 pods to be ready...
 - cloud:deployment/cloud-app-worker-auth: container cloud-app-worker-auth-pg-waiter in error: &ContainerStateWaiting{Reason:CreateContainerConfigError,Message:secret "postgres-role-postgres" not found,}
    - cloud:pod/cloud-app-worker-auth-65cbb6f9bd-ncr22: container cloud-app-worker-auth-pg-waiter in error: &ContainerStateWaiting{Reason:CreateContainerConfigError,Message:secret "postgres-role-postgres" not found,}
I0507 17:24:23.927790 2115767 request.go:697] Waited for 1.987295127s due to client-side throttling, not priority and fairness, request: GET:https://10.100.1.2:32500/api/v1/namespaces/cloud/events?fieldSelector=involvedObject.name%3Dcloud-app-95b66b8bc-pd4hz%2CinvolvedObject.namespace%3Dcloud%2CinvolvedObject.kind%3DPod%2CinvolvedObject.uid%3Dd5a7d1b1-4a38-436b-8ed0-d59704070cb3
 - cloud:statefulset/cloud-app-postgresql is ready. [9/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-reconciler is ready. [8/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-worker-messaging is ready. [7/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-alerts-notifier is ready. [6/10 deployment(s) still pending]
I0507 17:24:34.127911 2115767 request.go:697] Waited for 1.302943837s due to client-side throttling, not priority and fairness, request: GET:https://10.100.1.2:32500/api/v1/namespaces/cloud/pods?labelSelector=app%3Dqueue-worker-auth%2Capp.kubernetes.io%2Finstance%3Dcloud-app-worker-auth%2Capp.kubernetes.io%2Fname%3Dcloud-app-worker-auth%2Cskaffold.dev%2Frun-id%3De730be1b-03ab-487a-9721-103f3556957b
 - cloud:deployment/cloud-app-worker-chargify is ready. [5/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-worker-sift is ready. [4/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-kubernetes-ingress is ready. [3/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-worker-auth is ready. [2/10 deployment(s) still pending]
 - cloud:deployment/cloud-app is ready. [1/10 deployment(s) still pending]
 - cloud:deployment/cloud-app-metrics is ready.
Deployments stabilized in 57.127 seconds
Cleaning up...
 - No resources found
release "cloud-app" uninstalled
nothing to deploy
idsulik commented 6 months ago

It outputs an error:
- cloud:pod/cloud-app-worker-auth-65cbb6f9bd-ncr22: container cloud-app-worker-auth-pg-waiter in error: &ContainerStateWaiting{Reason:CreateContainerConfigError,Message:secret "postgres-role-postgres" not found,}
I think you forgot to add this secret.

mecampbellsoup commented 6 months ago

It outputs an error: - cloud:pod/cloud-app-worker-auth-65cbb6f9bd-ncr22: container cloud-app-worker-auth-pg-waiter in error: &ContainerStateWaiting{Reason:CreateContainerConfigError,Message:secret "postgres-role-postgres" not found,}. I think you forgot to add this secret.

Thanks for the suggestion, but unfortunately that isn't related.

(That error happens because this app uses sealed-secrets: when I first run helm install, there is a short lag while our SealedSecret resources are applied to the cluster and the sealed-secrets operator converts them into Kubernetes Secrets. Until that process completes, usually just a few seconds, we see those errors; once the secrets are ready, the helm installation completes.)
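
For reference, a minimal sketch of what one of those sealed resources looks like; the name matches the missing secret from the log above, but the key and ciphertext here are hypothetical placeholders (real values come from kubeseal):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: postgres-role-postgres
  namespace: cloud
spec:
  encryptedData:
    # encrypted payload; only the in-cluster controller can decrypt it
    password: AgB4x...   # placeholder ciphertext
  template:
    metadata:
      name: postgres-role-postgres
      namespace: cloud

Once the sealed-secrets controller processes this, a regular Secret named postgres-role-postgres appears in the cloud namespace and the pg-waiter init containers can start.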

For the purposes of this issue, the only relevant log output is really:

Deployments stabilized in 57.127 seconds
Cleaning up...
 - No resources found
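
If it helps triage, I'd expect a stripped-down pair of configs like the following to reproduce this. Every name, chart, and manifest below is hypothetical rather than from our repo:

# skaffold-helm.yaml: the required config, deploying via helm
apiVersion: skaffold/v4beta9
kind: Config
metadata:
  name: required-helm-config
deploy:
  helm:
    releases:
      - name: hello
        chartPath: hello-chart   # hypothetical chart

# skaffold.yaml: the requiring config, deploying via kubectl
apiVersion: skaffold/v4beta9
kind: Config
metadata:
  name: requiring-kubectl-config
requires:
  - path: ./skaffold-helm.yaml
    configs: ["required-helm-config"]
manifests:
  rawYaml:
    - k8s/*.yaml   # hypothetical manifests
deploy:
  kubectl: {}

If skaffold dev on the second file installs the helm release and then immediately uninstalls it with "nothing to deploy", that is the same behavior we see.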