GoogleContainerTools / skaffold

Easy and Repeatable Kubernetes Development
https://skaffold.dev/
Apache License 2.0

Skaffold not waiting for deployments to stabilize #5966

Closed ananyasaxena closed 1 year ago

ananyasaxena commented 3 years ago

Note: The Skaffold team are unable to reproduce this issue. If you see this issue, please attach a trace from running with -vtrace or provide a sample repository.

Expected behavior & Actual behavior

On v1.23.0, Skaffold waited for deployments to stabilize correctly:

Deployments stabilized in 2 minutes 43.697 seconds

As of v1.24.0, it no longer seems to wait for deployments to stabilize:

Waiting for deployments to stabilize...
Deployments stabilized in 16.720773ms
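
For anyone who wants CI to catch this symptom, here is a minimal Python sketch (not part of Skaffold) that parses the "Deployments stabilized in ..." summary line in both formats shown above and flags runs that finish implausibly fast. The function names and the 1-second threshold are assumptions chosen for illustration.

```python
import re
from typing import Iterable, List, Optional

# Matches the summary line Skaffold prints once its status check returns.
_STABILIZED = re.compile(r"Deployments stabilized in (?P<duration>.+)$")

# One "<number><unit>" piece; only the unit spellings seen in this thread.
_PART = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|minutes?|seconds?)")

_UNIT_SECONDS = {
    "ms": 0.001,
    "second": 1.0,
    "seconds": 1.0,
    "minute": 60.0,
    "minutes": 60.0,
}


def stabilized_seconds(line: str) -> Optional[float]:
    """Return the reported stabilization time in seconds, or None if the
    line is not a 'Deployments stabilized in ...' summary."""
    match = _STABILIZED.search(line.strip())
    if match is None:
        return None
    parts = _PART.findall(match.group("duration"))
    if not parts:
        return None
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def suspiciously_fast(lines: Iterable[str], threshold_s: float = 1.0) -> List[float]:
    """Collect every reported stabilization time below threshold_s.

    threshold_s is a hypothetical cutoff; tune it for your own rollouts.
    """
    durations = (stabilized_seconds(line) for line in lines)
    return [d for d in durations if d is not None and d < threshold_s]
```

A near-instant result is only a hint: a deployment that did not change can legitimately stabilize quickly, so treat a hit as a prompt to look closer rather than as proof of the bug.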

Information

apiVersion: skaffold/v2alpha1
kind: Config
metadata:
  name: spending-api
build:
  tagPolicy:
    gitCommit:
      variant: AbbrevCommitSha
  artifacts:
    - image: XXXX
      docker:
        dockerfile: XXXX
    - image: XXXX
      docker:
        dockerfile: XXXX
deploy:
  statusCheckDeadlineSeconds: 600
  kubectl:
    manifests:
      - k8s-manifest.yaml
tejal29 commented 3 years ago

@ananyasaxena Can you please add some trace logs? Could it be that your application stabilized quickly?

ananyasaxena commented 3 years ago

@tejal29

@ananyasaxena Can you please add some trace logs?

skaffold deploy --kubeconfig ~/.kube/config --kube-context stage --build-artifacts skaffold-tags.json --verbosity trace
INFO[0000] Skaffold &{Version:v1.24.0 ConfigVersion:skaffold/v2beta16 GitVersion: GitCommit:XXXX BuildDate:2021-05-11T22:51:04Z GoVersion:go1.14.14 Compiler:gc Platform:linux/amd64 User:}
INFO[0000] Loaded Skaffold defaults from "/home/circleci/.skaffold/config"
DEBU[0000] config version out of date: upgrading to latest "skaffold/v2beta16"
DEBU[0000] parsed 1 configs from configuration file /home/circleci/project/skaffold.yaml
DEBU[0000] Defaulting build type to local build
INFO[0000] Activated kube-context "stage"
TRAC[0000] validating yamltags of struct SkaffoldConfig
TRAC[0000] validating yamltags of struct Metadata
TRAC[0000] validating yamltags of struct Pipeline
TRAC[0000] validating yamltags of struct BuildConfig
TRAC[0000] validating yamltags of struct Artifact
TRAC[0000] validating yamltags of struct ArtifactType
TRAC[0000] validating yamltags of struct DockerArtifact
TRAC[0000] validating yamltags of struct Artifact
TRAC[0000] validating yamltags of struct ArtifactType
TRAC[0000] validating yamltags of struct DockerArtifact
TRAC[0000] validating yamltags of struct TagPolicy
TRAC[0000] validating yamltags of struct GitTagger
TRAC[0000] validating yamltags of struct BuildType
TRAC[0000] validating yamltags of struct LocalBuild
TRAC[0000] validating yamltags of struct DeployConfig
TRAC[0000] validating yamltags of struct DeployType
TRAC[0000] validating yamltags of struct KubectlDeploy
TRAC[0000] validating yamltags of struct KubectlFlags
TRAC[0000] validating yamltags of struct LogsConfig
INFO[0000] Using kubectl context: stage
DEBU[0000] Running command: [minikube version --output=json]
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH
DEBU[0000] setting Docker user agent to skaffold-v1.24.0
DEBU[0000] Using builder: local
DEBU[0000] push value not present in NewBuilder, defaulting to true because cluster.PushImages is true
INFO[0000] build concurrency first set to 1 parsed from *local.Builder[0]
INFO[0000] final build concurrency value is 1
Tags used in deployment:

Help improve Skaffold with our 2-minute anonymous survey: run 'skaffold survey'
To help improve the quality of this product, we collect anonymized usage data for details on what is tracked and how we use this data visit https://skaffold.dev/docs/resources/telemetry/. This data is handled in accordance with our privacy policy https://policies.google.com/privacy

You may choose to opt out of this collection by running the following command: skaffold config set --global collect-metrics false



> Could it be that your application stabilized quickly?

Nope, I can see it taking time on the k8s dashboard, and v1.23.0 correctly waits for it to stabilize.
gsquared94 commented 3 years ago

@ananyasaxena is it possible to provide a small example to reproduce your issue? I think PR #6010 might help here, but I need a way to test it. Otherwise you can build my branch from source or wait for it to be merged to use the master branch build to verify if that fixes it.

ananyasaxena commented 3 years ago

@gsquared94 I'll try building from your branch. If I run into any challenges, I'll wait for the merge and release, then test this out and report back.

gsquared94 commented 3 years ago

@ananyasaxena my PR was merged, so you can also try the bleeding-edge version (https://skaffold.dev/docs/install/). For macOS:

curl -Lo skaffold https://storage.googleapis.com/skaffold/builds/latest/skaffold-darwin-amd64 && chmod +x skaffold && sudo mv skaffold /usr/local/bin
tejal29 commented 3 years ago

@ananyasaxena is this still an issue?

ananyasaxena commented 3 years ago

@gsquared94 @tejal29 I used https://storage.googleapis.com/skaffold/builds/latest/skaffold-linux-amd64 but still ran into the same issue:

Waiting for deployments to stabilize...
Deployments stabilized in 24.511647ms
tejal29 commented 3 years ago

Thanks for confirming @ananyasaxena. I will look into this

somnistudio commented 3 years ago

Same problem here:

Waiting for deployments to stabilize...
Deployments stabilized in 3.306427ms

This started after I upgraded from v1.27.0 to v1.28.0 on macOS.

nkubala commented 3 years ago

@ananyasaxena @somnistudio we haven't been seeing this in any of our testing, so it seems like this might be specific to your project setups. Would either of you be able and willing to provide a small sample project for us to reproduce this issue?

cmdjulian commented 3 years ago

I'm facing the same issue with v1.30.0 and a kustomize deployment.

apiVersion: skaffold/v2beta5
kind: Config
profiles:
  - name: dev-skaffold
    build:
      tagPolicy:
        sha256: { }
      artifacts:
        - image: someDevRegistryWithRepo
          buildpacks:
            builder: paketobuildpacks/builder:base
    deploy:
      kustomize:
        paths: [ k8s/overlays/dev-skaffold ]

  - name: dev-cluster
    deploy:
      kustomize:
        paths: [ k8s/overlays/dev-cluster ]
skaffold -p dev-cluster deploy --images=registry.gitlab.com/XXX --tag=0.7.0 --status-check

Which yields:

Tags used in deployment:
 - registry.gitlab.com/XXX -> registry.gitlab.com/XXX:0.7.0
Starting deploy...
 - configmap/config-4mcf2g2ghh created
 - service/svc created
 - deployment.apps/deployment created
 - ingress.networking.k8s.io/ingress created
Waiting for deployments to stabilize...
Deployments stabilized in 28.180575ms

When using run instead of deploy, everything works as expected. It then yields the following:

skaffold -p dev-cluster run
Generating tags...
Checking cache...
Starting test...
Tags used in deployment:
Starting deploy...
 - configmap/config-4mcf2g2ghh configured
 - service/svc configured
 - deployment.apps/deployment configured
 - ingress.networking.k8s.io/ingress configured
Waiting for deployments to stabilize...
 - default:deployment/deployment: waiting for rollout to finish: 1 old replicas are pending termination...
 - default:deployment/deployment is ready.
Deployments stabilized in 21.166 seconds
You can also run [skaffold run --tail] to get the logs

I also wonder whether there is an option to silence the "You can also run [skaffold run --tail] to get the logs" line, because setting --tail=false doesn't silence it.

briandealwis commented 3 years ago

@cmdjulian could you provide a log running with -vtrace (suitably redacted)?

cmdjulian commented 3 years ago

Sure, the log yields the following for skaffold -p dev-cluster deploy --images=registry.gitlab.com/XXX --tag=0.7.0 --status-check=true:

INFO[0000] Loaded Skaffold defaults from "!!REDACTED!!" 
DEBU[0000] config version out of date: upgrading to latest "skaffold/v2beta21" 
DEBU[0000] parsed 1 configs from configuration file !!REDACTED!!/skaffold.yaml 
INFO[0000] applying profile: dev-cluster                    
DEBU[0000] overlaying profile on config for field Build 
DEBU[0000] overlaying profile on config for field artifacts 
DEBU[0000] overlaying profile on config for field insecureRegistries 
DEBU[0000] overlaying profile on config for field tagPolicy 
INFO[0000] no values found in profile for field TagPolicy, using original config values 
DEBU[0000] overlaying profile on config for field BuildType 
INFO[0000] no values found in profile for field BuildType, using original config values 
DEBU[0000] overlaying profile on config for field Test  
DEBU[0000] overlaying profile on config for field Deploy 
DEBU[0000] overlaying profile on config for field DeployType 
DEBU[0000] overlaying profile on config for field -     
DEBU[0000] overlaying profile on config for field helm  
DEBU[0000] overlaying profile on config for field kpt   
DEBU[0000] overlaying profile on config for field kubectl 
DEBU[0000] overlaying profile on config for field kustomize 
DEBU[0000] overlaying profile on config for field statusCheck 
DEBU[0000] overlaying profile on config for field statusCheckDeadlineSeconds 
DEBU[0000] overlaying profile on config for field kubeContext 
DEBU[0000] overlaying profile on config for field logs  
DEBU[0000] overlaying profile on config for field prefix 
DEBU[0000] overlaying profile on config for field PortForward 
DEBU[0000] Defaulting build type to local build         
TRAC[0000] validating yamltags of struct SkaffoldConfig 
TRAC[0000] validating yamltags of struct Metadata       
TRAC[0000] validating yamltags of struct Pipeline       
TRAC[0000] validating yamltags of struct BuildConfig    
TRAC[0000] validating yamltags of struct TagPolicy      
TRAC[0000] validating yamltags of struct GitTagger      
TRAC[0000] validating yamltags of struct BuildType      
TRAC[0000] validating yamltags of struct LocalBuild     
TRAC[0000] validating yamltags of struct DeployConfig   
TRAC[0000] validating yamltags of struct DeployType     
TRAC[0000] validating yamltags of struct KustomizeDeploy 
TRAC[0000] validating yamltags of struct KubectlFlags   
TRAC[0000] validating yamltags of struct DeployHooks    
TRAC[0000] validating yamltags of struct LogsConfig     
INFO[0000] Using kubectl context: !!REDACTED!!                  
DEBU[0000] Running command: [minikube version --output=json] 
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH 
TRAC[0000] Minikube cluster not detected: starting command minikube version --output=json: exec: "minikube": executable file not found in $PATH 
DEBU[0000] setting Docker user agent to skaffold-v1.30.0 
DEBU[0000] Using builder: local                         
DEBU[0000] push value not present in NewBuilder, defaulting to true because cluster.PushImages is true 
INFO[0000] build concurrency first set to 0 parsed from *local.Builder[0] 
INFO[0000] final build concurrency value is 0           
Tags used in deployment:
 - registry.gitlab.com/XXX -> registry.gitlab.com/XXX:0.7.1
DEBU[0000] push value not present in isImageLocal(), defaulting to true because cluster.PushImages is true 
Starting deploy...
DEBU[0000] getting client config for kubeContext: `!!REDACTED!!` 
DEBU[0000] Running command: [kubectl version --client -ojson] 
TRAC[0000] latest skaffold version: v1.31.0             
DEBU[0000] Command output: [{
  "clientVersion": {
    "major": "1",
    "minor": "21",
    "gitVersion": "v1.21.3",
    "gitCommit": "ca643a4d1f7bfe34773c74f79527be4afd95bf39",
    "gitTreeState": "archive",
    "buildDate": "2021-07-16T17:16:46Z",
    "goVersion": "go1.16.5",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}
] 
DEBU[0000] Running command: [kustomize build k8s/overlays/dev-cluster] 
DEBU[0000] Command output: 
!!REDACTED MANIFESTS!!
DEBU[0000] Running command: [kubectl --context !!REDACTED!! get -f - --ignore-not-found -ojson] 
DEBU[0001] Command output: []                           
DEBU[0001] 4 manifests to deploy. 4 are updated or new  
DEBU[0001] Running command: [kubectl --context !!REDACTED!! apply -f -] 
 - configmap/config-4mcf2g2ghh created
 - service/svc created
 - deployment.apps/deployment created
 - ingress.networking.k8s.io/ingress created
INFO[0001] Deploy completed in 1.788 second             
Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!` 
Deployments stabilized in 15.963637ms
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!` 

DEBU[0001] exporting metrics
cmdjulian commented 3 years ago

For the run command I'm seeing the following:

nearly same output
...
INFO[0001] Deploy completed in 1.149 second             
Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!` 
DEBU[0001] checking status default:deployment/deployment 
DEBU[0002] Running command: [kubectl --context !!REDACTED!! rollout status deployment deployment --namespace default --watch=false] 
DEBU[0002] Command output: [Waiting for deployment "deployment" rollout to finish: 0 of 1 updated replicas are available...
...
loops over and over again
...
DEBU[0021] Pod "deployment-75c9dd798b-znm2m" scheduled but not ready: checking container statuses 
DEBU[0021] Fetching events for pod "deployment-75c9dd798b-znm2m" 
DEBU[0022] Running command: [kubectl --context !!REDACTED!! rollout status deployment deployment --namespace !!REDACTED!! --watch=false] 
DEBU[0022] Command output: [deployment "deployment" successfully rolled out
] 
DEBU[0022] Fetching events for pod "deployment-75c9dd798b-znm2m" 
 - default:deployment/deployment is ready.
Deployments stabilized in 21.183 seconds
DEBU[0022] getting client config for kubeContext: `!!REDACTED!!` 
You can also run [skaffold run --tail] to get the logs
WARN[0022] got unexpected event of type ERROR           

DEBU[0022] exporting metrics
cmdjulian commented 3 years ago

When using the latest version from https://storage.googleapis.com/skaffold/builds/latest/skaffold-linux-amd64 I'm also seeing the following for the deploy task:

Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!`  subtask=-1 task=DevLoop
Deployments stabilized in 18.632405ms
INFO[0001] Deploy completed in 1.064 second              subtask=-1 task=Deploy
Waiting for deployments to stabilize...
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!`  subtask=-1 task=DevLoop
Deployments stabilized in 2.88494ms
DEBU[0001] getting client config for kubeContext: `!!REDACTED!!`  subtask=-1 task=DevLoop
WARN[0001] got unexpected event of type ERROR            subtask=-1 task=DevLoop

DEBU[0001] exporting metrics                             subtask=-1 task=DevLoop
DEBU[0001] metrics uploading complete in 804.018095ms    subtask=-1 task=DevLoop

I'm using k3s, if that helps, and my kustomize version is {Version:4.2.0 GitCommit:$Format:%H$ BuildDate:2021-07-22T22:12:15Z GoOs:linux GoArch:amd64}

simonjpartridge commented 3 years ago

I'm also experiencing this problem using skaffold 1.32.0.

My deployments report being stabilised in a few milliseconds when using skaffold deploy even when the deployments haven't rolled out yet and kubectl rollout status still reports "waiting for rollout to finish".

Strangely skaffold run has the correct behaviour and waits for the deployments to properly stabilise (takes about 20s for us) before reporting success. A failing deployment correctly reports an error when using skaffold run but reports all successful when using skaffold deploy.

Using kubernetes v1.20.9, istio 1.11.2, and kustomize 4.3
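
While skaffold deploy misreports, one possible guard is to verify the rollout independently afterwards. Below is a minimal Python sketch of such a check, assuming kubectl is on PATH. It polls with the same one-shot "kubectl rollout status ... --watch=false" command that appears in the skaffold run trace earlier in this thread and stops once the output contains "successfully rolled out". The function names, the polling interval, and the 600-second default (borrowed from statusCheckDeadlineSeconds in the original report) are illustrative assumptions, not Skaffold's implementation.

```python
import subprocess
import time
from typing import Callable, List


def rollout_status_cmd(name: str, namespace: str, context: str) -> List[str]:
    """Build the one-shot status command seen in the trace log."""
    return [
        "kubectl", "--context", context,
        "rollout", "status", "deployment", name,
        "--namespace", namespace, "--watch=false",
    ]


def run_kubectl(cmd: List[str]) -> str:
    """Run the command and return stdout and stderr combined as text."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout


def wait_for_rollout(
    name: str,
    namespace: str,
    context: str,
    deadline_s: float = 600.0,  # assumption: mirrors statusCheckDeadlineSeconds
    poll_s: float = 1.0,        # assumption: arbitrary polling interval
    run: Callable[[List[str]], str] = run_kubectl,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until kubectl reports success; return False once the deadline passes."""
    cmd = rollout_status_cmd(name, namespace, context)
    give_up_at = time.monotonic() + deadline_s
    while True:
        if "successfully rolled out" in run(cmd):
            return True
        if time.monotonic() >= give_up_at:
            return False
        sleep(poll_s)
```

The run and sleep parameters are injectable only so the loop can be exercised without a cluster. In a pipeline you would call wait_for_rollout("deployment", "default", "stage") right after skaffold deploy and fail the job when it returns False.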

simonjpartridge commented 3 years ago

This appears to have been fixed for me in 1.33.0. I suspect the change that fixed it was https://github.com/GoogleContainerTools/skaffold/pull/6674. Thanks :)

cmdjulian commented 3 years ago

Hey @simonjpartridge, after updating to the newest version 1.33.0 I see normal deployment times. It appears to be fixed for me as well. Thanks

gsquared94 commented 3 years ago

Closing for now. Please comment to reopen if it reoccurs.

afallou commented 2 years ago

Seeing this issue again with the Skaffold 2.0 release. Pinning to v1.39.2 fixes it.

aaron-prindle commented 1 year ago

@afallou can you add some more information on the skaffold.yaml used when you encountered this (especially which k8s objects were deployed, e.g. Deployment or StatefulSet, and which deployer was used)? Also, have you been able to try v2.0.3, and is the issue still present there? Thanks

aaron-prindle commented 1 year ago

Here is a sample deployment that this seems to occur for:

# skaffold apply -f skaffold.yaml manifest.yaml 
Starting deploy...
 - deployment.apps/blah-deployment created
Waiting for deployments to stabilize...
Deployments stabilized in 307.114876ms
# kubectl get deployments
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
blah-deployment                       0/3     3            0           13s
apiVersion: apps/v1
kind: Deployment
metadata:
 labels:
   app: blah
   skaffold.dev/run-id: static
 name: blah-deployment
spec:
 replicas: 3
 selector:
   matchLabels:
     app: blah
 strategy:
   rollingUpdate:
     maxSurge: 1
     maxUnavailable: 0
   type: RollingUpdate
 template:
   metadata:
     labels:
       app: blah
   spec:
     containers:
     - image: us-east1-docker.pkg.dev/sample-app/sample-repo/hello-app:556538f3-0569-430e-9856-4ca8ed770646
       imagePullPolicy: Always
       name: hello-app
       readinessProbe:
         initialDelaySeconds: 10
         periodSeconds: 30
         tcpSocket:
           port: 80
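
To illustrate why READY 0/3 should not count as stabilized, here is a rough Python sketch of the kind of replica comparison a rollout check performs, applied to the JSON from "kubectl get deployment blah-deployment -o json". It is loosely modeled on the messages kubectl rollout status prints elsewhere in this thread. The function name and the exact ordering of the checks are assumptions, not kubectl's or Skaffold's actual code.

```python
from typing import Any, Dict


def rollout_state(deployment: Dict[str, Any]) -> str:
    """Classify a Deployment object (parsed JSON) as ready or still waiting."""
    meta = deployment.get("metadata", {})
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})

    # The controller has not yet processed the latest spec.
    if status.get("observedGeneration", 0) < meta.get("generation", 0):
        return "waiting: spec update not yet observed"

    desired = spec.get("replicas", 1)  # Kubernetes defaults replicas to 1
    total = status.get("replicas", 0)
    updated = status.get("updatedReplicas", 0)
    # availableReplicas is omitted from the JSON while it is zero.
    available = status.get("availableReplicas", 0)

    if updated < desired:
        return f"waiting: {updated} out of {desired} new replicas have been updated"
    if total > updated:
        return f"waiting: {total - updated} old replicas are pending termination"
    if available < updated:
        return f"waiting: {available} of {updated} updated replicas are available"
    return "ready"
```

For the state shown above (3 up to date, 0 available) this returns a "waiting" result, which is what a status check would be expected to report until the readiness probe passes.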
gsquared94 commented 1 year ago

I could not repro this issue. I tried against pods and deployments and this app mentioned above.

I tried against skaffold main branch, along with v2.0.2 and v2.0.3, and against minikube and GKE clusters. It is possible that this regression existed in v2.0.0 release but I think that release has been archived and the earliest available version is now v2.0.2.

Closing it again. To reopen, please provide the exact Kubernetes manifest, with a prebuilt image that I can pull and run, as a repro.

renzodavid9 commented 1 year ago

Making triage party happy