deis / workflow

The open source PaaS for Kubernetes.
https://deis.com/workflow/
MIT License

Pods fail to mount secret on k8s 1.3.0 + GKE #372

Closed: mboersma closed this issue 8 years ago

mboersma commented 8 years ago

Builder, database, minio, and registry all mount the "objectstorage-keyfile" secret volume. In k8s 1.3 on GKE, this began to fail (see the final Events listing):

$ kubectl --namespace=deis describe po deis-database-dyqyu 
Name:       deis-database-dyqyu
Namespace:  deis
Node:       gke-mboersma-default-pool-21df92ab-lh7d/10.240.0.16
Start Time: Wed, 13 Jul 2016 15:28:28 -0600
Labels:     app=deis-database
Status:     Pending
IP:     
Controllers:    ReplicationController/deis-database
Containers:
  deis-database:
    Container ID:   
    Image:      quay.io/deisci/postgres:canary
    Image ID:       
    Port:       5432/TCP
    QoS Tier:
      memory:       BestEffort
      cpu:      BestEffort
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Readiness:      exec [is_running] delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment Variables:
      DATABASE_STORAGE: minio
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  database-creds:
    Type:   Secret (a volume populated by a Secret)
    SecretName: database-creds
  objectstore-creds:
    Type:   Secret (a volume populated by a Secret)
    SecretName: objectstorage-keyfile
  deis-database-token-ovk2r:
    Type:   Secret (a volume populated by a Secret)
    SecretName: deis-database-token-ovk2r
Events:
  FirstSeen LastSeen    Count   From                            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----                            -------------   --------    ------      -------
  4m        4m      1   {default-scheduler }                            Normal      Scheduled   Successfully assigned deis-database-dyqyu to gke-mboersma-default-pool-21df92ab-lh7d
  2m        2m      1   {kubelet gke-mboersma-default-pool-21df92ab-lh7d}           Warning     FailedMount Unable to mount volumes for pod "deis-database-dyqyu_deis(bd127bc0-4940-11e6-af68-42010af001d0)": timeout expired waiting for volumes to attach/mount for pod "deis-database-dyqyu"/"deis". list of unattached/unmounted volumes=[objectstore-creds]
  2m        2m      1   {kubelet gke-mboersma-default-pool-21df92ab-lh7d}           Warning     FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "deis-database-dyqyu"/"deis". list of unattached/unmounted volumes=[objectstore-creds]

This appears to be related to kubernetes/kubernetes#28750 and maybe kubernetes/kubernetes#28898 and kubernetes/kubernetes#28616.
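For anyone else triaging this, a quick sanity check (a sketch, assuming the same deis namespace as above) is to confirm the secret object itself exists, which it does here; the timeout is happening in the kubelet's attach/mount path, not because the secret is missing:

$ # The secret is present, so the FailedMount above is a kubelet-side timeout
$ kubectl --namespace=deis get secret objectstorage-keyfile
$ # Re-describe the pod to see whether the mount eventually succeeds or keeps timing out
$ kubectl --namespace=deis describe po deis-database-dyqyu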

felixbuenemann commented 8 years ago

I also saw these errors with k8s 1.3.0 using Workflow v2.1.0. The first try was with kube-aws 1.3.0 / hyperkube v1.3.0_coreos.0 / CoreOS Alpha with Docker 1.11.2, and the second with kube-aws 1.3.0 / hyperkube v1.3.0_coreos.1 / CoreOS Beta with Docker 1.10.3.

Here's an example of the logged error events taken from the tectonic console:

Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "slugbuild-edible-magician-f3a78dd4-aae4512e"/"deis". list of unattached/unmounted volumes=[objectstorage-keyfile]
Unable to mount volumes for pod "slugbuild-edible-magician-f3a78dd4-aae4512e_deis(6c69628d-468f-11e6-a66d-029730afa1db)": timeout expired waiting for volumes to attach/mount for pod "slugbuild-edible-magician-f3a78dd4-aae4512e"/"deis". list of unattached/unmounted volumes=[objectstorage-keyfile]

mboersma commented 8 years ago

A workaround in GKE is to choose the "Change" link for your Node Pool and roll it back to k8s 1.2.5.
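For reference, the same rollback should also be possible from the gcloud CLI, roughly like this (a sketch; `my-cluster` is a placeholder, and this assumes GKE accepts 1.2.5 as a node version target):

$ # "upgrade" is also how GKE node pool versions are changed, even downward
$ gcloud container clusters upgrade my-cluster --cluster-version=1.2.5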

rimusz commented 8 years ago

Not seeing the issue with Workflow v2.2.0 on GKE with Kubernetes v1.3.2 using GCS storage. I also tried kube-solo and kube-cluster (with Minio as storage) and didn't hit any problems there either.

felixbuenemann commented 8 years ago

Chances are you just haven't hit the bug yet. The bugfix was merged into kubernetes master two days ago (see kubernetes/kubernetes#28939) and will likely land in k8s 1.3.3. The problem is triggered when mounting secrets, so it doesn't matter what kind of storage you are using.
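If anyone wants a minimal reproduction independent of Workflow, a bare pod that mounts a secret should be enough (a sketch; the names and image are arbitrary). On an affected kubelet it should stick in ContainerCreating with the same FailedMount timeout:

$ kubectl create secret generic repro-secret --from-literal=key=value
$ kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: secret-mount-repro
spec:
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: creds
      mountPath: /var/run/secret
      readOnly: true
  volumes:
  - name: creds
    secret:
      secretName: repro-secret
EOF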

rimusz commented 8 years ago

@felixbuenemann you're right, I got hit by that bug on the fourth cluster on GKE.

JorritSalverda commented 8 years ago

Just checked with 1.3.2, and it is indeed still an issue there. I'll retry later this week when 1.3.3 is available on GKE.

sstarcher commented 8 years ago

Looks like it will land in 1.3.4

bacongobbler commented 8 years ago

Yes, apparently @mboersma received tentative acknowledgement that it will land in 1.3.4.

felixbuenemann commented 8 years ago

If someone wants to check whether it's fixed, k8s 1.3.4 has been released a couple of hours ago.
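To verify (a sketch; pod names will differ per cluster), confirm the node version and then watch whether secret-mounting pods actually reach Running instead of hanging in ContainerCreating:

$ # The VERSION column should now show v1.3.4
$ kubectl get nodes
$ # Pods that mount objectstorage-keyfile should go Running without FailedMount events
$ kubectl --namespace=deis get pods -w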

sstarcher commented 8 years ago

I'll be testing this out tomorrow

bacongobbler commented 8 years ago

Yes, I manually tested this and it was fixed with 1.3.4-beta.0.

mboersma commented 8 years ago

> k8s 1.3.4 has been released a couple of hours ago

Excellent! Thanks @felixbuenemann, we'll test again to make sure.

(I've also manually tested with k8s v1.4.0-beta2, and the bug stayed fixed.)