GoogleCloudPlatform / gcs-fuse-csi-driver

The Google Cloud Storage FUSE Container Storage Interface (CSI) Plugin.
Apache License 2.0

Random gcs-fuse-csi-driver crash #283

Closed: Jonathan-Eid closed this issue 4 months ago

Jonathan-Eid commented 5 months ago

The Grafana pod crashed out with the following event:

Warning  FailedMount  3m58s (x890 over 30h)  kubelet  MountVolume.SetUp failed for volume "gcs-fuse-csi-ephemeral" : rpc error: code = Internal desc = the webhook failed to inject the sidecar container into the Pod spec

Remedied by running a rollout restart on the deployment; the new pod came up just fine.

Cluster Version: 1.28.9-gke.1000000

Pod information:

Name:             grafana-6699cdf5ff-rm4dm
Namespace:        monitoring
Priority:         0
Service Account:  default
Node:             ..............................
Start Time:       Sun, 02 Jun 2024 03:33:41 -0400
Labels:           app.kubernetes.io/instance=grafana
                  app.kubernetes.io/name=grafana
                  pod-template-hash=6699cdf5ff
Annotations:
                  gke-gcsfuse/cpu-limit: 4
                  gke-gcsfuse/cpu-request: 500m
                  gke-gcsfuse/ephemeral-storage-limit: 50Gi
                  gke-gcsfuse/ephemeral-storage-request: 5Gi
                  gke-gcsfuse/memory-limit: 4Gi
                  gke-gcsfuse/memory-request: 1Gi
                  gke-gcsfuse/volumes: true
                  kubectl.kubernetes.io/default-container: grafana
Status:           Pending
SeccompProfile:   RuntimeDefault
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/grafana-6699cdf5ff
Containers:
  grafana:
    Container ID:    
    Image:           docker.io/grafana/grafana:10.4.1
    Image ID:        
    Ports:           3000/TCP, 9094/TCP, 9094/UDP
    Host Ports:      0/TCP, 0/TCP, 0/UDP
    SeccompProfile:  RuntimeDefault
    State:           Waiting
      Reason:        ContainerCreating
    Ready:           False
    Restart Count:   0
    Limits:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             2Gi
    Liveness:             http-get http://:3000/api/health delay=60s timeout=30s period=10s #success=1 #failure=10
    Readiness:            http-get http://:3000/api/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_IP:                       (v1:status.podIP)
      GF_SECURITY_ADMIN_USER:      <set to the key 'admin-user' in secret 'grafana-secret'>      Optional: false
      GF_SECURITY_ADMIN_PASSWORD:  <set to the key 'admin-password' in secret 'grafana-secret'>  Optional: false
      GF_PATHS_DATA:               /grafana/data/
      GF_PATHS_LOGS:               /grafana/logs/
      GF_PATHS_PLUGINS:            /grafana/plugins/
      GF_PATHS_PROVISIONING:       /grafana/provisioning/
    Mounts:
      /etc/grafana/grafana.ini from config (rw,path="grafana.ini")
      /grafana from gcs-fuse-csi-ephemeral (rw)
      /var/lib/grafana from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bfksz (ro)
Readiness Gates:
  Type                                       Status
  cloud.google.com/load-balancer-neg-ready   True 
Conditions:
  Type                                       Status
  cloud.google.com/load-balancer-neg-ready   True 
  Initialized                                True 
  Ready                                      False 
  ContainersReady                            False 
  PodScheduled                               True 
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      grafana
    Optional:  false
  storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  gcs-fuse-csi-ephemeral:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            gcsfuse.csi.storage.gke.io
    FSType:            
    ReadOnly:          false
    VolumeAttributes:      bucketName=<......>
                           gcsfuseLoggingSeverity=warning
                           mountOptions=implicit-dirs
  kube-api-access-bfksz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 kubernetes.io/arch=amd64:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  3m58s (x890 over 30h)  kubelet  MountVolume.SetUp failed for volume "gcs-fuse-csi-ephemeral" : rpc error: code = Internal desc = the webhook failed to inject the sidecar container into the Pod spec

hime commented 4 months ago

This error comes from gcsfuse-node and means that the sidecar container is not present in the Pod spec. When you hit errors like this, make sure the webhook is running (if you are using the non-managed driver). If it happens again and the webhook is running, go through the webhook logs, and feel free to reopen this issue if the logs aren't useful.
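
To confirm that a stuck pod is in the state described above (GCS FUSE volume requested, but no sidecar in the Pod spec), the check can be sketched in a few lines of Python. This is only an illustration, not part of the driver: `diagnose_pod` is a hypothetical helper, and the sidecar container name `gke-gcsfuse-sidecar` is an assumption based on the driver's public documentation, so adjust it if your cluster uses a different name. The annotation key and CSI driver name are taken from the pod description in this issue.

```python
# Hypothetical helper: inspects a pod object (as produced by
# `kubectl get pod <name> -o json`, parsed into a dict) and reports
# whether the GCS FUSE sidecar appears to have been injected.

SIDECAR_NAME = "gke-gcsfuse-sidecar"        # assumed sidecar container name
VOLUMES_ANNOTATION = "gke-gcsfuse/volumes"  # annotation seen on the pod above
CSI_DRIVER = "gcsfuse.csi.storage.gke.io"   # driver name seen on the pod above


def diagnose_pod(pod: dict) -> str:
    """Return a short status string describing the sidecar situation."""
    metadata = pod.get("metadata") or {}
    spec = pod.get("spec") or {}

    # Does any volume in the pod use the GCS FUSE CSI driver?
    uses_driver = any(
        (volume.get("csi") or {}).get("driver") == CSI_DRIVER
        for volume in (spec.get("volumes") or [])
    )
    if not uses_driver:
        return "no-gcsfuse-volume"

    # The sidecar may be listed as a regular container or an init container.
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    if SIDECAR_NAME in [c.get("name") for c in containers]:
        return "sidecar-present"

    # Without the annotation, the webhook is not expected to inject anything.
    annotations = metadata.get("annotations") or {}
    if annotations.get(VOLUMES_ANNOTATION) != "true":
        return "annotation-missing"

    # Annotation is set and the volume is requested, but no sidecar was
    # injected: this matches the FailedMount error reported in this issue.
    return "sidecar-missing"


# Minimal pod shaped like the one in this issue (fields trimmed).
example_pod = {
    "metadata": {"annotations": {"gke-gcsfuse/volumes": "true"}},
    "spec": {
        "containers": [{"name": "grafana"}],
        "volumes": [
            {"name": "gcs-fuse-csi-ephemeral",
             "csi": {"driver": "gcsfuse.csi.storage.gke.io"}}
        ],
    },
}
print(diagnose_pod(example_pod))  # prints "sidecar-missing"
```

A result of `sidecar-missing` suggests the pod was admitted without the webhook mutating it, which is consistent with the fix reported above: recreating the pod while the webhook is healthy lets the sidecar be injected.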