aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

MountVolume.SetUp failed for volume "test-file" : object "default"/"test-file" not registered #3755

Closed · hvikharev closed this issue 1 year ago

hvikharev commented 1 year ago

Version

Karpenter Version: v0.27.1

Kubernetes Version: v1.23.17

Expected Behavior

We run Karpenter on EKS and deploy a workload (a pod with a volume sourced from a ConfigMap). Karpenter should create a node, the pod should be scheduled onto it, and the volume should mount on the first attempt.

Actual Behavior

On the first attempt, the kubelet fails to mount volumes backed by kube-root-ca.crt and by our own ConfigMaps:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  60s                default-scheduler  0/2 nodes are available: 2 node(s) didn't match Pod's node affinity/selector.
  Normal   Nominated         57s                karpenter          Pod should schedule on node: ip-10-158-46-62.ec2.internal
  Normal   Scheduled         24s                default-scheduler  Successfully assigned default/nginx-6688cfb6dd-cjz9q to ip-10-158-46-62.ec2.internal
  Warning  NetworkNotReady   16s (x5 over 24s)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  Warning  FailedMount       16s (x5 over 24s)  kubelet            MountVolume.SetUp failed for volume "test-file" : object "default"/"test-file" not registered
  Warning  FailedMount       16s (x5 over 24s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-l5f52" : object "default"/"kube-root-ca.crt" not registered
  Normal   Pulling           8s                kubelet            Pulling image "nginx:1.14.2"
  Normal   Pulled            6s                kubelet            Successfully pulled image "nginx:1.14.2" in 2.103958447s (2.103972092s including waiting)
  Normal   Created           6s                kubelet            Created container nginx
  Normal   Started           6s                kubelet            Started container nginx

CNI plugin: v1.12.6-eksbuild.1
Runtime: containerd
AMI family: Bottlerocket and AL2
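
The VPC CNI add-on version can be confirmed with, for example (the cluster name test-cluster is illustrative, matching the discovery tags below):

# If the VPC CNI is installed as an EKS managed add-on:
aws eks describe-addon --cluster-name test-cluster --addon-name vpc-cni \
  --query 'addon.addonVersion' --output text

# Alternatively, read the image tag straight from the aws-node DaemonSet:
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].image}'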

Steps to Reproduce the Problem

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          requests:
            cpu: "1"
        ports:
        - containerPort: 80
        volumeMounts:
          - name: test-file
            mountPath: /mnt/test.sh
            subPath: test.sh
      volumes:
        - name: test-file
          configMap:
            name: test-file
            defaultMode: 0777
      nodeSelector:
        team: test
      tolerations:
        - key: test
          operator: Equal
          value: executor
          effect: NoSchedule
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-file
  labels:
    app: nginx
data:
  test.sh: |
    #!/bin/bash
    ls -la /

Observe the pod status.
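
To reproduce, apply the manifests above and watch the pod's events (a minimal sketch; the file name is illustrative):

# Save the two manifests above as repro.yaml, then:
kubectl apply -f repro.yaml

# Watch the pod and inspect its events for the FailedMount warnings:
kubectl get pods -l app=nginx -w
kubectl describe pod -l app=nginx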

Resource Specs and Logs

Provisioner

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 60
  ttlSecondsUntilExpired: 604800
  limits:
    resources:
      cpu: 16
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["r5", "r5a", "r5ad", "r5b", "r5d", "r5dn", "r5n"]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["on-demand"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["us-east-1a"]
  labels:
    team: test
  taints:
    - key: test
      value: executor
      effect: NoSchedule
  providerRef:
    name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: test-cluster
  securityGroupSelector:
    karpenter.sh/discovery: test-cluster

Karpenter controller log:

2023-04-13T14:15:41.586Z    DEBUG   controller.provisioner.cloudprovider    created launch template {"commit": "7131be2-dirty", "provisioner": "default", "launch-template-name": "Karpenter-test-cluster-16444089071061827633", "launch-template-id": "lt-03ec5baa5f03b26d8"}
2023-04-13T14:15:43.410Z    INFO    controller.provisioner.cloudprovider    launched new instance   {"commit": "7131be2-dirty", "provisioner": "default", "id": "i-0d414cdbdc1f4d882", "hostname": "ip-10-158-46-62.ec2.internal", "instance-type": "r5.large", "zone": "us-east-1a", "capacity-type": "on-demand"}
2023-04-13T14:23:45.469Z    DEBUG   controller.aws  deleted launch template {"commit": "7131be2-dirty"}

Kubelet log:

server.go:1246] "Started kubelet"
server.go:150] "Starting to listen" address="0.0.0.0" port=10250
server.go:410] "Adding debug handlers to kubelet server"
cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
kubelet.go:1354] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
certificate_manager.go:270] kubernetes.io/kubelet-serving: Certificate rotation is enabled
fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
volume_manager.go:292] "The desired_state_of_world populator starts"
volume_manager.go:294] "Starting Kubelet Volume Manager"
desired_state_of_world_populator.go:151] "Desired state populator starts to run"
kubelet.go:2397] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
factory.go:145] Registering containerd factory
factory.go:55] Registering systemd factory
factory.go:103] Registering Raw factory
manager.go:1203] Started watching for new ooms in manager
manager.go:304] Starting recovery of all containers
manager.go:309] Recovery completed
kubelet_node_status.go:352] "Setting node annotation to enable volume controller attach/detach"
kubelet_node_status.go:400] "Adding label from cloud provider" labelKey="beta.kubernetes.io/instance-type" labelValue="t3.micro"
kubelet_node_status.go:402] "Adding node label from cloud provider" labelKey="node.kubernetes.io/instance-type" labelValue="t3.micro"
kubelet_node_status.go:413] "Adding node label from cloud provider" labelKey="failure-domain.beta.kubernetes.io/zone" labelValue="us-east-1a"
kubelet_node_status.go:415] "Adding node label from cloud provider" labelKey="topology.kubernetes.io/zone" labelValue="us-east-1a"
kubelet_node_status.go:419] "Adding node label from cloud provider" labelKey="failure-domain.beta.kubernetes.io/region" labelValue="us-east-1"
kubelet_node_status.go:421] "Adding node label from cloud provider" labelKey="topology.kubernetes.io/region" labelValue="us-east-1"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasSufficientMemory"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasNoDiskPressure"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasSufficientPID"
cpu_manager.go:213] "Starting CPU manager" policy="none"
cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
state_mem.go:36] "Initialized new in-memory state store"
policy_none.go:49] "None policy: Start"
memory_manager.go:168] "Starting memorymanager" policy="None"
state_mem.go:35] "Initializing new in-memory state store"
manager.go:247] "Starting Device Plugin manager"
manager.go:611] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
manager.go:289] "Serving device plugin registration server on socket" path="/var/lib/kubelet/device-plugins/kubelet.sock"
plugin_watcher.go:52] "Plugin Watcher Start" path="/var/lib/kubelet/plugins_registry"
eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"ip-10-158-46-62.ec2.internal\" not found"
plugin_manager.go:112] "The desired_state_of_world populator (plugin watcher) starts"
plugin_manager.go:114] "Starting Kubelet Plugin Manager"
kubelet_node_status.go:352] "Setting node annotation to enable volume controller attach/detach"
kubelet_node_status.go:400] "Adding label from cloud provider" labelKey="beta.kubernetes.io/instance-type" labelValue="t3.micro"
kubelet_node_status.go:402] "Adding node label from cloud provider" labelKey="node.kubernetes.io/instance-type" labelValue="t3.micro"
kubelet_node_status.go:413] "Adding node label from cloud provider" labelKey="failure-domain.beta.kubernetes.io/zone" labelValue="us-east-1a"
kubelet_node_status.go:415] "Adding node label from cloud provider" labelKey="topology.kubernetes.io/zone" labelValue="us-east-1a"
kubelet_node_status.go:419] "Adding node label from cloud provider" labelKey="failure-domain.beta.kubernetes.io/region" labelValue="us-east-1"
kubelet_node_status.go:421] "Adding node label from cloud provider" labelKey="topology.kubernetes.io/region" labelValue="us-east-1"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasSufficientMemory"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasNoDiskPressure"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasSufficientPID"
kubelet_node_status.go:70] "Attempting to register node" node="ip-10-158-46-62.ec2.internal"
kubelet.go:2472] "Error getting node" err="node \"ip-10-158-46-62.ec2.internal\" not found"
kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv4
kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv6
status_manager.go:161] "Starting to sync pod status with apiserver"
kubelet.go:2034] "Starting kubelet main sync loop"
kubelet.go:2058] "Skipping pod synchronization" err="PLEG is not healthy: pleg has yet to be successful"
kubelet.go:2472] "Error getting node" err="node \"ip-10-158-46-62.ec2.internal\" not found"
kubelet.go:2472] "Error getting node" err="node \"ip-10-158-46-62.ec2.internal\" not found"
kubelet.go:2472] "Error getting node" err="node \"ip-10-158-46-62.ec2.internal\" not found"
kubelet.go:2472] "Error getting node" err="node \"ip-10-158-46-62.ec2.internal\" not found"
kubelet_node_status.go:108] "Node was previously registered" node="ip-10-158-46-62.ec2.internal"
kubelet_node_status.go:274] "Controller attach-detach setting changed to true; updating existing Node"
kubelet_node_status.go:73] "Successfully registered node" node="ip-10-158-46-62.ec2.internal"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasSufficientMemory"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasNoDiskPressure"
kubelet_node_status.go:563] "Recording event message for node" node="ip-10-158-46-62.ec2.internal" event="NodeHasSufficientPID"
apiserver.go:52] "Watching apiserver"
kubelet.go:2120] "SyncLoop ADD" source="api" pods=[kube-system/aws-node-nrswh kube-system/kube-proxy-pxqnh default/nginx-6688cfb6dd-cjz9q]
topology_manager.go:200] "Topology Admit Handler"
topology_manager.go:200] "Topology Admit Handler"
topology_manager.go:200] "Topology Admit Handler"
pod_workers.go:965] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="default/nginx-6688cfb6dd-cjz9q" podUID=d533dfae-860f-4d90-ae91-42a1813bb153
certificate_manager.go:270] kubernetes.io/kubelet-serving: Rotating certificates
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"aws-iam-token\" (UniqueName: \"kubernetes.io/projected/3a75447e-fc85-4512-8c2d-660981250ad8-aws-iam-token\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-bin-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-cni-bin-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"varlog\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-varlog\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubeconfig\" (UniqueName: \"kubernetes.io/configmap/b1edf288-91df-4d0d-891e-668b39a96480-kubeconfig\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-5m97q\" (UniqueName: \"kubernetes.io/projected/3a75447e-fc85-4512-8c2d-660981250ad8-kube-api-access-5m97q\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-xtables-lock\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-lib-modules\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"test-file\" (UniqueName: \"kubernetes.io/configmap/d533dfae-860f-4d90-ae91-42a1813bb153-test-file\") pod \"nginx-6688cfb6dd-cjz9q\" (UID: \"d533dfae-860f-4d90-ae91-42a1813bb153\") " pod="default/nginx-6688cfb6dd-cjz9q"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"run-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-run-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-xtables-lock\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-mwxcw\" (UniqueName: \"kubernetes.io/projected/b1edf288-91df-4d0d-891e-668b39a96480-kube-api-access-mwxcw\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-l5f52\" (UniqueName: \"kubernetes.io/projected/d533dfae-860f-4d90-ae91-42a1813bb153-kube-api-access-l5f52\") pod \"nginx-6688cfb6dd-cjz9q\" (UID: \"d533dfae-860f-4d90-ae91-42a1813bb153\") " pod="default/nginx-6688cfb6dd-cjz9q"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"cni-net-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-cni-net-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"log-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-log-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"config\" (UniqueName: \"kubernetes.io/configmap/b1edf288-91df-4d0d-891e-668b39a96480-config\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:167] "Reconciler: start to sync state"
csr.go:262] certificate signing request csr-f7b27 is approved, waiting to be issued
reconciler.go:293] "operationExecutor.MountVolume started for volume \"kubeconfig\" (UniqueName: \"kubernetes.io/configmap/b1edf288-91df-4d0d-891e-668b39a96480-kubeconfig\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"aws-iam-token\" (UniqueName: \"kubernetes.io/projected/3a75447e-fc85-4512-8c2d-660981250ad8-aws-iam-token\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"cni-bin-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-cni-bin-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"varlog\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-varlog\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"kube-api-access-5m97q\" (UniqueName: \"kubernetes.io/projected/3a75447e-fc85-4512-8c2d-660981250ad8-kube-api-access-5m97q\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-xtables-lock\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-lib-modules\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"kube-api-access-mwxcw\" (UniqueName: \"kubernetes.io/projected/b1edf288-91df-4d0d-891e-668b39a96480-kube-api-access-mwxcw\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"test-file\" (UniqueName: \"kubernetes.io/configmap/d533dfae-860f-4d90-ae91-42a1813bb153-test-file\") pod \"nginx-6688cfb6dd-cjz9q\" (UID: \"d533dfae-860f-4d90-ae91-42a1813bb153\") " pod="default/nginx-6688cfb6dd-cjz9q"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"run-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-run-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-xtables-lock\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"config\" (UniqueName: \"kubernetes.io/configmap/b1edf288-91df-4d0d-891e-668b39a96480-config\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"kube-api-access-l5f52\" (UniqueName: \"kubernetes.io/projected/d533dfae-860f-4d90-ae91-42a1813bb153-kube-api-access-l5f52\") pod \"nginx-6688cfb6dd-cjz9q\" (UID: \"d533dfae-860f-4d90-ae91-42a1813bb153\") " pod="default/nginx-6688cfb6dd-cjz9q"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"cni-net-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-cni-net-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"log-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-log-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"log-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-log-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"cni-bin-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-cni-bin-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"kubeconfig\" (UniqueName: \"kubernetes.io/configmap/b1edf288-91df-4d0d-891e-668b39a96480-kubeconfig\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"varlog\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-varlog\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-xtables-lock\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"run-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-run-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"lib-modules\" (UniqueName: \"kubernetes.io/host-path/b1edf288-91df-4d0d-891e-668b39a96480-lib-modules\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"xtables-lock\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-xtables-lock\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"config\" (UniqueName: \"kubernetes.io/configmap/b1edf288-91df-4d0d-891e-668b39a96480-config\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"cni-net-dir\" (UniqueName: \"kubernetes.io/host-path/3a75447e-fc85-4512-8c2d-660981250ad8-cni-net-dir\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
configmap.go:200] Couldn't get configMap default/test-file: object "default"/"test-file" not registered
nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/d533dfae-860f-4d90-ae91-42a1813bb153-test-file podName:d533dfae-860f-4d90-ae91-42a1813bb153 nodeName:}" failed. No retries permitted until 2023-04-13 14:16:16.988808902 +0000 UTC m=+7.227586204 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "test-file" (UniqueName: "kubernetes.io/configmap/d533dfae-860f-4d90-ae91-42a1813bb153-test-file") pod "nginx-6688cfb6dd-cjz9q" (UID: "d533dfae-860f-4d90-ae91-42a1813bb153") : object "default"/"test-file" not registered
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"kube-api-access-mwxcw\" (UniqueName: \"kubernetes.io/projected/b1edf288-91df-4d0d-891e-668b39a96480-kube-api-access-mwxcw\") pod \"kube-proxy-pxqnh\" (UID: \"b1edf288-91df-4d0d-891e-668b39a96480\") " pod="kube-system/kube-proxy-pxqnh"
projected.go:293] Couldn't get configMap default/kube-root-ca.crt: object "default"/"kube-root-ca.crt" not registered
projected.go:199] Error preparing data for projected volume kube-api-access-l5f52 for pod default/nginx-6688cfb6dd-cjz9q: object "default"/"kube-root-ca.crt" not registered
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"aws-iam-token\" (UniqueName: \"kubernetes.io/projected/3a75447e-fc85-4512-8c2d-660981250ad8-aws-iam-token\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/d533dfae-860f-4d90-ae91-42a1813bb153-kube-api-access-l5f52 podName:d533dfae-860f-4d90-ae91-42a1813bb153 nodeName:}" failed. No retries permitted until 2023-04-13 14:16:17.012951706 +0000 UTC m=+7.251729003 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access-l5f52" (UniqueName: "kubernetes.io/projected/d533dfae-860f-4d90-ae91-42a1813bb153-kube-api-access-l5f52") pod "nginx-6688cfb6dd-cjz9q" (UID: "d533dfae-860f-4d90-ae91-42a1813bb153") : object "default"/"kube-root-ca.crt" not registered
operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"kube-api-access-5m97q\" (UniqueName: \"kubernetes.io/projected/3a75447e-fc85-4512-8c2d-660981250ad8-kube-api-access-5m97q\") pod \"aws-node-nrswh\" (UID: \"3a75447e-fc85-4512-8c2d-660981250ad8\") " pod="kube-system/aws-node-nrswh"
kuberuntime_manager.go:487] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/kube-proxy-pxqnh"
kuberuntime_manager.go:487] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/aws-node-nrswh"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"test-file\" (UniqueName: \"kubernetes.io/configmap/d533dfae-860f-4d90-ae91-42a1813bb153-test-file\") pod \"nginx-6688cfb6dd-cjz9q\" (UID: \"d533dfae-860f-4d90-ae91-42a1813bb153\") " pod="default/nginx-6688cfb6dd-cjz9q"
reconciler.go:293] "operationExecutor.MountVolume started for volume \"kube-api-access-l5f52\" (UniqueName: \"kubernetes.io/projected/d533dfae-860f-4d90-ae91-42a1813bb153-kube-api-access-l5f52\") pod \"nginx-6688cfb6dd-cjz9q\" (UID: \"d533dfae-860f-4d90-ae91-42a1813bb153\") " pod="default/nginx-6688cfb6dd-cjz9q"

tzneal commented 1 year ago

Is this a transient condition? E.g., the first mount fails, and then once the node is ready, it succeeds?

hvikharev commented 1 year ago

Yes, it repeats 5 or 6 times, and once the instance is ready, it succeeds. But with CAS I don't have this issue, even though the kubelet has the same flags in both cases: a CAS node mounts volumes successfully on the first attempt, while a Karpenter node doesn't.

tzneal commented 1 year ago

I suspect this is caused by Karpenter creating the Node object, which CAS does not do. This should be resolved soon when we stop doing that.
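
One rough way to confirm a pre-created Node object (illustrative; the node name is taken from the events above): fetch the Node as soon as the pod is nominated. If it already exists while the kubelet is still starting up, it was created by Karpenter rather than registered by the kubelet.

# A pre-created Node typically shows the provisioner's labels and taints but
# no kubelet-reported details yet (e.g. the Ready condition still Unknown/False).
kubectl get node ip-10-158-46-62.ec2.internal -o yaml
kubectl describe node ip-10-158-46-62.ec2.internal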

CC @jonathan-innis

rr-krupesh-savaliya commented 1 year ago

We are facing the same issue. Sometimes it does not resolve even after many retries. Can this be fixed ASAP?

h1-himanshu commented 1 year ago

Hi @jonathan-innis, is there a tagged release with the merged fix yet?

jonathan-innis commented 1 year ago

@h1-himanshu Yes, this was released in the v0.28.0-rc.1 release candidate last week.

jonathan-innis commented 1 year ago

@hvikharev @h1-himanshu @rr-krupesh-savaliya Have you tried out the latest release candidate, v0.28.0-rc.1, to see if it resolves the issue you were seeing on older versions of Karpenter?
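
For anyone trying the release candidate, a minimal upgrade sketch (assuming Karpenter was installed from the public ECR Helm chart into the karpenter namespace):

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version v0.28.0-rc.1 \
  --namespace karpenter \
  --reuse-values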

rr-krupesh-savaliya commented 1 year ago

@jonathan-innis Yes, we tested the v0.28.0-rc.1 release on multiple clusters, and as a result, we no longer encounter any issues with mounting volumes.

hvikharev commented 1 year ago

@jonathan-innis I tested v0.28.0-rc.1 too and haven't seen any issues with mounting volumes. Thank you!

npandeya commented 1 year ago

Hi, I am using version 0.28.1 and getting this error:

Warning  FailedMount  15m (x7 over 16m)     kubelet  MountVolume.SetUp failed for volume "client-secret" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
Warning  FailedMount  12m                   kubelet  Unable to attach or mount volumes: unmounted volumes=[pvc], unattached volumes=[client-secret pvc kube-api-access-xvpkn aws-iam-token config secret]: timed out waiting for the condition
Warning  FailedMount  7m55s                 kubelet  Unable to attach or mount volumes: unmounted volumes=[pvc], unattached volumes=[secret client-secret pvc kube-api-access-xvpkn aws-iam-token config]: timed out waiting for the condition
Warning  FailedMount  5m52s (x2 over 14m)   kubelet  Unable to attach or mount volumes: unmounted volumes=[pvc], unattached volumes=[aws-iam-token config secret client-secret pvc kube-api-access-xvpkn]: timed out waiting for the condition
Warning  FailedMount  106s (x3 over 9m58s)  kubelet  Unable to attach or mount volumes: unmounted volumes=[pvc], unattached volumes=[config secret client-secret pvc kube-api-access-xvpkn aws-iam-token]: timed out waiting for the condition

Can anyone please help? I have verified that the Secrets Store CSI driver is running on the Karpenter node.
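
A "not found in the list of registered CSI drivers" error usually means the driver pod is running but has not (yet) registered with the kubelet on that node. Some checks that can distinguish the two (a sketch; the label and container name assume the upstream secrets-store-csi-driver Helm chart defaults, and <node-name> is a placeholder):

# Cluster-wide driver objects; secrets-store.csi.k8s.io should be listed.
kubectl get csidrivers

# Drivers actually registered with the kubelet on the affected node.
kubectl get csinode <node-name> -o yaml

# Registration logs from the driver pod's node-driver-registrar sidecar.
kubectl -n kube-system logs -l app=secrets-store-csi-driver -c node-driver-registrar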