Closed: kamialie closed this issue 1 year ago.
Can you provide a sample pod, PVC and PV spec that reproduces this issue?
I'm currently working with a StatefulSet; please see the example below. I assume the same applies to a Deployment, or to any other controller that can run multiple Pods on the same node.
The PV is provisioned automatically by the project I referenced above.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hello-app
spec:
  serviceName: hello-app
  replicas: 2
  selector:
    matchLabels:
      name: hello-app
  template:
    metadata:
      labels:
        name: hello-app
    spec:
      nodeSelector:
        role: test
      containers:
      - name: hello-app
        image: <>
        volumeMounts:
        - name: data
          mountPath: /etc/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: nvme-ssd
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: hello-app
  labels:
    name: hello-storage
spec:
  clusterIP: None
  selector:
    name: hello-app
```
Sorry, can you provide your storage class as well, plus the output of:

```shell
kubectl get csinode node-name -o yaml
```

for a node where this is running?
StorageClass:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: local-storage-provisioner
    meta.helm.sh/release-namespace: storage
  creationTimestamp: "2022-08-01T10:50:00Z"
  labels:
    app.kubernetes.io/instance: local-storage-provisioner
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: provisioner
    helm.sh/chart: provisioner-2.6.0-alpha.1
  name: nvme-ssd
  resourceVersion: "25633126"
  uid: 3cc791b3-d5df-4605-9665-1fd43ad278a5
provisioner: kubernetes.io/no-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
Node info:
```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2022-08-02T09:01:23Z"
  finalizers:
  - karpenter.sh/termination
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: g4dn.12xlarge
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: eu-central-1
    failure-domain.beta.kubernetes.io/zone: eu-central-1b
    karpenter.k8s.aws/instance-cpu: "48"
    karpenter.k8s.aws/instance-family: g4dn
    karpenter.k8s.aws/instance-gpu-count: "4"
    karpenter.k8s.aws/instance-gpu-manufacturer: nvidia
    karpenter.k8s.aws/instance-gpu-memory: "16384"
    karpenter.k8s.aws/instance-gpu-name: t4
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "196608"
    karpenter.k8s.aws/instance-pods: "234"
    karpenter.k8s.aws/instance-size: 12xlarge
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/initialized: "true"
    karpenter.sh/provisioner-name: test
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: ip-10-100-153-214.eu-central-1.compute.internal
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: g4dn.12xlarge
    nvme: "true"
    role: test
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1b
  name: ip-10-100-153-214.eu-central-1.compute.internal
  ownerReferences:
  - apiVersion: karpenter.sh/v1alpha5
    blockOwnerDeletion: true
    kind: Provisioner
    name: test
    uid: db1b6c68-1717-4912-8e70-31336f33aa2b
  resourceVersion: "26239766"
  uid: 9ea5fa4e-9e50-4b69-95d9-1bc3aadeec6a
spec:
  providerID: aws:///eu-central-1b/i-05ed1ab0fa33a447a
status:
  addresses:
  - address: 10.100.153.214
    type: InternalIP
  - address: ip-10-100-153-214.eu-central-1.compute.internal
    type: Hostname
  - address: ip-10-100-153-214.eu-central-1.compute.internal
    type: InternalDNS
  allocatable:
    attachable-volumes-aws-ebs: "39"
    cpu: 47810m
    ephemeral-storage: "103282620244"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 192687064Ki
    nvidia.com/gpu: "4"
    pods: "234"
  capacity:
    attachable-volumes-aws-ebs: "39"
    cpu: "48"
    ephemeral-storage: 113233900Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 195686360Ki
    nvidia.com/gpu: "4"
    pods: "234"
  conditions:
  - lastHeartbeatTime: "2022-08-02T12:28:38Z"
    lastTransitionTime: "2022-08-02T09:02:50Z"
    message: kubelet is posting ready status
    reason: KubeletReady
    status: "True"
    type: Ready
  - lastHeartbeatTime: "2022-08-02T12:28:38Z"
    lastTransitionTime: "2022-08-02T09:02:30Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2022-08-02T12:28:38Z"
    lastTransitionTime: "2022-08-02T09:02:30Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2022-08-02T12:28:38Z"
    lastTransitionTime: "2022-08-02T09:02:30Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - nvcr.io/nvidia/k8s-device-plugin@sha256:4918fdb36600589793b6a4b96be874a673c407e85c2cf707277e532e2d8a2231
    - nvcr.io/nvidia/k8s-device-plugin:v0.12.2
    sizeBytes: 109488523
  - names:
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni@sha256:3b6db8b6fb23424366ef91d7e9e818e42291316fa81c00c2c75dcafa614340c5
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni:v1.10.1-eksbuild.1
    sizeBytes: 107971097
  - names:
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni-init@sha256:6c70af7bf257712105a89a896b2afb86c86ace865d32eb73765bf29163a08c56
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon-k8s-cni-init:v1.10.1-eksbuild.1
    sizeBytes: 106951309
  - names:
    - docker.io/ethersphere/eks-local-disk-provisioner@sha256:bec6b3d15ea3501b5e8c03e9d2c39f2117753dfefa530fb70cfaa2a88ad1df19
    - docker.io/ethersphere/eks-local-disk-provisioner:latest
    sizeBytes: 98423131
  - names:
    - k8s.gcr.io/sig-storage/local-volume-provisioner@sha256:63859b69f9dfc0858e5d8746218e435c36e205c041fb6d8baf71ad132e24737f
    - k8s.gcr.io/sig-storage/local-volume-provisioner:v2.4.0
    sizeBytes: 40509761
  - names:
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/kube-proxy@sha256:c8abb4b8efc94090458f34e5f456791d9f7f57b5c99517b6b4e197305c1f10f6
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/kube-proxy:v1.22.6-eksbuild.1
    sizeBytes: 35948825
  - names:
    - quay.io/brancz/kube-rbac-proxy@sha256:6237b9f78f17fb0beafc99ff38602add6f51a0fdfa5395785f8d31a8f833e363
    - quay.io/brancz/kube-rbac-proxy:v0.13.0
    sizeBytes: 25405919
  - names:
    - quay.io/prometheus/node-exporter@sha256:f2269e73124dd0f60a7d19a2ce1264d33d08a985aed0ee6b0b89d0be470592cd
    - quay.io/prometheus/node-exporter:v1.3.1
    sizeBytes: 10347719
  - names:
    - 385808790715.dkr.ecr.eu-central-1.amazonaws.com/hello-app@sha256:88b205d7995332e10e836514fbfd59ecaf8976fc15060cd66e85cdcebe7fb356
    - 385808790715.dkr.ecr.eu-central-1.amazonaws.com/hello-app:1.0
    sizeBytes: 4892466
  - names:
    - 602401143452.dkr.ecr.eu-central-1.amazonaws.com/eks/pause:3.5
    sizeBytes: 298689
  nodeInfo:
    architecture: amd64
    bootID: ed8b943d-b2b6-4d6d-8950-69c755afb559
    containerRuntimeVersion: containerd://1.4.13
    kernelVersion: 5.4.204-113.362.amzn2.x86_64
    kubeProxyVersion: v1.22.9-eks-810597c
    kubeletVersion: v1.22.9-eks-810597c
    machineID: ec20c717d81a8e759d5ba1a42cfa863c
    operatingSystem: linux
    osImage: Amazon Linux 2
    systemUUID: ec20c717-d81a-8e75-9d5b-a1a42cfa863c
```
We use the CSINode object to determine the volume limits per CSI driver. As far as I can tell, there is no CSI driver for these local volumes, so there is nothing to tell the scheduler that the volume won't mount.
The storage class uses kubernetes.io/no-provisioner, a placeholder that refers to no actual provisioner and doesn't appear to be unique to these local volumes, so it can't be used to identify them either.
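For contrast, this is roughly what a CSINode object looks like when a CSI driver does report an attach limit; the scheduler reads the per-driver allocatable count from it. This is an illustrative sketch only (the driver name and count here are just an example of a typical EBS CSI setup, not from this cluster):

```yaml
# Illustrative CSINode for a node running a CSI driver that reports
# an attach limit; no such entry exists for the local volumes above.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-100-153-214.eu-central-1.compute.internal
spec:
  drivers:
  - name: ebs.csi.aws.com          # example driver; not present in this cluster
    nodeID: i-05ed1ab0fa33a447a
    allocatable:
      count: 25                    # scheduler-visible volume limit for this driver
```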
The only solution I'm seeing is to add a pod anti-affinity rule to your StatefulSet so you get no more than one pod per node:
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: "name"
          operator: In
          values:
          - hello-app
      topologyKey: "kubernetes.io/hostname"
```
Yep, that's what I'm currently doing, but I was looking for a way to run multiple pods on a single node. If that's the situation, hostPath seems the easier approach for now, as dynamic local storage provisioning is not coming to Kubernetes any time soon, and to EKS even later.
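For reference, a hostPath-based pod spec in that direction might look like the sketch below. The path is an assumption: it stands in for whatever directory the node setup scripts prepare on the NVMe devices.

```yaml
# Illustrative sketch: hostPath volume in place of the volumeClaimTemplates.
# /mnt/nvme/data is a hypothetical mount point prepared by node setup scripts.
spec:
  containers:
  - name: hello-app
    image: <>
    volumeMounts:
    - name: data
      mountPath: /etc/data
  volumes:
  - name: data
    hostPath:
      path: /mnt/nvme/data
      type: DirectoryOrCreate   # create the directory if it doesn't exist
```

Note that with hostPath all pods on the node share the same directory, so this trades per-pod isolation for the ability to co-schedule pods.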
Labeled for closure due to inactivity in 10 days.
Version
Karpenter: v0.13.1
Kubernetes: v1.22.0
Expected Behavior
I have a StatefulSet with a PVC. The PVC requests a PV of local type, which at the moment has no dynamic provisioning, so I'm using the local-static-provisioner project to create PVs on a new node. For more background: a script mounts devices to a custom directory, which that project then exposes as PVs in Kubernetes.
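A PV of the kind the static provisioner creates might look like the sketch below; the PV name, device path, and capacity are illustrative assumptions, with the node affinity pinning the volume to the node that owns the disk:

```yaml
# Illustrative local PV as a static provisioner would create it;
# name, path, and capacity are assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-nvme0
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: nvme-ssd
  local:
    path: /mnt/nvme/disks/vol1     # hypothetical directory the setup script mounts
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-100-153-214.eu-central-1.compute.internal
```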
Ideally I would want Karpenter to react when no PV is available on the node where it expects a pending pod to be scheduled, and start a new node, but that is probably out of scope for Karpenter, so I'm curious about your opinion/advice here.
Actual Behavior
Since Karpenter provisioned an instance big enough (in terms of CPU/memory) for two pods, Karpenter's logs indicate that the second pod should be scheduled on the existing node, while in fact it can't be, because no PV is available there (I configured the static provisioner to create a single local PV per node, which was consumed by the first pod).
Steps to Reproduce the Problem
Statically provision a single PV on a node, create a StatefulSet with 2 replicas, and specify resource requirements that allow both pods to run on the same node.
Resource Specs and Logs