dl1.24xlarge memory resource incompatibility

DanielJuravski commented 1 year ago

Version

Karpenter Version: v0.22.1

Kubernetes Version: v1.24

Expected Behavior

Allocate a workload whose memory limit and request fit within the dl1.24xlarge memory (694632Mi), as described here: https://karpenter.sh/v0.22.1/concepts/instance-types/#resources-148

resources:
  limits:
    habana.ai/gaudi: 8
    memory: 690Gi
  requests:
    habana.ai/gaudi: 8
    memory: 690Gi

In addition, why is the memory shown in the instance type resources description (https://karpenter.sh/v0.22.1/concepts/instance-types/#resources-148) 694632Mi, while karpenter.k8s.aws/instance-memory is 786432Mi? How is this value calculated?

Actual Behavior

Karpenter failed to schedule an instance type with the above resources. incompatible with provisioner "gaudi", no instance type satisfied resources {"habana.ai/gaudi":"8","memory":"690Gi","pods":"1"} and requirements kubernetes.io/os In [linux], node.kubernetes.io/instance-type In [dl1.24xlarge], habana.ai/gaudi In [true], karpenter.sh/provisioner-name In [gaudi], karpenter.sh/capacity-type In [on-demand], karpenter.k8s.aws/instance-family In [dl1], kubernetes.io/arch In [amd64];

Steps to Reproduce the Problem

Create Provisioner, AWSNodeTemplate and workload

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m", "r", "p", "g", "dl"]
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 10
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: my-cluster
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster

Apply a workload

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 4
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            limits:
              habana.ai/gaudi: 8
              memory: 690Gi
            requests:
              habana.ai/gaudi: 8
              memory: 690Gi

Resource Specs and Logs

incompatible with provisioner "gaudi", no instance type satisfied resources {"habana.ai/gaudi":"8","memory":"690Gi","pods":"1"} and requirements kubernetes.io/os In [linux], node.kubernetes.io/instance-type In [dl1.24xlarge], habana.ai/gaudi In [true], karpenter.sh/provisioner-name In [gaudi], karpenter.sh/capacity-type In [on-demand], karpenter.k8s.aws/instance-family In [dl1], kubernetes.io/arch In [amd64];

jonathan-innis commented 1 year ago

The requests that you are asking for are too big given what we assume the node allocatable will be.

694632Mi ~= 678.35Gi

We won't be able to schedule the pod if it requests more than that, and your workload requests 690Gi.
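The comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic: `to_mi` and `fits` are hypothetical helpers written for this example, not part of Karpenter, and the check ignores any memory that daemonsets on the node would also request.

```python
# Multipliers that convert a binary-suffixed quantity into Mi.
MI_PER_UNIT = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}


def to_mi(quantity: str) -> float:
    """Convert a quantity such as '690Gi' or '694632Mi' to Mi (hypothetical helper)."""
    for suffix, factor in MI_PER_UNIT.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


# Allocatable memory for dl1.24xlarge as listed in the v0.22.1 docs.
ALLOCATABLE = "694632Mi"


def fits(request: str, allocatable: str = ALLOCATABLE) -> bool:
    """Return True if the request is no larger than the assumed allocatable memory."""
    return to_mi(request) <= to_mi(allocatable)


print(round(to_mi(ALLOCATABLE) / 1024, 2))  # 678.35 (Gi)
print(fits("690Gi"))  # False: 706560Mi exceeds 694632Mi
print(fits("678Gi"))  # True: 694272Mi fits, before any daemonset overhead
```

Under these numbers, a request of 678Gi or less is the largest whole-Gi value that stays within the documented allocatable memory.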

DanielJuravski commented 1 year ago

@jonathan-innis that definitely answers my first question, but what about the second? Why is the memory shown in the instance type resources description (https://karpenter.sh/v0.22.1/concepts/instance-types/#resources-148) 694632Mi, while karpenter.k8s.aws/instance-memory is 786432Mi? How is this value calculated?

jonathan-innis commented 1 year ago

The second number that you mention, karpenter.k8s.aws/instance-memory, comes directly from the EC2 API. We add it to the node as a label so that you can specify advanced scheduling requirements, such as "I want instances that have less than 600Gi of memory."

The first value that you mention, from the resources description, is the allocatable memory: what we assume will actually be available to schedule workloads after we subtract any overhead (kube-reserved, system-reserved, vm-overhead, etc.).
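To put the two numbers side by side, the snippet below measures the gap between them. It is a sketch that uses only the values quoted in this thread; it does not reproduce Karpenter's internal formula or break the overhead down into its components.

```python
# Values quoted in this thread for dl1.24xlarge (Karpenter v0.22.1).
INSTANCE_MEMORY_MI = 786432     # karpenter.k8s.aws/instance-memory, from the EC2 API
ALLOCATABLE_MEMORY_MI = 694632  # allocatable memory listed in the instance type docs

# Everything subtracted as overhead, in Mi and Gi.
overhead_mi = INSTANCE_MEMORY_MI - ALLOCATABLE_MEMORY_MI
overhead_gi = overhead_mi / 1024

# Overhead as a fraction of the raw instance memory.
overhead_share = overhead_mi / INSTANCE_MEMORY_MI

print(INSTANCE_MEMORY_MI / 1024)  # 768.0 (Gi of raw instance memory)
print(overhead_mi)                # 91800
print(round(overhead_gi, 2))      # 89.65
print(f"{overhead_share:.1%}")    # 11.7%
```

So roughly 90Gi of the 768Gi reported by EC2 is set aside as assumed overhead, which is why a 690Gi request cannot be satisfied.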
