dandi / dandi-hub

Infrastructure and code for the dandihub
https://hub.dandiarchive.org

Karpenter picks large instances #196

Open asmacdo opened 2 months ago

asmacdo commented 2 months ago

In our default NodePool (everything but on-demand and GPU) Karpenter has the following matrix of options:

instanceSizes: ["xlarge", "2xlarge", "4xlarge", "8xlarge", "16xlarge", "24xlarge"]
instanceFamilies: ["c5", "m5", "r5"]

However, there are cases where a very large, expensive node gets deployed to accommodate potentially many user pods, but is then kept alive for just one user.

For example, one user has been running a "Base" profile server that landed on an r5.24xlarge. I'm not sure what conditions led to this, but the user has been alone on this expensive node for some time.

IMO we should force Karpenter to pick smaller nodes.
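
As a rough sketch of what that could look like, the simplest change is probably to trim the size list in the default NodePool shown above. The cutoff at "4xlarge" is only an assumption for illustration; the right upper bound depends on how many user pods we actually expect to pack onto a single node.

```yaml
# Hypothetical trimmed matrix for the default NodePool.
# The "8xlarge", "16xlarge" and "24xlarge" sizes are dropped so Karpenter
# cannot provision them; the "4xlarge" cutoff is an assumption, not a tested value.
instanceSizes: ["xlarge", "2xlarge", "4xlarge"]
instanceFamilies: ["c5", "m5", "r5"]
```

The tradeoff is that a burst of new users would be spread across more, smaller nodes instead of one large one, so a lone remaining user holds far less capacity.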