aws-samples / comfyui-on-eks

ComfyUI on AWS
MIT No Attribution

GPU-nodes scaling down to 0 - Not Working #6

Open amitpate opened 4 months ago

amitpate commented 4 months ago

Karpenter doesn't appear to be scaling down to zero nodes when there's no workload.

TemryL commented 4 months ago

Same here.

Shellmode commented 4 months ago

Please check if the pod is stuck in the "Terminating" status.

TemryL commented 4 months ago

The pod's status stays "Running".

Shellmode commented 4 months ago

You can check the Karpenter logs to see whether it's working correctly, or open an issue in the Karpenter repository.
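
For example (a rough sketch; the namespace, resource names, and node name are placeholders and depend on how Karpenter was installed and which API version it uses):

```bash
# Tail the Karpenter controller logs and look for consolidation / termination decisions.
kubectl logs -n karpenter deploy/karpenter -f

# With v1beta1+ APIs, provisioned capacity shows up as NodeClaims and should
# disappear once empty nodes are consolidated; also check for pods still
# scheduled on the GPU node.
kubectl get nodeclaims
kubectl get pods -A -o wide | grep <gpu-node-name>
```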

blakegreendev commented 3 months ago

I'm also seeing this behavior. I'm still new to Karpenter, but with the ComfyUI deployment, how does it know to scale down in the first place? Is that already configured in the disruption policy?

Shellmode commented 3 months ago

Could you kindly provide some Kubernetes and Karpenter logs? I'm not able to reproduce the issue.

edwinwu2014 commented 3 months ago

The GPU nodes indeed don't scale down, and they don't scale up automatically either. I made 500 concurrent text-to-image calls and no new machines were provisioned.

Shellmode commented 3 months ago

I think I misunderstood what you meant. I thought when you said there was no workload, you were referring to no pods running, not that there were no requests to ComfyUI. Karpenter is responsible for ensuring that your cluster has enough nodes to schedule your pods without wasting resources. If a pod has no node to be scheduled on, Karpenter will spin up the cheapest instance that meets the requirements in your definition YAML and schedule the pod on it; and if a node has no pods running on it (no workload), the node will be terminated to save costs.
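
A quick way to see both directions of this (just a sketch; the node label assumes Karpenter v1beta1+ APIs):

```bash
# Remove the workload: the GPU node should become empty and get terminated.
kubectl scale deploy/comfyui --replicas=0
kubectl get nodes -l karpenter.sh/nodepool -w

# Re-create the workload: the pending pod should trigger a new (cheapest matching) node.
kubectl scale deploy/comfyui --replicas=1
kubectl get pods -w
```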

Shellmode commented 3 months ago

If you want to auto-scale GPU nodes, it's recommended to monitor the ComfyUI pending queue via the /prompt ComfyUI API endpoint, develop your own auto-scaling policy (e.g., scale out when the number of pending requests exceeds a threshold), and scale in/out with a single command: kubectl scale deploy/comfyui --replicas=0/1/2/3/4/...
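
A minimal polling-loop sketch of such a policy (not a production autoscaler; COMFYUI_URL, THRESHOLD, and MAX_REPLICAS are placeholders, and the exec_info.queue_remaining field is what GET /prompt returns in current ComfyUI versions):

```bash
#!/usr/bin/env bash
COMFYUI_URL="http://comfyui.example.com:8188"   # hypothetical service URL
THRESHOLD=10
MAX_REPLICAS=4

while true; do
  # Number of pending requests in the ComfyUI queue.
  pending=$(curl -s "${COMFYUI_URL}/prompt" | jq -r '.exec_info.queue_remaining')
  replicas=$(kubectl get deploy/comfyui -o jsonpath='{.spec.replicas}')

  if [ "$pending" -gt "$THRESHOLD" ] && [ "$replicas" -lt "$MAX_REPLICAS" ]; then
    # Queue is backing up: add one replica; Karpenter brings up a GPU node if needed.
    kubectl scale deploy/comfyui --replicas=$((replicas + 1))
  elif [ "$pending" -eq 0 ] && [ "$replicas" -gt 0 ]; then
    # Nothing pending: scale to zero so Karpenter can terminate the empty node.
    kubectl scale deploy/comfyui --replicas=0
  fi
  sleep 60
done
```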

One thing to note: it may take some time (from a few minutes to around ten minutes) from spinning up an instance to the pod reaching Running, because we trade startup time for model loading and switching performance.