awslabs / data-on-eks

DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
https://awslabs.github.io/data-on-eks/
Apache License 2.0
556 stars, 185 forks

Kueue with Ray #444

Open askulkarni2 opened 4 months ago

askulkarni2 commented 4 months ago

Create a detailed blueprint that sets up Kueue with Ray to demonstrate large-scale batch processing (training, batch predictions, etc.). Customers are looking into Kueue as an alternative to YuniKorn/Volcano because it is a CNCF project and provides a Kubernetes-native option for advanced scheduling.
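
For illustration only, here is a minimal sketch of what a Kueue-managed Ray workload could look like. It is not part of any existing blueprint. It builds a KubeRay `RayJob` manifest that carries the `kueue.x-k8s.io/queue-name` label, which is how Kueue associates a workload with a LocalQueue. The helper name `make_rayjob`, the queue name `ray-queue`, the job name, the image tag, and the resource sizes are all hypothetical, and the sketch assumes that Kueue and the KubeRay operator are already installed and that a ClusterQueue and LocalQueue exist.

```python
import json

# Label that Kueue reads to route a workload to a LocalQueue.
QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def make_rayjob(name, local_queue, entrypoint, image, workers=2, cpu="1", memory="2Gi"):
    """Return a RayJob manifest (as a dict) intended for admission by Kueue.

    All argument values are supplied by the caller; nothing here is taken
    from an existing data-on-eks blueprint.
    """
    # Explicit requests let Kueue account for the job against queue quota.
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }

    def pod_template(container_name):
        return {
            "spec": {
                "containers": [
                    {"name": container_name, "image": image, "resources": resources}
                ]
            }
        }

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name, "labels": {QUEUE_LABEL: local_queue}},
        "spec": {
            # Created suspended; Kueue unsuspends the job once quota is available.
            "suspend": True,
            # The cluster is torn down when the job ends, so quota is released.
            "shutdownAfterJobFinishes": True,
            "entrypoint": entrypoint,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": pod_template("ray-head"),
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "workers",
                        # Fixed size: no in-tree autoscaling in this sketch.
                        "replicas": workers,
                        "minReplicas": workers,
                        "maxReplicas": workers,
                        "rayStartParams": {},
                        "template": pod_template("ray-worker"),
                    }
                ],
            },
        },
    }


if __name__ == "__main__":
    manifest = make_rayjob(
        name="batch-inference-demo",          # hypothetical job name
        local_queue="ray-queue",              # hypothetical LocalQueue name
        entrypoint="python -c \"import ray; ray.init(); print(ray.cluster_resources())\"",
        image="rayproject/ray:2.9.0",         # assumed image tag
    )
    # JSON is valid input for `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -` in the namespace that holds the LocalQueue. The worker group is kept at a fixed size because Kueue reserves quota for the whole Ray cluster at admission time.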

vara-bonthu commented 4 months ago

We can use the existing JARK and trainium-inferentia blueprints to enable this scheduler and showcase the examples, rather than creating a new blueprint.

jihed commented 3 months ago

I can look at this.