k82cn closed this issue 2 years ago.
@k82cn This is great! Looking forward to the first draft of the charter. In the past we have used Google Docs for drafts, but feel free to use whatever works best. Thanks! cc: @rochaporto @mrbobbytables @stackedsax @jimbobby5 @yuanchen8911
Definitely interested in helping out with this; we've talked about it multiple times in the past! Really nice!
Dropping here some related efforts that also target fair sharing and elastic capacity: https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/capacityscheduling
Great! Thanks for keeping me in the loop. /cc @xujyan
I don't know if any of this old conversation is still relevant, but I thought I'd add it for historical purposes:
We were all so young and innocent....
@stackedsax thanks for sharing! (that was 2 1/2 years ago 😮)
/cc
/cc @Huang-Wei
Thank you @k82cn this is great. The requirements for the batch workloads are really similar, we should be able to abstract out a spec that aligns different projects together. I'd be happy to work with you and others on this.
Great! Looking forward to working together on that :)
Just wanted to throw out some high level thoughts.
Firstly, the solution needs to be Kubernetes native and extensible with
Common feature requirements
FYI, our recent KubeCon talk on batch support (/cc @denkensk)
/cc @xujyan @k82cn @Huang-Wei @rochaporto
A Batch SIG or WG kubernetes/community#6263
Putting aside the different ideas, shall we work together as a single virtual team to create a common forum/group for the topic? @ahg-g, @k82cn, @Huang-Wei, @rochaporto, @raravena80
A CNCF WG is a good place to host such a virtual team across the community, e.g. CNI.
CNI/CSI are great examples. Thank you @k82cn. Can we have something like the following (not strictly bound to the name or format):
BatchJob
JobQueue
Such job/queue definitions can be backed by different schedulers: the default scheduler, scheduler-plugins, Volcano, or YuniKorn. How these properties are used will vary across implementations, but essentially they give the scheduler enough "hints" to know how to schedule a job better. This will give a certain consistency to the behavior of batch job scheduling on K8s.
Such job/queue definitions can be backed by different schedulers: the default scheduler, scheduler-plugins, Volcano, or YuniKorn. How these properties are used will vary across implementations, but essentially they give the scheduler enough "hints" to know how to schedule a job better.
Exactly! That's why I prefer to have such a WG in the CNCF instead of in an individual community for the batch API/specification :)
It's great to see different teams converging on the same category of batch workloads. Some initial work has been public for a while, as @yuanchen8911 and @yangwwei listed. Since people from different areas (HPC, AI/ML, big data, and other high-volume data processing such as simulation and genomics) may have different insights into batching, I'm really looking forward to clarifying together the concrete scope and target scenarios for this.
Thanks to @rochaporto and @yuanchen8911 for mentioning our work on batch scheduling and management in https://github.com/kubernetes-sigs/scheduler-plugins and https://github.com/kube-queue/kube-queue.
I look forward to everyone interested in batch computing collaborating to promote the development of batch on Kubernetes. Nice work, @k82cn!
Before we discuss the definition of interfaces and components, I hope we can clarify our goals and scope. It would be nice to add some description of the benefits of our WG (e.g., easier integration with other projects such as Spark and Kubeflow).
How these properties are used will vary across implementations, but essentially they give the scheduler enough "hints" to know how to schedule a job better.
@yangwwei I'm a little concerned about having enough expressiveness to be compatible with different architectural implementations. But this can be discussed later. ^_^
Some additional comments in the other thread
https://github.com/kubernetes/community/issues/6263#issuecomment-990121461
Some additional comments in the other thread
Thanks for the input. The WG in k/k should focus only on Kubernetes, and a WG in the CNCF will help to collaborate across projects :)
@k82cn Klaus, the key here is the folks in the CNCF WG should help figure out what we need to do in k/k actively based on the ideas/collaboration in CNCF. I hope that happens. There's a lot to be said about functionality that comes out of the box in k8s and the strength of conformance testing to ensure things work across k8s distributions.
the CNCF WG should help figure out what we need to do in k/k actively based on the ideas/collaboration in CNCF
Definitely; one of the major goals of the WG is to collaborate with related projects to clarify the scope and interface.
And here's a draft charter of the WG; if you have any more comments, please let me know :)
Thanks @k82cn.
I created a Slack channel and added it to the charter:
I tried to guess everyone's Slack handle, but @yuanchen8911 @denkensk @wsxiaozhang I'm not sure I got you all correctly. Please jump in if I've missed you or invited the wrong person.
I also added a couple of items and comments to the charter. Thanks again!
Thanks, Alex!
@stackedsax Thanks
Cool! +100 for a specification like CNI.
Recently, we've been talking with different communities about how to support batch workloads in cloud-native environments, and we found it necessary to align the different implementations with a specification; that will make integration easier for frameworks such as Kubeflow.
So I'd like to propose a new working group for batch workloads and build a related specification for the community; e.g., the Kubeflow community could use this specification to work with k8s, Volcano, and YuniKorn, or even with Slurm and HTCondor :)
I'll draft a proposal with more detail on that working group; if you have any more comments, please let me know :)