Open rabernat opened 1 year ago
This would be an awesome addition to the hub for pretty much every project I am involved in. Enthusiastic 👍, and happy to test!
I agree that batch submission is an important class of computing for many problems. And it does appear that kbatch would be one way of providing that functionality.
But, from what I am reading, kbatch is a relatively simple wrapper for submitting jobs to a Kubernetes cluster, and it lacks features compared to something like Slurm or Prefect. I think it makes sense for running a notebook automatically in a lights-out way, but is its intended use case only relatively short jobs? Is kbatch still appropriate for multi-day batch computing?
While it is of course no problem to experiment with kbatch if that solves people's immediate problems, are there other scheduling systems that we should consider? I am worried about things like setting limits on long-running jobs, resource allocation between users, reporting, and logging. I think other hubs are also discussing "batch" computing, so I'm going to add this as a feature request to our weekly 2i2c Product and Engineering meeting to see if there are other options that should be considered.
All good questions @jmunroe. @yuvipanda worked quite a bit on kbatch and decided it was the sweet spot in terms of complexity. But I'm happy to align with whatever tools 2i2c wants to support here.
Context
Pangeo hub users often want to put a long-running job into the background. On our hubs today, they instead have to keep a notebook open the whole time the job runs. This leads to awkward and inefficient workflows, such as postdocs not being able to close their laptops for days.
cc @paigem, @jbusecke
Proposal
I propose we install kbatch on the Pangeo hubs. We discussed this idea quite a while back, but I can't find any record of that conversation.