beacon-biosignals / K8sClusterManagers.jl

A Julia cluster manager for Kubernetes

Back Julia cluster instances via a pod controller rather than creating new pods individually? #66

Open · jrevels opened 3 years ago

jrevels commented 3 years ago

IIUC, K8sClusterManagers currently uses kubectl to create individual pods whenever a worker is added.

I think it would make more sense from a K8s perspective to associate an actual pod controller with the "Julia cluster instance", and then update the underlying set of pods that back the cluster instance by kubectl-applying updates to that controller. This is philosophically more aligned with how sets of pods are intended to be orchestrated in K8s land AFAIU, and would hopefully enable some configuration/resilience possibilities at the level of the whole cluster instance rather than just at the pod level (e.g. tuning pod scaling behaviors for the cluster instance, migrating workers from failed nodes, etc.)

IIRC the driver Julia process is already backed by a Job, so maybe that'd be sufficient? I think StatefulSet is worth considering too. We should tour the built-in controllers and see which ones make the most sense, especially w.r.t. being able to tell the controller to make additional pods available w/o interfering w/ existing ones.
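To make the idea concrete, here's a minimal sketch of what scaling through a controller could look like from the Julia side, assuming workers were backed by a StatefulSet (the function and resource names are hypothetical, not anything in K8sClusterManagers):

```julia
# Hypothetical sketch: if worker pods were backed by a StatefulSet,
# adding n workers would be a single scale operation on the controller
# rather than n individual pod creations.
function scale_workers(statefulset::AbstractString, replicas::Integer;
                       namespace::AbstractString="default")
    # `kubectl scale` resizes the controller's pod set; the controller
    # then reconciles by creating/removing pods itself.
    run(`kubectl scale statefulset/$statefulset --replicas=$replicas -n $namespace`)
end

# e.g. grow the cluster instance to 10 worker pods
scale_workers("julia-cluster-workers", 10; namespace="my-jobs")
```

One nice property: because the controller owns the pods, replacing a pod lost to node failure becomes the controller's job rather than ours.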

This would probably be massive overkill, but if none of the built-in K8s controllers are sufficient (which I guess is a possibility), you could even imagine a custom K8s operator implemented specifically for this purpose (JuliaClusterInstance)

kolia commented 3 years ago

I can see this being useful for better control over pod scaling if other ways are not adequate, but as far as resilience / migrating workers from failed nodes goes, the Distributed API isn't really set up to make use of workers being dynamically respawned in the middle of, say, a pmap AFAICT. I'm guessing that to make use of worker resilience we'd need to write downstream code significantly differently, with resilience in mind all the way down. Although that could just look like a custom version of pmap and similar, which is not that bad I guess.
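For what it's worth, pmap already has some retry machinery that a resilience-aware wrapper could lean on; a minimal sketch (expensive_step is a hypothetical user function):

```julia
using Distributed

# pmap can retry individual elements whose evaluation failed, e.g.
# because the worker running them died mid-call. Note this retries
# the element on a surviving worker; it doesn't replace the lost one.
results = pmap(1:100; retry_delays=ExponentialBackOff(n=3)) do i
    expensive_step(i)  # hypothetical user function
end
```

Whether that's enough depends on the workload, but it suggests the "custom pmap" route might be mostly configuration rather than a rewrite.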

jrevels commented 3 years ago

Right, doing this would not resolve any "application layer" bottlenecks to resiliency, but it would improve things at the "orchestration layer", which is at least a prerequisite.

ericphanson commented 3 years ago

This is a bit of an aside perhaps, but

> the Distributed API isn't really set up to make use of workers being dynamically respawned in the middle of, say, a pmap AFAICT

I think this might actually be OK. Looking at the code, pmap works on an (Abstract)WorkerPool and take!s a worker from the pool when it needs one. So I think you can dynamically add workers to the pool and it will grab them too.
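A minimal sketch of that behavior, using local workers for illustration (the same pattern should apply to K8s-launched workers):

```julia
using Distributed

addprocs(2)
pool = WorkerPool(workers())

# Run a pmap over the pool in the background; pmap take!s an available
# worker from the pool for each element.
t = @async pmap(pool, 1:20) do i
    sleep(1)
    myid() => i
end

# Grow the pool mid-flight; later elements can then be picked up by the
# newly added worker too.
foreach(w -> push!(pool, w), addprocs(1))

fetch(t)  # results should include work done on the late-added worker
```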