Support a `Revise` workflow on workers

kolia commented 3 years ago

devspace sync is easy to install locally and establishes a two-way sync between local folders and folders in containers running in k8s. julia_pod uses it to sync the current Julia project folder with its equivalent in the running container. This makes it possible to update code locally and use Revise against a Julia REPL running in k8s.

This convenient development workflow breaks down when using K8sClusterManagers to spin up workers. Each worker runs in its own pod with its own file system, so local edits synced into the manager pod never reach the workers.

Because of this, it is currently recommended to always test Distributed code using local Distributed procs first before running on k8s.

If devspace sync is installed, we could set up one or more syncs between relevant local and worker folders for each worker as it is spun up, perhaps defaulting to the current Julia project folder. This would make it possible to use Revise and develop code directly against workers.
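To make "defaulting to the current Julia project folder" concrete, here is a minimal sketch of how a sync tool might locate that folder on the local side. It assumes the tool follows the same rule Julia uses for `--project=@.`: starting from the working directory, walk up the parent directories until one contains `JuliaProject.toml` or `Project.toml`. The name `find_julia_project` is hypothetical, and the matching folder inside each worker container is not specified in this thread, so it is left out.

```python
from pathlib import Path
from typing import Optional

# File names Julia recognizes as marking a project directory.
PROJECT_FILES = ("JuliaProject.toml", "Project.toml")


def find_julia_project(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains a
    Julia project file, or None if no such directory exists."""
    start = start.resolve()
    # Check `start` itself first, then each parent up to the filesystem root.
    for directory in (start, *start.parents):
        if any((directory / name).is_file() for name in PROJECT_FILES):
            return directory
    return None
```

A sync tool could call `find_julia_project(Path.cwd())` and use the result as the local side of each per-worker sync, falling back to an explicit user setting when it returns `None`.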

omus commented 3 years ago

Overall I agree with the premise that it would be good to be able to support a Revise-based workflow without having to fall back on local Distributed processes.

However, I think there are probably better options than chaining devspace sync processes on your local system and on the manager pod. One such option is keeping the devspace sync processes on the local system and syncing multiple pods at once.

The tricky part here is that not all pods will be available at the time of the initial sync. We could probably just have a background process which monitors for new pods based upon a selector. When new pods are found it can start syncing to them as well.
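The background process described above could look roughly like the following sketch, written in Python since such a watcher need not be Julia-specific. It assumes `kubectl` is on the PATH and that worker pods carry some label a selector can match; which label K8sClusterManagers actually applies is not stated here. `list_pods` and `watch_and_sync` are hypothetical names, and `start_sync` is a caller-supplied callback that would launch a background `devspace sync` for one pod. Its exact command line is deliberately left out because it depends on the devspace version and configuration.

```python
import subprocess
import time
from typing import Callable, Iterable, Optional, Set


def list_pods(selector: str, namespace: str = "default") -> Set[str]:
    """Return the names of pods matching a label selector (needs kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "pods",
         "--namespace", namespace,
         "--selector", selector,
         "--output", "jsonpath={.items[*].metadata.name}"],
        check=True, capture_output=True, text=True,
    )
    return set(result.stdout.split())


def watch_and_sync(
    list_current: Callable[[], Iterable[str]],
    start_sync: Callable[[str], None],
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> Set[str]:
    """Poll for pods and call start_sync exactly once per newly seen pod.

    Runs forever when max_polls is None. Pods that disappear are not
    cleaned up; a real tool would also stop their sync processes.
    """
    synced: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        # Start syncing only the pods that appeared since the previous poll.
        for pod in sorted(set(list_current()) - synced):
            start_sync(pod)
            synced.add(pod)
        polls += 1
        # Sleep only if another poll will follow.
        if max_polls is None or polls < max_polls:
            time.sleep(poll_interval)
    return synced
```

For example, `watch_and_sync(lambda: list_pods("app=julia-workers"), start_sync)` would poll every five seconds, where the selector `app=julia-workers` is a made-up placeholder. Polling keeps the sketch short; `kubectl get pods --watch` or a Kubernetes client library's watch API would avoid the repeated calls.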

Some other alternatives which may be worth investigating:

- Use NFS to share storage among the manager and worker pods.
- Make Revise.jl distributed-aware. It should be possible to have Revise issue the replacement calls on all processes. This option would work on any Julia cluster.

kolia commented 3 years ago

> One such option is keeping the devspace sync processes on the local system and syncing multiple pods at once.

That's what I meant; as far as I can tell, devspace is only designed to be run from your local system. Users would probably only turn this syncing on while developing with a small number of workers, and then switch it off when running things on many workers.

> The tricky part here is that not all pods will be available at the time of the initial sync. We could probably just have a background process which monitors for new pods based upon a selector. When new pods are found it can start syncing to them as well.

If we do it this way, it could be a small standalone, non-Julia-specific tool, and the sync part could be removed from julia_pod, since users could use this tool for syncing instead.

Some other alternatives which may be worth investigating:

> Use NFS to share storage among the manager and worker pods

Shared volumes seem to be tied to cloud providers; I haven't seen a cloud-provider-agnostic way to do this.

> Make Revise.jl distributed-aware. It should be possible to have Revise issue the replacement calls on all processes. This option would work on any Julia cluster.

That would be nice! But I wouldn't know where to begin.

beacon-biosignals/K8sClusterManagers.jl, issue #84