cubed-dev / cubed

Bounded-memory serverless distributed N-dimensional array processing
https://cubed-dev.github.io/cubed/
Apache License 2.0
122 stars 14 forks source link

Support use of multiprocessing start methods other than "spawn" (e.g. "dragon") #554

Closed applio closed 2 months ago

applio commented 3 months ago

Currently, cubed.runtime.executors.local.async_execute_dag() hard codes the use of the "spawn" start method when employing multiprocessing / concurrent.futures processes. This PR proposes a means for the user to specify their preferred start method via the existing keyword argument use_processes.

This proposed change would permit users to select from the existing multiprocessing start methods of "fork", "spawn", and "forkserver" as well as the newer "dragon" HPC distributed execution start method provided by the Dragon project (https://github.com/dragonhpc/dragon). An example snippet showing how a different start method can now be specified:

cubed.to_zarr(
    some_data,
    store=zg,
    use_processes="dragon",
)

It probably makes sense to document this new functionality though it appears that the keyword argument, use_processes, does not yet appear anywhere in the documentation. The "Configuration" page (https://cubed-dev.github.io/cubed/configuration.html#processes) might be a good spot to describe use_processes in general along with this added control. I would be happy to propose some documentation text if others agree on where it ought to go.

TomNicholas commented 3 months ago

It's great that this is the only change needed to run on Dragon!

Cool! Thanks @applio.

adding a unit test (without installing Dragon)

I think we should add unit tests which do install Dragon, but we can leave those to follow-up PRs.

TomNicholas commented 3 months ago

It's great that it's this easy to get running on Dragon, but I do wonder whether this use_processes/spawn interface is just going to be confusing for users. I would like to be able to recommend different Executors to users based on the system they are trying to run on (i.e. "Cloud? Use lithops! Local machine? Use the local executor! HPC? Use the DragonExecutor!").

Whilst the use_processes is extremely neat for us developers I wonder if that subtlety should actually be hidden from the users behind a DragonExecutor abstraction, even if it uses similar codepaths under the hood.

tomwhite commented 2 months ago

I agree with adding a DragonExecutor in a new dragon.py module. It would also be a natural place to add Dragon-specific configuration in the future.

tomwhite commented 2 months ago

We discussed this in the meeting and decided to merge it as it may be generally useful for specifying multiprocessing start methods other than "spawn". @applio will still do a separate PR for the DragonExecutor as discussed above.