Closed applio closed 2 months ago
It's great that this is the only change needed to run on Dragon!
Cool! Thanks @applio.
adding a unit test (without installing Dragon)
I think we should add unit tests which do install Dragon, but we can leave those to follow-up PRs.
It's great that it's this easy to get running on Dragon, but I do wonder whether this `use_processes`/`spawn` interface is just going to be confusing for users. I would like to be able to recommend different Executors to users based on the system they are trying to run on (i.e. "Cloud? Use lithops! Local machine? Use the local executor! HPC? Use the `DragonExecutor`!").
Whilst the `use_processes` option is extremely neat for us developers, I wonder if that subtlety should actually be hidden from users behind a `DragonExecutor` abstraction, even if it uses similar codepaths under the hood.
I agree with adding a `DragonExecutor` in a new `dragon.py` module. It would also be a natural place to add Dragon-specific configuration in the future.
We discussed this in the meeting and decided to merge it as it may be generally useful for specifying multiprocessing start methods other than "spawn". @applio will still do a separate PR for the `DragonExecutor` as discussed above.
Currently, `cubed.runtime.executors.local.async_execute_dag()` hard codes the use of the `"spawn"` start method when employing `multiprocessing`/`concurrent.futures` processes. This PR proposes a means for the user to specify their preferred start method via the existing keyword argument `use_processes`.

This proposed change would permit users to select from the existing `multiprocessing` start methods of `"fork"`, `"spawn"`, and `"forkserver"` as well as the newer `"dragon"` HPC distributed execution start method provided by the Dragon project (https://github.com/dragonhpc/dragon). An example snippet showing how a different start method can now be specified:

It probably makes sense to document this new functionality, though it appears that the keyword argument, `use_processes`, does not yet appear anywhere in the documentation. The "Configuration" page (https://cubed-dev.github.io/cubed/configuration.html#processes) might be a good spot to describe `use_processes` in general along with this added control. I would be happy to propose some documentation text if others agree on where it ought to go.
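
For readers who want to see the idea in isolation, here is a minimal standard-library sketch of how a `use_processes` value could be mapped to a `multiprocessing` start method. It is not cubed's implementation: the helper names `resolve_start_method` and `make_process_pool` are hypothetical, and the assumption that `True` falls back to `"spawn"` is only inferred from the description above. The `"dragon"` method is assumed to be available only once the Dragon package is installed and imported.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def resolve_start_method(use_processes):
    """Hypothetical helper: turn a use_processes value into a start method name.

    Assumed semantics (not confirmed by cubed's source):
      False / None -> no process pool (returns None)
      True         -> the previously hard-coded "spawn"
      str          -> that start method, e.g. "fork", "forkserver", "dragon"
    """
    if use_processes is None or use_processes is False:
        return None
    if use_processes is True:
        return "spawn"
    if isinstance(use_processes, str):
        return use_processes
    raise TypeError(f"unsupported use_processes value: {use_processes!r}")


def make_process_pool(use_processes, max_workers=2):
    """Hypothetical helper: build a process pool using the chosen start method."""
    method = resolve_start_method(use_processes)
    if method is None:
        return None
    # get_context raises ValueError if the method is not available on this
    # platform (e.g. "fork" on Windows, or "dragon" without Dragon imported).
    context = multiprocessing.get_context(method)
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=context)
```

Keeping the resolution step separate from pool creation means the accepted values can be validated and documented in one place, whichever executor ends up consuming them.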