glotzerlab / signac-flow

Workflow management for signac-managed data spaces.
https://signac.io/
BSD 3-Clause "New" or "Revised" License

FlowProject.run should respect directives #175

Closed by vyasr 4 years ago

vyasr commented 4 years ago

Feature description

Currently, FlowProject.run ignores execution directives: operations marked with @flow.directives(nranks=8) run serially unless they are actually submitted to a scheduler. Instead, FlowProject.run should properly handle these directives.

Proposed solution

This change depends on #174, which will provide the required MPI commands (and potentially other execution directive-specific commands, e.g. for OpenMP). Once that issue is resolved, we can change run to call the necessary function to modify the run command. The other important change is that FlowProject.run will need to decide whether or not to fork based on an additional set of criteria that accounts for these directives.
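To make the two changes concrete, here is a rough sketch of what "modify the run command" and "decide whether to fork" could look like. Nothing here is signac-flow API: `launcher_prefix`, `build_run_command`, and `must_fork` are hypothetical helpers, the `mpiexec -n` launcher and the `omp_num_threads` key are assumptions, and the real commands would come from whatever #174 provides.

```python
# Hypothetical sketch, not signac-flow code: names and launcher are assumptions.

def launcher_prefix(directives):
    """Return the launcher prefix implied by the directives ('' if none)."""
    nranks = directives.get("nranks", 1)
    if nranks > 1:
        # Assumed launcher; an environment might need mpirun, srun, etc.
        return f"mpiexec -n {nranks} "
    return ""


def build_run_command(operation_cmd, directives):
    """Prepend the launcher prefix to the operation's command string."""
    return launcher_prefix(directives) + operation_cmd


def must_fork(directives):
    """Decide whether run has to execute the operation in a subprocess.

    An operation that asks for more than one rank or thread cannot simply
    be called in the current Python process, so it has to be forked.
    """
    return (
        directives.get("nranks", 1) > 1
        or directives.get("omp_num_threads", 1) > 1
    )
```

With this in place, run would only take the in-process path when `must_fork` returns False, and would otherwise execute the string returned by `build_run_command`.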

Additional context

Making this change is critical to enabling groups #114.

vyasr commented 4 years ago

One specific case will require extra work to enable: the offset-based bundling of multiple MPI jobs onto one node on Stampede. Currently we enable this by looping and calling python project.py exec in the template script. @b-butler and I discussed that to enable this properly we will probably need to remove all such looping from the template and instead implement it directly in run. However, this means that we may also need to generalize how we generate the commands executed by python project.py run, so that environments can naturally apply specific modifications to the run command, such as adding the offset for Stampede.

One possibility is to create a ComputeEnvironment.run, analogous to ComputeEnvironment.submit, that would allow the command to be overridden by individual environments. I haven't spent much time thinking about this yet but wanted to log my thoughts before forgetting; I'm open to other ideas on how to implement this as well.
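A minimal sketch of the override idea, to show the shape of the hook rather than an actual design. The classes below are stand-ins, not the real ComputeEnvironment; `run_command` and `bundle_commands` are hypothetical names, and the `ibrun -n ... -o ...` form is only an assumed example of an offset-aware launcher.

```python
# Hypothetical sketch: stand-in classes, not the signac-flow environment API.

class SketchEnvironment:
    """Default behavior: plain MPI launch, offset is ignored."""

    @classmethod
    def run_command(cls, cmd, directives, offset=0):
        nranks = directives.get("nranks", 1)
        if nranks > 1:
            return f"mpiexec -n {nranks} {cmd}"
        return cmd


class OffsetSketchEnvironment(SketchEnvironment):
    """An environment that overrides the command to add a rank offset."""

    @classmethod
    def run_command(cls, cmd, directives, offset=0):
        nranks = directives.get("nranks", 1)
        # Assumed launcher syntax, for illustration only.
        return f"ibrun -n {nranks} -o {offset} {cmd}"


def bundle_commands(env, operations):
    """Build one command per operation, advancing the offset by its ranks.

    This replaces the loop that currently lives in the template script.
    `operations` is a list of (command, directives) pairs.
    """
    commands = []
    offset = 0
    for cmd, directives in operations:
        commands.append(env.run_command(cmd, directives, offset=offset))
        offset += directives.get("nranks", 1)
    return commands
```

The point of the sketch is that run owns the loop and the offset bookkeeping, while each environment only decides how a single command is decorated.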

vyasr commented 4 years ago

Partially resolved by #208. The question I raised in my previous comment remains to be addressed, but that probably will be done as part of #114.

vyasr commented 4 years ago

The part of this that remains unresolved is currently very specific to Stampede2 and is separately documented in #250, so this issue can be closed.