Open · sphuber opened 1 year ago
This comment is a bit tangential (I can move it to a separate issue):
On the `OMP_NUM_THREADS` topic, in reviewing https://github.com/aiidateam/aiida-core/pull/5948/files I was surprised to discover that the `direct` scheduler actually sets `OMP_NUM_THREADS` based on `num_cores_per_mpiproc`:
https://github.com/aiidateam/aiida-core/blob/f5813092879a7204e5d4d51da463841bb32881eb/aiida/schedulers/plugins/direct.py#L160-L161
Other scheduler plugins currently do not do that; e.g. SLURM will limit the number of tasks per CPU but not set `OMP_NUM_THREADS`, which results in different behavior (basically it is left up to the executable to decide whether to use OMP threads and how many):
https://github.com/aiidateam/aiida-core/blob/f5813092879a7204e5d4d51da463841bb32881eb/aiida/schedulers/plugins/slurm.py#L354-L355
We may want to think about what the best default behavior is, and then be consistent across scheduler plugins.
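To make the divergence concrete, here is a minimal sketch (not the actual plugin code; the function name and the simplification are mine) of the extra header line the `direct` plugin emits and the SLURM plugin does not:

```python
def direct_scheduler_omp_lines(num_cores_per_mpiproc=None):
    """Sketch of the extra jobscript lines the direct scheduler adds.

    Simplified illustration of the behavior in
    aiida/schedulers/plugins/direct.py; not the real implementation.
    """
    lines = []
    if num_cores_per_mpiproc is not None:
        # direct derives OMP_NUM_THREADS from the requested cores per MPI
        # process; the SLURM plugin emits no equivalent export line.
        lines.append(f'export OMP_NUM_THREADS={num_cores_per_mpiproc}')
    return lines
```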
That was added in #5126, which was released with v2.0.0. It was proposed by @dev-zero.
I opened this issue and #5948 following questions on Slack. They also stumbled over the difference between `direct` and the other plugins, but only because they were trying to get `export OMP_NUM_THREADS` to be added dynamically by setting `metadata.options.environment_variables` inside the `CalcJob` plugin. Unfortunately, this doesn't work since the inputs are frozen at that point.
If we can find a more generic method of automatically setting `OMP_NUM_THREADS` correctly that may be applied to all schedulers, for example through the method suggested in this feature request, then we can remove the explicit case in the `DirectScheduler`.
This feature would be beneficial for our use case, as the scheduler we've been using requires this environment variable to be set within the jobscript. I believe that torque requires this environment variable to be specified, as the resource allocation does not have a `--cpus-per-task`-like parameter.
Additionally, codes that can use environment variables to specify runtime modes benefit greatly from this. Though ideally one would be able to update the `self.metadata.options.environment_variables` dictionary to do this within the plugin, having access at least within the code/computer setup would satisfy this.
As a specific example, BigDFT can utilise a `FUTILE_DEBUG_MODE` environment variable to specify the debug level. It would be ideal to be able to specify this in an optional input (aside from the `environment_variables`), but without having access to the jobscript within the plugin in some form I can't think of a way to do this. However, this is a tangent that's straying back into the plugin area rather than the code/computer.
The simplest way to put this is that, for me, it lines up with what I would expect based on the defaults that already contain this syntax when creating a computer: when I see `mpirun -np {tot_num_mpiprocs}` as a default, I assume that this functionality is available for all typed inputs, which sent me down a lengthy debugging path trying to implement this as a feature.
Sorry for the delay in responding!
> Though ideally one would be able to update the `self.metadata.options.environment_variables` dictionary
This is probably not going to happen, as it would require messing with AiiDA's provenance principle and allowing stored inputs to be mutated. That is too big of a change for this kind of problem, I would say.
> As a specific example, BigDFT can utilise a `FUTILE_DEBUG_MODE` environment variable to specify the debug level. It would be ideal to be able to specify this in an optional input (aside from the `environment_variables`) but without having access to the jobscript within the plugin in some form I can't think of a way to do this.
At least for this case, you can add in your plugin documentation that the user can specify:

```python
builder.metadata.options.environment_variables = {'FUTILE_DEBUG_MODE': 'ERROR'}
```
Even though it might be nicer to define a custom input on your process class and have the plugin set this, that is currently not possible, and this approach is available.
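For context on what that option does, the engine ultimately turns each entry of `environment_variables` into an export line in the jobscript; a hedged sketch of that translation (the exact quoting used by aiida-core may differ):

```python
def env_export_lines(environment_variables):
    # Render a dict of environment variables as jobscript export lines.
    # Illustrative only; the real rendering lives in aiida-core.
    return [
        f"export {name}='{value}'"
        for name, value in environment_variables.items()
    ]

# The FUTILE_DEBUG_MODE example from above:
lines = env_export_lines({'FUTILE_DEBUG_MODE': 'ERROR'})
```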
> The simplest way to put this is that for me it lines up with what I would expect based on the defaults that already contain this syntax when creating a computer: When I see `mpirun -np {tot_num_mpiprocs}` as a default, I assume that this functionality is available for all typed inputs, which sent me down a lengthy debugging path trying to implement this as a feature.
Fully agree, and I think this issue should address this either by adding templating support for all attributes, or by making it very clear where templating is supported and which placeholders are available, which I believe is currently not documented.
> This is probably not going to happen as it would require messing with AiiDA's provenance principle and allowing stored inputs to be mutated. This is too big of a change for this kind of problem I would say.
Indeed this makes sense; there's no reason to make such a large shift in the AiiDA API for this issue. I would assume that tweaks to the jobscript wouldn't affect provenance, but I trust your word that it does (or that it is at least a breaking change to make it so that it doesn't).
Although, unless I'm misunderstanding something, does this represent a bug in the torque scheduler? I believe `OMP_NUM_THREADS` needs to be specified, as is currently being done for `local`.
> Fully agree, and I think this issue should address this either by adding templating support for all attributes, or making it very clear where templating is supported and which placeholders, which is currently not documented I believe.
Obviously the former case is preferable; however, making it clear that templating is supported only in specific instances would be beneficial in preventing someone else from getting lost down this rabbit hole!
On that note, it may also be helpful to add a note to the documentation here, warning that when these attributes are accessed within the plugin, what's returned are just copies that serve a solely read-only purpose.
> Although unless I'm misunderstanding something, does this represent a bug in the torque scheduler? I believe `OMP_NUM_THREADS` needs to be specified, as is currently being done for `local`.
That I don't know. I am not familiar with torque whatsoever.
> That I don't know. I am not familiar with torque whatsoever.
I believe it does, similar to how it requires the `cd $PBS_O_WORKDIR` line. Going by this page, https://hpc-wiki.info/hpc/Torque, the 2nd example uses:

```shell
### Execute your application and set OMP_NUM_THREADS for the application run
OMP_NUM_THREADS=12 myapp.exe
```
So it seems this can be worked around presently by setting the mpirun command. It's then to be decided whether or not it would be preferable to have `torque` behave similarly to `direct` and prepend the `OMP_NUM_THREADS` line. Perhaps I can create a separate issue for this?
Note, this feature request is different from #4680, as that asks how the configuration YAMLs that are consumed by `verdi computer setup` and `verdi code create` can use templating, but there the template would be made concrete before passing it to the CLI command. Here, the request is to add literal template code to the `prepend_text` and `append_text` that is rendered upon the submission of a `CalcJob` using the `Computer` or `Code` in question. There already is some support for this, but it is quite ad hoc. For example, the `Computer.work_dir` attribute allows `{username}` and `Computer.mpirun_command` allows `{tot_num_mpiprocs}`.

There are other use cases for this kind of functionality in other attributes, such as the `prepend_text`. For example, one may want to define a path based on the username, or to dynamically set the `OMP_NUM_THREADS` variable based on the number of CPUs per MPI task requested. We should see whether it would be possible to provide templating, how generic it can be made (should we allow full Jinja2 templates and even allow control structures, like conditionals and loops?) and how the template variables should be defined at launch time of the `CalcJob`.
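To make the proposal concrete, here is a minimal sketch of what rendering a templated `prepend_text` could look like; the placeholder names and the use of plain `str.format` (rather than full Jinja2) are assumptions for illustration only:

```python
def render_prepend_text(prepend_text, **template_vars):
    """Render hypothetical placeholders in a prepend_text template.

    The set of available variables and the template engine (e.g. Jinja2
    with conditionals and loops) are exactly the open questions here.
    """
    return prepend_text.format(**template_vars)

# Both use cases mentioned above, with hypothetical placeholder names:
template = (
    'export SCRATCH_DIR=/scratch/{username}\n'
    'export OMP_NUM_THREADS={num_cores_per_mpiproc}'
)
rendered = render_prepend_text(template, username='jdoe', num_cores_per_mpiproc=8)
```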