Open · sphuber opened 1 year ago
This comment is a bit tangential (I can move it to a separate issue):
On the `OMP_NUM_THREADS` topic, in reviewing https://github.com/aiidateam/aiida-core/pull/5948/files I was surprised to discover that the `direct` scheduler actually sets `OMP_NUM_THREADS` based on `num_cores_per_mpiproc`:
https://github.com/aiidateam/aiida-core/blob/f5813092879a7204e5d4d51da463841bb32881eb/aiida/schedulers/plugins/direct.py#L160-L161
Other scheduler plugins currently do not do that; e.g. SLURM will limit the number of tasks per CPU but not set `OMP_NUM_THREADS`, which results in different behavior (basically it is left up to the executable to decide whether to use OMP threads and how many):
https://github.com/aiidateam/aiida-core/blob/f5813092879a7204e5d4d51da463841bb32881eb/aiida/schedulers/plugins/slurm.py#L354-L355
We may want to think about what the best default behavior is, and then be consistent across scheduler plugins.
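To make the divergence concrete, here is a minimal sketch (not the actual plugin code; the function name and the simplification are mine) of the extra header line the `direct` plugin emits and the SLURM plugin does not:

```python
def direct_scheduler_omp_lines(num_cores_per_mpiproc=None):
    """Sketch of the extra jobscript lines the direct scheduler adds.

    Simplified illustration of the behavior in
    aiida/schedulers/plugins/direct.py; not the real implementation.
    """
    lines = []
    if num_cores_per_mpiproc is not None:
        # direct derives OMP_NUM_THREADS from the requested cores per MPI
        # process; the SLURM plugin emits no equivalent export line.
        lines.append(f'export OMP_NUM_THREADS={num_cores_per_mpiproc}')
    return lines
```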
That was added in #5126, which was released with v2.0.0. It was proposed by @dev-zero.
I opened this issue and #5948 following questions on Slack. They also stumbled over the difference between `direct` and the other plugins, but only because they were trying to get `export OMP_NUM_THREADS` to be added dynamically by setting `metadata.options.environment_variables` inside the `CalcJob` plugin. Unfortunately, this doesn't work since the inputs are frozen at that point.
If we can find a more generic method of automatically setting `OMP_NUM_THREADS` correctly that may be applied to all schedulers, for example through the method suggested in this feature request, then we can remove the explicit case in the `DirectScheduler`.
This feature would be beneficial for our use case, as the scheduler we've been using requires this environment variable to be set within the jobscript. I believe that torque requires this environment variable to be specified, as the resource allocation does not have a `--cpus-per-task`-like parameter.
Additionally, codes that can use environment variables to specify runtime modes benefit greatly from this. Though ideally one would be able to update the `self.metadata.options.environment_variables` dictionary to do this within the plugin, having access at least within the code/computer setup would satisfy this.
As a specific example, BigDFT can utilise a `FUTILE_DEBUG_MODE` environment variable to specify the debug level. It would be ideal to be able to specify this in an optional input (aside from the `environment_variables`), but without having access to the jobscript within the plugin in some form I can't think of a way to do this. However, this is a tangent that's straying back into the plugin area rather than the code/computer.
The simplest way to put this is that, for me, it lines up with what I would expect based on the defaults that already contain this syntax when creating a computer: when I see `mpirun -np {tot_num_mpiprocs}` as a default, I assume that this functionality is available for all typed inputs, which sent me down a lengthy debugging path trying to implement this as a feature.
Sorry for the delay in responding!
> Though ideally one would be able to update the `self.metadata.options.environment_variables` dictionary
This is probably not going to happen, as it would require messing with AiiDA's provenance principle and allowing stored inputs to be mutated. That is too big of a change for this kind of problem, I would say.
> As a specific example, BigDFT can utilise a `FUTILE_DEBUG_MODE` environment variable to specify the debug level. It would be ideal to be able to specify this in an optional input (aside from the `environment_variables`) but without having access to the jobscript within the plugin in some form I can't think of a way to do this.
At least for this case, you can add in your plugin documentation that the user can specify:

```python
builder.metadata.options.environment_variables = {'FUTILE_DEBUG_MODE': 'ERROR'}
```
Even though it might be nicer to define a custom input on your process class and have the plugin set this, that is currently not possible, and this approach is available.
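For context on what that option does, the engine ultimately turns each entry of `environment_variables` into an export line in the jobscript; a hedged sketch of that translation (the exact quoting used by aiida-core may differ):

```python
def env_export_lines(environment_variables):
    # Render a dict of environment variables as jobscript export lines.
    # Illustrative only; the real rendering lives in aiida-core.
    return [
        f"export {name}='{value}'"
        for name, value in environment_variables.items()
    ]

# The FUTILE_DEBUG_MODE example from above:
lines = env_export_lines({'FUTILE_DEBUG_MODE': 'ERROR'})
```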
> The simplest way to put this is that for me it lines up with what I would expect based on the defaults that already contain this syntax when creating a computer: When I see `mpirun -np {tot_num_mpiprocs}` as a default, I assume that this functionality is available for all typed inputs, which sent me down a lengthy debugging path trying to implement this as a feature.
Fully agree, and I think this issue should address this either by adding templating support for all attributes, or by making it very clear where templating is supported and which placeholders are available, which I believe is currently not documented.
> This is probably not going to happen as it would require messing with AiiDA's provenance principle and allowing stored inputs to be mutated. This is too big of a change for this kind of problem I would say.
Indeed this makes sense; there's no reason to make such a large shift in the AiiDA API for this issue. I would assume that tweaks to the jobscript wouldn't affect provenance, but I trust your word that it does (or that it is at least a breaking change to make it so that it doesn't).
Although, unless I'm misunderstanding something, does this represent a bug in the torque scheduler? I believe `OMP_NUM_THREADS` needs to be specified, as is currently being done for `local`.
> Fully agree, and I think this issue should address this either by adding templating support for all attributes, or making it very clear where templating is supported and which placeholders, which is currently not documented I believe.
Obviously the former case is preferable; however, making it clear that templating is supported only in specific instances would be beneficial in preventing someone else from getting lost down this rabbit hole!
On that note, it may also be helpful to add a note to the documentation here, warning that when these attributes are accessed within the plugin, what's returned are just copies that serve a solely read-only purpose.
> Although unless I'm misunderstanding something, does this represent a bug in the torque scheduler? I believe `OMP_NUM_THREADS` needs to be specified, as is currently being done for `local`.
That I don't know. I am not familiar with torque whatsoever.
> That I don't know. I am not familiar with torque whatsoever.
I believe it does, similar to how it requires the `cd $PBS_O_WORKDIR` line. Going by this page, https://hpc-wiki.info/hpc/Torque, the 2nd example uses:

```shell
### Execute your application and set OMP_NUM_THREADS for the application run
OMP_NUM_THREADS=12 myapp.exe
```
So it seems this can be worked around presently by setting the mpirun command. It's then to be decided whether or not it would be preferable to have `torque` behave similarly to `direct` and prepend the `OMP_NUM_THREADS` line. Perhaps I can create a separate issue for this?
Note, this feature request is different from #4680, as that asks how the configuration YAMLs that are consumed by `verdi computer setup` and `verdi code create` can use templating, but there the template would be made concrete before passing it to the CLI command. Here, the request is to add literal template code to the `prepend_text` and `append_text` that is rendered upon the submission of a `CalcJob` using the `Computer` or `Code` in question. There already is some support for this, but it is quite ad hoc. For example, the `Computer.work_dir` attribute allows `{username}` and `Computer.mpirun_command` allows `{tot_num_mpiprocs}`.

There are other use cases for this kind of functionality in other attributes, such as the `prepend_text`. For example, one may want to define a path based on the username, or to dynamically set the `OMP_NUM_THREADS` variable based on the number of CPUs per MPI task requested. We should see whether it would be possible to provide templating, how generic it can be made (should we allow full Jinja2 templates and even allow control structures, like conditionals and loops?) and how the template variables should be defined at launch time of the `CalcJob`.
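To make the proposal concrete, here is a minimal sketch of what rendering a templated `prepend_text` could look like; the placeholder names and the use of plain `str.format` (rather than full Jinja2) are assumptions for illustration only:

```python
def render_prepend_text(prepend_text, **template_vars):
    """Render hypothetical placeholders in a prepend_text template.

    The set of available variables and the template engine (e.g. Jinja2
    with conditionals and loops) are exactly the open questions here.
    """
    return prepend_text.format(**template_vars)

# Both use cases mentioned above, with hypothetical placeholder names:
template = (
    'export SCRATCH_DIR=/scratch/{username}\n'
    'export OMP_NUM_THREADS={num_cores_per_mpiproc}'
)
rendered = render_prepend_text(template, username='jdoe', num_cores_per_mpiproc=8)
```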