aiidateam / aiida-code-registry

Registry of simulation codes and computers for easy setup in AiiDA.
2 stars 11 forks source link

add code registry tools #61

Closed ltalirz closed 2 years ago

ltalirz commented 2 years ago

This adds a registry that allows users to place a set of yaml files describing computers and codes installed on them in a folder of their choice (AIIDA_CODE_REGISTRY). When using the load_code or load_computer functions from aiida_azq.registry, these codes are set up automatically on demand.

Related to https://github.com/aiidateam/aiida-code-registry/issues/60 @unkcpz

ltalirz commented 2 years ago

Some choices made here:

1) One question is at which level to "group" configurations. The original choice to have one file per AiiDA command arose because that made it easy to integrate into click - but this makes it somewhat tedious to edit/maintain these files. Also, it means file names need to carry meaning; one needs to duplicate labels in the file name and inside the file, ...

I think the top-level grouping we've converged to here in the registry (to group at the "HPC site" level) is a natural one.

Here, I'm adapting this type of grouping by going to one yaml file per site of the form

computers:
 - label: ...
   setup:
   configure:
     core.ssh:
       ...
   codes:
   - label:
      ...

The registry folder can contain multiple such yaml files, whose contents will simply be merged together. The name of these files is irrelevant.

The current script also allows a top-level codes: field, whose codes will be applied to all computers listed.

This yaml format could become how we store computer/code configurations here on the registry

ltalirz commented 2 years ago

2) Template variables are passed directly to the load_code / load_computer commands.

This is to support cases like: if template variable 1 has value x, then template variable 2 should have value y, which would be hard to express statically. Of course, if the possible values are known beforehand, such cases could also be resolved by just putting all possible static combinations of template variables inside the configuration, but that is less convenient (and maybe the possible values are not known beforehand).

In practice, passing the template variables to the load_code / load_computer commands was more convenient for me.

ltalirz commented 2 years ago

Still to do:

yakutovicha commented 2 years ago

I think the top-level grouping we've converged to here in the registry (to group at the "HPC site" level) is a natural one.

Just adding my 5 cents. As we discussed earlier, it probably makes sense to keep computers and code setup detached. So the yaml file would look something like:

computers:
- label: ...
  setup:
  configure:
    core.ssh:
      ...
codes:
- label:
    ...

It would probably then make sense to merge load_code and load_computer in one function something like load_resource and use it load_resource("code@computer"). The function could check whether the computer is already available and if not - set it up.

sphuber commented 2 years ago

Few comments:

It would probably then make sense to merge load_code and load_computer in one function something like load_resource and use it load_resource("code@computer"). The function could check whether the computer is already available and if not - set it up.

Note that load_code in principle already supports adding the @computer-label to load a code for a specific computer.

I am not sure we should hijack the current load_code and load_computer to add a lot of magic to automatically create entities. Maybe a new dedicated function would be better.

Just adding my 5 cents. As we https://github.com/aiidateam/aiida-code-registry/issues/35, it probably makes sense to keep computers and code setup detached. So the yaml file would look something like:

I tried to read the thread to determine your argument for decoupling a code from a computer, but couldn't find it. A Code contains information that is going to be specific to a computer, is it not? For example the absolute executable path. The path for QE's pw.x might be different on different computers. I imagine this holds for many cases. So what would be the benefit of having "generic" code definitions?

ltalirz commented 2 years ago

Thanks for the feedback @yakutovicha !

I think the top-level grouping we've converged to here in the registry (to group at the "HPC site" level) is a natural one.

Just adding my 5 cents. As we discussed earlier, it probably makes sense to keep computers and code setup detached. So the yaml file would look something like:

computers:
- label: ...
  setup:
  configure:
    core.ssh:
      ...
codes:
- label:
    ...

I agree. As mentioned in the text below my yaml example, your layout is actually already supported by the example script as well. All codes specified at the top level are added to all computers in the yaml file.

It would probably then make sense to merge load_code and load_computer in one function something like load_resource and use it load_resource("code@computer"). The function could check whether the computer is already available and if not - set it up.

I reused the names from aiida-core since this makes it a drop-in replacement, but if we feel strongly about it we could also call the function differently (note that for your proposition this function would return instances of different classes... to think about).

In case you were wondering: if you load_code a "code@computer" where the computer does not yet exist, the script does already set up the computer automatically (and "code@computer" is the only label format for codes that the script currently supports).

ltalirz commented 2 years ago

Thanks @sphuber as well!

I am not sure we should hijack the current load_code and load_computer to add a lot of magic to automatically create entities. Maybe a new dedicated function would be better.

Yes, the use case is a bit different and the templating mechanism would require the introduction of a new argument. One can start with a separate function or a plugin and see what is the most seamless way we can come up with to simplify a new user's life.

I tried to read the thread to determine your argument for decoupling a code from a computer, but couldn't find it. A Code contains information that is going to be specific to a computer, is it not? For example the absolute executable path. The path for QE's pw.x might be different on different computers. I imagine this holds for many cases. So what would be the benefit of having "generic" code definitions?

I think one can make a case for both scenarios (which is why the script I added supports both). The case of one code for multiple computers can come in cases e.g. where one wants to hardcode the queue and perhaps some other metadata in the computer (and have several of those for the same HPC system). To some degree this can be alleviated by templating, but there can be cases where one just wants to create a computer with a new label for it.

yakutovicha commented 2 years ago

I tried to read the thread to determine your argument for decoupling a code from a computer, but couldn't find it. A Code contains information that is going to be specific to a computer, is it not? For example the absolute executable path. The path for QE's pw.x might be different on different computers. I imagine this holds for many cases. So what would be the benefit of having "generic" code definitions?

The file structure we came up with so far is this:

daint
.
├── codes
│   ├── cp2k-8.1-cpu.yaml
│   ├── cp2k-8.1-gpu.yaml
│   ├── pp-6.5.yaml
│   ├── projwfc-6.5.yaml
│   └── pw-6.5.yaml
├── hybrid
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
├── hybrid-s1005
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
├── hybrid-s904
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
...

or, in other words:

domain
├── codes
│   ├── ...
├── computer1
│   ├── computer-configure.yaml
│   └── computer-setup.yaml
├── computer2
│   ├── computer-configure.yaml
│   └── computer-setup.yaml

So domain refers to an actual machine while computer1... represent different configurations (different projects, different partitions, etc.). Before, we had all the codes under each computer configuration. @mbercx proposed here to decouple the computers and the codes to reduce duplication. And this is what happened in #38.

I am repeating myself here, but to be clear: by construction domain refers to the same machine so all executables listed in the codes folder are available there. So it makes total sense to me to keep the computers and the codes set up independent from each other and interchangeable.

ltalirz commented 2 years ago

superseded by #62