aiidateam / aiida-code-registry

Registry of simulation codes and computers for easy setup in AiiDA.

Naming of top-level computer directories #4

Closed ltalirz closed 4 years ago

ltalirz commented 4 years ago

During a discussion on 2020-04-06 it was suggested to consider using fully qualified domain names as the directory names in computers/.

This has the advantage of being automatically unique, but the disadvantage of introducing another label. It also means that storing separate variants of computers (e.g. daint-mc, daint-gpu) will require the introduction of an additional nesting level:

computers/daint.cscs.ch/mc/...
computers/daint.cscs.ch/gpu/...

Since the domain name is not decided by the registry, but by the supercomputing centre, it is also possible (perhaps unlikely?) that it changes over time... perhaps not a big issue.

While I'm not strongly against using domain names, I feel the simpler solution would be to use the computer label as we do now. This has the advantage of avoiding clashes of identical labels by design, which I would consider a feature.

yakutovicha commented 4 years ago

During a discussion on 2020-04-06 it was suggested to consider using fully qualified domain names as the directory names in computers/.

One question here, if the subfolders are supposed to have codes inside, do we really want to call it computers?

This has the advantage of being automatically unique, but the disadvantage of introducing another label.

This has the advantage of avoiding clashes of identical labels by design, which I would consider a feature.

Also, one point to add here. Even if the domain name is not changed, the computer itself might be updated. Essentially, this means that the AiiDA user will have to create a new computer. They won't be able to delete it (there might be calculations linked), they will have to rename it. The fact that users might need to rename the computer means that whatever you put in the computers/ folder won't work as the AiiDA computer name. This means that we are introducing a new name with no purpose.

Therefore, I would vote for the fully qualified domain name, as it (1) always has a meaning, (2) is readable by the user, and (3) is always unique.

yakutovicha commented 4 years ago

Since the domain name is not decided by the registry, but by the supercomputing centre, it is also possible (perhaps unlikely?) that it changes over time... perhaps not a big issue.

I also think it is not a big issue, as, to my understanding, the repository should be "live".

ltalirz commented 4 years ago

One question here, if the subfolders are supposed to have codes inside, do we really want to call it computers?

Well, the subfolders contain both codes and computers, but each subfolder corresponds to one AiiDA computer. Not sure whether calling it codes would be easier to understand?

They won't be able to delete it (there might be calculations linked), they will have to rename it.

Thanks for the observation, I hadn't thought about this before. I guess we will have this problem independently of whether we use the domain name or a shorter one, correct? In any case, users can always use --label to override the computer label when setting up a computer from the registry, i.e. we are just discussing the default name here.
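
As a minimal sketch of the override mentioned above, the following Python helper assembles (but does not run) a `verdi computer setup` invocation with an optional `--label`. Only `--config` and `--label` come from this discussion; the helper name `build_setup_command` and the example.org URL are hypothetical placeholders.

```python
from typing import List, Optional


def build_setup_command(config_url: str, label: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `verdi computer setup` without executing it."""
    command = ["verdi", "computer", "setup", "--config", config_url]
    if label is not None:
        # Override the default label that ships with the registry entry.
        command += ["--label", label]
    return command


# Hypothetical URL, used for illustration only.
url = "https://example.org/computers/daint.cscs.ch/mc/computer.yml"
print(" ".join(build_setup_command(url, label="daint-mc-old")))
```

Without the `label` argument the command relies on whatever default label the registry file provides, which is the value being debated here.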

we are introducing a new name with no purpose.

That is not true - the AiiDA computer name can be significantly shorter. daint.cscs.ch is very short, but the URLs of other HPC centers can be significantly longer. Not sure people want to type those on the CLI.

Furthermore, there is anyhow the need to have custom computer names for different variants, e.g. daint-mc and daint-gpu. This would become daint.cscs.ch-mc which can give the false impression that this is a domain name, while it isn't.

Happy to discuss more on this, also in person!

yakutovicha commented 4 years ago

Not sure whether calling it codes would be easier to understand?

I didn't mean to suggest calling it Codes, we might think of calling the folder something else, like setup, configurations, database.

I guess we will have this problem independently of whether we use the domain name or a shorter one, correct?

not really. The domain name is used for the setup and it can always remain the same.

In any case, users can always use --label to override the computer label when setting up a computer from the registry, i.e. we are just discussing the default name here.

This would become daint.cscs.ch-mc which can give the false impression that this is a domain name, while it isn't.

Yes, and that is why I would vote for the nesting name you were mentioning in the first post:

computers/daint.cscs.ch/mc/...
computers/daint.cscs.ch/gpu/...

Concerning the long names:

but the URLs of other HPC centers can be significantly longer. Not sure people want to type those on the CLI.

Do you have an example of such a long URL? I think one would copy-paste the URL of the json file to set up a computer. Or was there another idea?

$ verdi computer setup --config http://some.url/.../daint.cscs.ch/mc/gpu/computer.yml

yakutovicha commented 4 years ago

Happy to discuss more on this, also in person!

me too.

ltalirz commented 4 years ago

I didn't mean to suggest calling it Codes, we might think of calling the folder something else, like setup, configurations, database.

Ah, sorry I misunderstood. Yes, a different term than computers may work as well. Now that we start storing both computer setup and some computer configure information there, the names setup and configuration are perhaps no longer ideal...

not really. The domain name is used for the setup and it can always remain the same.

Perhaps my misunderstanding is this: would you use the domain name only for the directory name or also for the computer label? I was thinking of using it in both places, since otherwise we are indeed introducing a label just for the registry. If you do that, however, you will end up with computers daint.cscs.ch-mc-1, daint.cscs.ch-mc-2 etc. in your database.
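
To make the concern concrete, here is a small sketch of how numbered labels would accumulate if the domain name doubled as the computer label and every updated setup needed a fresh one. The function `next_free_label` is hypothetical and is not part of AiiDA; it only illustrates the naming pattern.

```python
from typing import Iterable


def next_free_label(base: str, existing: Iterable[str]) -> str:
    """Return the first label of the form '<base>-<n>' that is not yet taken."""
    taken = set(existing)
    index = 1
    while f"{base}-{index}" in taken:
        index += 1
    return f"{base}-{index}"


labels = []
for _ in range(3):
    # Each update of the machine forces a new computer with a new label.
    labels.append(next_free_label("daint.cscs.ch-mc", labels))
print(labels)
```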

I would vote for the nesting

We could do that - but it introduces another level, even for computers where it is not needed.

Do you have an example of such a long URL? I think one would copy-paste the URL of the json file to set up a computer. Or was there another idea?

Ah, my concern was not for the initial setup - that URL will anyhow be long and needs to be put there only once. I was thinking about interacting with the computer later on via the verdi CLI (assuming you use the domain name for the label). E.g. skx.supermuc.lrz.de is already a bit longer; also, the domain name is that of the login nodes, not the top-level one.

me too.

I'll write on Slack tomorrow afternoon.

ltalirz commented 4 years ago

To summarize, @yakutovicha would propose the following:

Directory structure

registry/daint.cscs.ch/multicore/*.yaml
registry/daint.cscs.ch/hybrid/*.yaml
registry/daint.cscs.ch/default  # symlink to default subfolder [1]
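
The layout above could be created and checked roughly as follows. This is only a sketch in a temporary directory: the helper `create_layout` is a hypothetical name, and the choice of `multicore` as the default variant is an assumption made for the example.

```python
import tempfile
from pathlib import Path
from typing import List


def create_layout(root: Path, domain: str, variants: List[str], default: str) -> Path:
    """Create <root>/<domain>/<variant>/ folders plus a `default` symlink."""
    if default not in variants:
        raise ValueError(f"default variant {default!r} is not one of {variants}")
    computer_dir = root / domain
    for variant in variants:
        (computer_dir / variant).mkdir(parents=True, exist_ok=True)
    # Relative link target, so the link stays valid wherever the repository is cloned.
    (computer_dir / "default").symlink_to(default, target_is_directory=True)
    return computer_dir


with tempfile.TemporaryDirectory() as tmp:
    computer_dir = create_layout(
        Path(tmp) / "registry", "daint.cscs.ch", ["multicore", "hybrid"], default="multicore"
    )
    # Resolving the link shows which variant a user gets by default.
    print((computer_dir / "default").resolve().name)
```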

Computer labels

He suggests they could be left unspecified, or the regular daint-mc shortname (not the domain name).

My take

This directory structure would work - my only point is that this introduces an additional level of nesting that we won't really need for some time (or if the number of variants of the same computer remains low). It also means that the label of the computer is no longer apparent from the name of the folder that contains it. I'm not strongly for or against it.

As for the label, I strongly suggest that we provide a default value in the registry - using the same name across aiida databases will let others immediately understand the meaning of the label (even if some users might need to modify the label if they make changes to the setup).

@unkcpz Would you mind giving your input as well? After this, we can make a decision.

[1] Git supports symlinks

unkcpz commented 4 years ago

I assume this repository was created for two purposes: 1) for AiiDA users to store their computer and code setup configurations, so that they can reuse them or distribute them to other users for quick setup; 2) for use in AiiDAlab, to automatically fill in the form when setting up computers and codes. For these reasons, I vote for @yakutovicha's proposal of an additional level of nesting, since users can copy-paste the URL to set up the computer. This is not even a problem for AiiDAlab. However, my concern is that, in my experience, supercomputer centres in China are often not properly maintained, so we do not have a consistent domain name. In such a case, a descriptive name should replace the domain name as the top-level folder name.

ltalirz commented 4 years ago

Ok, in that case it's decided and we go with

registry/daint.cscs.ch/multicore/*.yaml
registry/daint.cscs.ch/hybrid/*.yaml
registry/daint.cscs.ch/default  # symlink to default subfolder [1]

with the possibility to use a descriptive name in case a domain name is not available.
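
A rough sketch of that fallback rule, assuming a simple hostname check: use the domain name when one is available and looks like a fully qualified domain name, otherwise fall back to a descriptive name. The function `top_level_name`, the regular expression, and the fallback name `my-cluster` are assumptions made for illustration, not rules defined by the registry.

```python
import re
from typing import Optional

# Loose hostname pattern (letters, digits, hyphens, at least one dot).
# This is an assumption for the sketch, not a registry requirement.
_LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
_FQDN = re.compile(rf"^{_LABEL}(?:\.{_LABEL})+$", re.IGNORECASE)


def top_level_name(domain: Optional[str], descriptive_name: str) -> str:
    """Pick the top-level folder name: the domain if usable, else a descriptive name."""
    if domain and _FQDN.match(domain):
        return domain
    return descriptive_name


print(top_level_name("daint.cscs.ch", "daint"))
print(top_level_name(None, "my-cluster"))
```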

@unkcpz Would you mind making the PR?

unkcpz commented 4 years ago

No problem. Leave it to me :smile:

unkcpz commented 4 years ago

Just to be clear, the setup file is named computer-setup.yml and the configure file computer-configure.yml, correct? And what about naming the code configuration files? I propose executable name + dash + version, for example pw-6.5.yml. @ltalirz
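
For illustration, a file name following the scheme proposed in this comment (executable name, dash, version) could be split back into its parts as sketched below. The function `parse_code_filename` is hypothetical, and the scheme is only the proposal made here, not necessarily the convention that was adopted.

```python
from typing import Tuple


def parse_code_filename(filename: str) -> Tuple[str, str]:
    """Split '<executable>-<version>.yml' into (executable, version)."""
    for suffix in (".yml", ".yaml"):
        if filename.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"{filename!r} is not a YAML file name")
    # Split on the last dash so that executable names may themselves contain dashes.
    executable, dash, version = stem.rpartition("-")
    if not dash or not executable or not version:
        raise ValueError(f"{filename!r} does not match '<executable>-<version>'")
    return executable, version


print(parse_code_filename("pw-6.5.yml"))
```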

ltalirz commented 4 years ago

@unkcpz That is discussed here: https://github.com/aiidateam/aiida-code-registry/issues/5 , please have a look.

ltalirz commented 4 years ago

Fixed in #15