TREX-CoE / trexio_tools

Set of tools for trexio files
BSD 3-Clause "New" or "Revised" License

keeping the full label of elements (including numbers) in the molecul… #13

Closed neelravi closed 11 months ago

neelravi commented 2 years ago

Removing the numbers from atom labels loses information. If two different basis sets are used for atoms of the same element, we generally distinguish them with numbered labels (say H1, H2). The element can be uniquely identified by its charge but not by its label.

So it is necessary to keep the numbers after element symbols.

q-posev commented 2 years ago

Thanks! Normally it is possible to have different basis sets for the same atom types in trexio. This is why we have the flexible basis group with data stored both in per-atom and per-shell ways. It should not be necessary to change the atomic labels for that.

scemama commented 2 years ago

The potential problem I see is that if the labels don't correspond strictly to the elements, this field becomes "free input" and has no use other than documentation or program-specific data. In this particular case, you know that all the H1 atoms have the same basis because that is the convention used by the program that wrote the TREXIO file. But if my code uses the convention that H1 are the hydrogens on fragment one and H2 are the hydrogens on fragment two, they might have different basis sets, and your interpretation of the TREXIO file will be wrong.

A better option would be to leave the atom labels as they are, and to add an extra array to identify which centers have the same basis information.

In the basis group, we could add:

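| Variable  | Type    | Dimensions      | Description                                        |
|-----------|---------|-----------------|----------------------------------------------------|
| `same_as` | `index` | `(nucleus.num)` | Identifies centers with the same basis parameters  |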

where same_as(i) = j means that center i has the same parameters as center j, and if same_as(i) = i the parameters have to be read.
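
As a rough illustration of how a reader of the file might use such an array, here is a minimal Python sketch. Note the assumptions: `same_as` is only the array proposed above and is not an existing TREXIO field, `group_centers_by_basis` is a hypothetical helper name, indices are taken to be 0-based, and the check that every referenced center points to itself (no chains such as 2 -> 1 -> 0) is a choice made for this sketch, not something decided in this discussion.

```python
from collections import defaultdict


def group_centers_by_basis(same_as):
    """Group centers by the reference center whose basis parameters are read.

    same_as[i] == j means center i reuses the parameters of center j;
    same_as[i] == i means the parameters of center i have to be read.
    """
    n = len(same_as)
    groups = defaultdict(list)
    for i, j in enumerate(same_as):
        if not 0 <= j < n:
            raise ValueError(f"center {i} refers to non-existent center {j}")
        # Assumption of this sketch: a reference center must point to itself.
        if same_as[j] != j:
            raise ValueError(f"center {i} refers to center {j}, which is not a reference")
        groups[j].append(i)
    return dict(groups)


# Hypothetical example: centers 0 and 2 carry their own parameters,
# center 1 reuses those of center 0 and center 3 reuses those of center 2.
same_as = [0, 0, 2, 2]
groups = group_centers_by_basis(same_as)
```

With this layout, a program would read the basis parameters only for the keys of `groups` and reuse them for every center listed under each key, without having to interpret the atom labels.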

What do you think?

neelravi commented 2 years ago

Let us take an example of H2 from the trexio documentation. In the basis group,

# 6 shells per H atom
nucleus_index =
[ 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]

What determines the nuclear index here? The position of atoms in the molecular coordinates, the atomic charge, or the atomic label?

And now if I have the following case,

# 4 shells per first H atom and 6 shells for second H
nucleus_index =
[ 0, 0, 0, 0,  1, 1, 1, 1, 1, 1 ]

In this case, the nucleus group would have


nucleus_charge = [1, 1]
nucleus_label = ['H', 'H']

It would be cumbersome to distinguish the atoms by counting the number of shells in the basis and matching them with the nuclear indices.
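
To make the concern concrete, the matching step could look roughly like the sketch below. It works on plain Python lists rather than through the TREXIO API, `shells_per_nucleus` is a hypothetical helper, and it assumes that `nucleus_index` refers to the position of each atom in the nucleus arrays, which is exactly the point the question above asks to have confirmed.

```python
from collections import Counter

# Data from the second example above: two H atoms with 4 and 6 shells.
nucleus_charge = [1, 1]
nucleus_label = ["H", "H"]
nucleus_index = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def shells_per_nucleus(nucleus_index, nucleus_num):
    """Count how many shells are attached to each nucleus position."""
    counts = Counter(nucleus_index)
    return [counts.get(i, 0) for i in range(nucleus_num)]


shell_counts = shells_per_nucleus(nucleus_index, len(nucleus_label))

# Atoms with the same label but different shell counts cannot share a basis.
for label, n_shells in zip(nucleus_label, shell_counts):
    print(f"{label}: {n_shells} shells")
```

Here both atoms carry the label `H`, so the only way to see that they use different basis sets is to compare the shell counts (and, in general, the shell parameters themselves).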