COMCIFS / cif_core

The IUCr CIF core dictionary
14 stars 9 forks source link

Impossible to index into a default list when there is more than one key value #397

Closed jamesrhester closed 1 year ago

jamesrhester commented 1 year ago

As per discussion in #393 , to look up the scattering length of an atom both atomic mass and atomic number are required. The current _enumeration.def_index_id and _enumeration_default.index system allow only a single value to be used and so neutron scattering lengths cannot be referenced. Multiple data names could be used to specify the values for indexing by looping some new DDLm attribute, e.g. _enumeration_index.id, but this does not solve the issue of there being only a single column available to index into, with values given in _enumeration_default.index.

rowlesmr commented 1 year ago

Just an idea of what something that works could be: (this is pretty bad, but does (could) work).

save_atom_type_scat.length_neutron

    _definition.id                '_atom_type_scat.length_neutron'
    _definition.update            2023-05-17
    _description.text
;
    The bound coherent scattering length for the atom type at the
    isotopic composition used for the diffraction experiment.

    # This definition needs to change
;
    _name.category_id             atom_type_scat
    _name.object_id               length_neutron
    _type.purpose                 Number
    _type.source                  Assigned
    _type.container               Single
    _type.contents                Real
    _enumeration.def_index_ids     
        ['_atom_type.atomic_number' '_atom_type.mass_number']
    _units.code                   femtometres

    loop_
         _enumeration_default.index_1
         _enumeration_default.index_2
         _enumeration_default.value
     1   0   -3.7390
     1   1   -3.7406
     1   2    6.671
     1   3    4.792
     2   0    3.26
     2   3    5.74
     2   4    3.26
     3   0   -1.90
     3   6    2.0
     3   7   -2.22
     #...

save_
vaitkus commented 1 year ago

The problem here is that there is a fixed number of _enumeration_default.index_{1,2,3,...} items and a variable number of elements in the _enumeration.def_index_ids array. However, we could define the _enumeration_default.index item as having the same data structure as the _enumeration.def_index_ids array, i.e.:

    _enumeration.def_index_ids     
        ['_atom_type.atomic_number' '_atom_type.mass_number']
     ...
    loop_
         _enumeration_default.index
         _enumeration_default.value
     [ 1   0 ]   -3.7390
     [ 1   1 ]   -3.7406
     [ 1   2 ]    6.671
     [ 1   3 ]    4.792
     [ 2   0 ]    3.26
     ...

Also, the mass number of "0" seems a little bit strange. I assume that it is intended to represent the scattering factor for the case where the exact mass number is not known? Maybe "?" or "." is better then?

rowlesmr commented 1 year ago

There are many problems with my example!

0 is meant to represent the natural isotopic abundance. I wrote it here, I thought it was in this thread....

Just for clarification, I think you had too many loop variables...?

    _enumeration.def_index_ids     
        ['_atom_type.atomic_number' '_atom_type.mass_number']
     ...
    loop_
         _enumeration_default.indexes #indices ?
         _enumeration_default.value
     [ 1   0 ]   -3.7390
     [ 1   1 ]   -3.7406
     [ 1   2 ]    6.671
     [ 1   3 ]    4.792
     [ 2   0 ]    3.26
     #...

or

    _enumeration.def_index_ids     
        ['_atom_type.element_symbol' '_atom_type.mass_number']
     ...
    loop_
         _enumeration_default.indexes #indices ?
         _enumeration_default.value
     [ H   0 ]   -3.7390
     [ H   1 ]   -3.7406
     [ H   2 ]    6.671
     [ H   3 ]    4.792
     [ D   0 ]    6.671
     [ He   0 ]    3.26
     #...

.

can a dataitem have two different types? ie single and list? then we could just keep _enumeration.def_index_id and _enumeration_default.index as-is, with a "simple" redefinition.

vaitkus commented 1 year ago

Yes, I forgot to change the data names in my previous example (fixed now).

rowlesmr commented 1 year ago

Also, the mass number of "0" seems a little bit strange. I assume that it is intended to represent the scattering factor for the case where the exact mass number is not known? Maybe "?" or "." is better then?

Yeah, "." is probably better. Also as the enumeration default for _atom_type.mass_number.

jamesrhester commented 1 year ago

Ok, so far that is a workable solution, even if it (slightly) offends my relational sensibilities. A relational solution is hard to attain, as it would require that the tables of default values have descriptive data names for each table column, which cannot be identical to the data name whose value they are used to determine (because it would repeat in a single save frame) and cannot be a DDLm attribute (because those descriptive data names do not describe attributes of a data name). Essentially it would require a complete rethink of the whole default value system.

The way to proceed would be to work up proper DDLm definitions for the two new attributes and then discuss them on the IUCr DDLm mailing list.

can a dataitem have two different types?

In the original concept of DDLm all sorts of combinations of types were envisioned. We've dropped them as being too complex, and more fundamentally because a different type implies a different meaning.

Anyway, because what we are doing here is restricted to dictionaries and not relevant to all those CIF files out in the world, we have more control over updating the files affected by this. My preference would simply be to define new data names as above, perhaps deprecating the current ones before there is too much DDLm software out there using them.

vaitkus commented 1 year ago

The core issue was resolved with PR #399.

@jamesrhester , @rowlesmr feel free to reopen the issue if you feel otherwise.