Need for _refln.id items?

gmadaria commented 2 months ago

I would like to propose _refln.id tags as category keys in those categories that involve reflections. For example, using Miller indices as keys is clearly insufficient for DIFFRN_REFLN, since measurements are currently highly redundant. Even after data reduction, a structure with domains could repeat triplets of indices in the same loop. In the case of modulated structures, all the Miller indices of each reflection would be needed, and their number varies with the dimension of the modulation, so the assignment of category keys would be problematic. There is a solution proposed in the new version of the cif_ms.dic dictionary that consists of using diffrn_refln.index_mlist or refln.index_mlist as category keys, so that the additional Miller indices would be a single object, a list. However, the solution is not satisfactory, since it does not eliminate the twin problem and makes life a little more complicated for developers, since the loops would have structures like:

    h k l m1 ...          (one-dimensional case)
    1 2 3 [-1] ...
    1 2 3 [2] ...

    h k l m1 m2 m3 ...    (three-dimensional case)
    1 2 3 [4,5,6]
    1 2 3 [-1,2,-3]

It is true that twin_individual could also be added as a category key, but wouldn't it be simpler to introduce a sequential numerical _refln.id?
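To make the uniqueness problem concrete, here is a minimal Python sketch. The rows, the field names (`h`, `k`, `l`, `m`, `twin`, `id`) and the helper `duplicate_keys` are all hypothetical and only illustrate the argument; they are not taken from any dictionary or CIF library.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical reflection loop: two satellites of (1 2 3), one of which
# is recorded for two twin individuals. The extra indices are stored as
# a tuple so that they can take part in a key.
rows = [
    {"id": 1, "h": 1, "k": 2, "l": 3, "m": (-1,), "twin": 1},
    {"id": 2, "h": 1, "k": 2, "l": 3, "m": (2,), "twin": 1},
    {"id": 3, "h": 1, "k": 2, "l": 3, "m": (2,), "twin": 2},
]

hkl_dups = duplicate_keys(rows, ["h", "k", "l"])         # hkl alone
hklm_dups = duplicate_keys(rows, ["h", "k", "l", "m"])   # hkl plus index list
id_dups = duplicate_keys(rows, ["id"])                   # sequential id
```

In this toy loop the h,k,l key collides, the key extended with the index list still collides because of the twin individuals, and only the sequential id is unique without adding further key items.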

jamesrhester commented 2 months ago

I agree that _refln.id is needed. A basic principle of designing relational schemas is not to use natural keys (keys which have intrinsic meaning beyond simple identification), because eventually you encounter a situation where they can no longer both serve as unique identifiers and retain their natural meaning.

In the previous case of PUBL_AUTHOR and related author loops, where we introduced an author identifier, software that expects author names to be unique will continue to function in those situations where author names are unique, and fail (as it would have anyway) where author names are not unique.

For DIFFRN_REFLN, for example, mmCIF has already defined _diffrn_refln.id, so creating that should be uncontroversial. mmCIF has not created the corresponding _refln.id, but I think the same arguments hold as for AUDIT_AUTHOR: software would fail anyway when hkl are repeated in the reflection loop for advanced uses such as msCIF, so there is no loss in switching the key to _refln.id.
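The compatibility argument can be sketched as follows. This is only an illustration in Python: `find_by_hkl` is a hypothetical stand-in for legacy software that looks reflections up by hkl, and the two loops are made-up data.

```python
def find_by_hkl(rows, hkl):
    """Legacy-style lookup that assumes hkl identifies exactly one row."""
    matches = [r for r in rows if (r["h"], r["k"], r["l"]) == hkl]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} rows match hkl={hkl}")
    return matches[0]


# Conventional loop: hkl happens to be unique, so the lookup still works
# even though the formal key is now the id.
conventional = [
    {"id": 1, "h": 1, "k": 2, "l": 3},
    {"id": 2, "h": 2, "k": 0, "l": 0},
]

# Advanced loop (e.g. satellites or twin domains): hkl repeats, so the
# lookup fails, exactly as it would have with hkl as the formal key.
repeated = [
    {"id": 1, "h": 1, "k": 2, "l": 3},
    {"id": 2, "h": 1, "k": 2, "l": 3},
]

row = find_by_hkl(conventional, (2, 0, 0))
```

Calling `find_by_hkl(repeated, (1, 2, 3))` raises `LookupError`, which is the failure that hkl-based software would hit in any case; the presence of the id column changes nothing for the conventional loop.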

gmadaria commented 2 months ago

In the case of modulated structures, extending the number of indices in reflections is very easy if a refln.id or similar exists. For example, extending TWIN_REFLN is trivial because the concatenation of the dictionaries cif_ms.dic and cif_twin.dic does not touch the category keys (_twin_refln.datum_id and _twin_refln.individual_id). This is not the case for DIFFRN_REFLN (and related categories). Extending the category keys from h,k,l to h,k,l,m1,...,mp has serious implications. For example, the order in which dictionaries are concatenated is important, and in the case where the CIF file contains both crystalline and modulated structures (which is quite common), the merged dictionary will invalidate reflection loops of either type. A possible alternative would be to redefine the Miller indices into separate categories, leading to h,k,l indices for crystalline or modulated structures. In any case, the use of Miller indices does not result in adequate category keys, since the measurements are usually redundant, with each reflection being measured several times.
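The effect of extending the key can be sketched in Python. This is a rough illustration under stated assumptions: the field names are shortened placeholders, `key_problems` is a hypothetical checker, and it is assumed that a merged dictionary ends up with exactly one key definition per category (either h,k,l or h,k,l,m1).

```python
def key_problems(rows, key_items):
    """List the reasons why `rows` cannot be keyed on `key_items`."""
    missing = [k for k in key_items if any(k not in row for row in rows)]
    if missing:
        return ["missing key items: " + ", ".join(missing)]
    problems = []
    seen = set()
    for row in rows:
        key = tuple(row[k] for k in key_items)
        if key in seen:
            problems.append(f"duplicate key: {key}")
        seen.add(key)
    return problems


# Made-up loops for a crystalline and a modulated structure in one file.
crystalline = [
    {"id": 1, "h": 1, "k": 2, "l": 3},
    {"id": 2, "h": 2, "k": 0, "l": 0},
]
modulated = [
    {"id": 1, "h": 1, "k": 2, "l": 3, "m1": -1},
    {"id": 2, "h": 1, "k": 2, "l": 3, "m1": 2},
]

core_key = ["h", "k", "l"]        # key if the core definition prevails
ms_key = ["h", "k", "l", "m1"]    # key if the extended definition prevails
id_key = ["id"]                   # key proposed in this issue

report = {
    (loop_name, key_name): key_problems(loop, key)
    for loop_name, loop in [("crystalline", crystalline), ("modulated", modulated)]
    for key_name, key in [("core", core_key), ("ms", ms_key), ("id", id_key)]
}
```

Under these assumptions, the core key rejects the modulated loop (duplicate h,k,l), the extended key rejects the crystalline loop (no m1), and the id key accepts both.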

jamesrhester commented 1 day ago

Please examine and comment on #506

COMCIFS / cif_core

Need for _refln.id items? #499