COMCIFS / cif_core

The IUCr CIF core dictionary

Choose a shared `_dictionary.namespace` for all IUCr dictionaries #292

Open vaitkus opened 2 years ago

vaitkus commented 2 years ago

During some other discussion it was suggested that all IUCr dictionaries should share the same namespace (see the _dictionary.namespace data item). Based on this, the TOPOLOGY_CIF dictionary [1] was assigned the same namespace as the CIF_CORE dictionary (CifCore) with the possibility that the namespaces of both of these dictionaries could be changed in the future.

Two questions remain to be answered:

[1] https://github.com/COMCIFS/TopoCif/blob/master/Topology.dic

vaitkus commented 7 months ago

@nautolycus mentioned that Chapter 2.4 in the upcoming new version of Volume G states the following:

All DDLm dictionaries managed by the IUCr should have an identical
value for this attribute [namely _dictionary.namespace], as data
names are guaranteed to be unique within the IUCr domain and there
is no need for disambiguation.

This seems great and fits well with the approach we used for Topology.dic. However, does it really always hold true?

Some data items are (re)defined in more than one dictionary and could thus, in theory, have different attribute values (e.g. different dREL code snippets, different enumeration ranges). For example, data item _atom_site_Fourier_wave_vector.q1_coeff is defined both in the modulation dictionary [1] and in the magnetic dictionary [2].

@jamesrhester do you foresee any situations where this might cause a need for separate namespaces?

[1] https://github.com/COMCIFS/Modulated_Structures/blob/10e64e993304b793929e3bb82ce2eaa620b021a6/cif_ms.dic#L2356
[2] https://github.com/COMCIFS/magnetic_dic/blob/bd9ebdc1f71b29118c34db69052f01e0b36c36a0/cif_mag.dic#L111

jamesrhester commented 7 months ago

> @jamesrhester do you foresee any situations where this might cause a need for separate namespaces?

No, we should definitely strive for all data names to have stable meanings within a single namespace. "Stability" here refers to their "downstream" meaning: the way in which those data names are interpreted and used in further calculations should not change. The way in which a data name is derived can change. For example, the observed structure factor from a powder diffraction refinement is worked out by apportioning intensity in overlapping peaks according to how the calculated intensities contribute, whereas in single-crystal work F_obs comes from adding up pixels in a peak (or some other approach). However, calculation of difference density uses F_obs from any source interchangeably.

Likewise, F_calc can arise from many different models but it should be possible to subtract it from F_obs regardless of which model was used.

So an importing dictionary can change the dREL (because that describes how something is derived) but should not change the enumeration ranges or enumeration choices. _atom_site_Fourier_wave_vector.q1_coeff should have compatible meanings in both dictionaries; if it does not, we should harmonise them.
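As a rough illustration of that rule, here is a minimal Python sketch that compares two definitions of the same data name and reports any attribute that differs, other than the dREL method. It assumes each definition has already been reduced to a plain mapping of DDLm attribute names to values. The function name `incompatible_attributes` and the example values are hypothetical and are not taken from the actual modulation or magnetic dictionaries.

```python
# Attributes describing *how* a value is derived; these may legitimately
# differ between dictionaries (assumption: only the dREL expression is exempt).
DERIVATION_ATTRS = frozenset({"_method.expression"})


def incompatible_attributes(def_a, def_b, allowed=DERIVATION_ATTRS):
    """Return the sorted attribute names whose values differ between two
    definitions of the same data name, ignoring derivation attributes."""
    keys = (set(def_a) | set(def_b)) - set(allowed)
    return sorted(k for k in keys if def_a.get(k) != def_b.get(k))


# Made-up definitions of one hypothetical data item in two dictionaries.
dict_one = {
    "_type.contents": "Integer",
    "_enumeration.range": "-9:9",
    "_method.expression": "derivation used in dictionary one",
}
dict_two = {
    "_type.contents": "Integer",
    "_enumeration.range": "0:9",
    "_method.expression": "derivation used in dictionary two",
}

# The dREL difference is tolerated; the enumeration range difference is not.
conflicts = incompatible_attributes(dict_one, dict_two)
print(conflicts)
```

An empty result would mean the two definitions are compatible in the sense above; anything listed is a candidate for harmonisation.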

Additionally, no popular CIF software has any mechanism to differentiate identical data names from different namespaces so in practice we couldn't announce different namespaces for dictionaries even if we wanted to. The idea of the namespace attribute (now, at least) is for when other organisations want to use CIF for their own purposes and don't want to coordinate with IUCr.

So we should definitely proceed with @nautolycus's initiative to bring the namespaces into agreement.

vaitkus commented 7 months ago

@jamesrhester Great, thank you for clarifying this.

> Additionally, no popular CIF software has any mechanism to differentiate identical data names from different namespaces so in practice we couldn't announce different namespaces for dictionaries even if we wanted to. The idea of the namespace attribute (now, at least) is for when other organisations want to use CIF for their own purposes and don't want to coordinate with IUCr.

Theoretically, names from different dictionaries could be distinguished by analysing the AUDIT_CONFORM category, but this does not detract from your main point.
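A minimal Python sketch of what such an analysis might look like, assuming the dictionary names listed in a data block's AUDIT_CONFORM loop (e.g. the `_audit_conform.dict_name` values) have already been extracted, and that an index of which data names each dictionary defines is available. The function `candidate_dictionaries`, the index and the dictionary labels below are hypothetical.

```python
def candidate_dictionaries(data_name, conformed_dicts, names_by_dict):
    """Return the declared dictionaries that define data_name.

    conformed_dicts: dictionary names declared by the data block
    names_by_dict:   hypothetical index {dictionary name: set of data names}
    """
    wanted = data_name.lower()  # CIF data names are case-insensitive
    return [
        d for d in conformed_dicts
        if wanted in {n.lower() for n in names_by_dict.get(d, ())}
    ]


# Illustrative index; the dictionary labels are placeholders.
names_by_dict = {
    "modulation": {"_atom_site_Fourier_wave_vector.q1_coeff"},
    "magnetic": {"_atom_site_Fourier_wave_vector.q1_coeff"},
    "core": {"_cell.length_a"},
}

# A data block declaring conformance to the core and magnetic dictionaries.
declared = ["core", "magnetic"]
matches = candidate_dictionaries(
    "_atom_site_fourier_wave_vector.q1_coeff", declared, names_by_dict
)
print(matches)
```

If more than one declared dictionary matched, the data name would remain ambiguous, which is why harmonised definitions are still needed.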

One question still remains -- should the DDLm reference dictionary also be put in the CifCore namespace or remain in the DdlDic namespace?