Open rowlesmr opened 8 months ago
All of the REFLN-type categories should be keyed by a separate id type, as we cannot guarantee that hkl values are unique. This is a real problem for modulated structures (hklmnop...) and raw data (same peak collected more than once). I've been putting this off, but it needs to be discussed and done.
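To make the uniqueness problem concrete, here is a minimal Python sketch. The rows, the function names and the running-integer id scheme are all made up for illustration and are not taken from any dictionary; the sketch only shows that a repeated measurement breaks hkl as a key, while a separate id keeps every row addressable.

```python
from collections import Counter

# Hypothetical raw-data rows: (h, k, l, intensity).
# The (1, 0, 0) peak has been collected twice.
rows = [
    (1, 0, 0, 1520.0),
    (1, 1, 0, 873.5),
    (1, 0, 0, 1498.2),
]


def duplicated_keys(rows, key_len=3):
    """Return the index tuples that appear in more than one row."""
    counts = Counter(row[:key_len] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def add_synthetic_id(rows):
    """Key each row by a running integer string instead of by hkl (assumed scheme)."""
    return {str(i): row for i, row in enumerate(rows, start=1)}


dups = duplicated_keys(rows)    # index tuples that cannot serve as a key
keyed = add_synthetic_id(rows)  # every row keeps its own entry
```

Keying a dictionary directly on the hkl tuple would silently drop one of the two (1, 0, 0) measurements, whereas the id-keyed version retains all three rows.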
chemical_conn_bond et al: the references to id are leftovers from when there were such identifiers in an earlier draft. They may be deleted.
As to the more general question of when to create such "synthetic" identifiers, there is no clear-cut answer. The original DDLm vision always had a single id for every Loop category, to make dREL of the form category[keyval] resolve. We've expanded the dREL rules so that multi-key-data-name categories will still resolve economically.
I think the practical answer is that if rows in a category will be linked to from other categories, then to avoid data name proliferation a synthetic identifier is worth creating. So, for example, the topology dictionary needs to identify nodes that are joined into a net, where a node might need an atomic label, symmetry operation id, and three lattice translations in order to identify it. The loop listing the nodes in a particular net could either refer to a synthetic node_id, or use five child data names of the above items to refer to a node, so clearly creating a node_id is worthwhile.
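As a rough illustration of the trade-off, the Python sketch below contrasts the two ways of referring to a node. The node values, the "n1"-style ids and the net name are invented for the example and are not the topology dictionary's actual data names.

```python
# Hypothetical nodes, each identified by the five items mentioned above:
# (atom label, symmetry operation id, tx, ty, tz)
nodes = [
    ("Zn1", 1, 0, 0, 0),
    ("Zn1", 1, 1, 0, 0),
    ("O1", 2, 0, 0, 0),
]


def assign_node_ids(nodes):
    """Give each distinct composite identity one synthetic id (assumed 'n1', 'n2', ... scheme)."""
    ids = {}
    for node in nodes:
        if node not in ids:
            ids[node] = f"n{len(ids) + 1}"
    return ids


node_ids = assign_node_ids(nodes)

# Rows of a loop listing the nodes in a net called "netA" (made-up name).
net_rows_composite = [("netA",) + node for node in nodes]     # net + five child items
net_rows_by_id = [("netA", node_ids[node]) for node in nodes]  # net + one node_id

columns_composite = len(net_rows_composite[0])
columns_by_id = len(net_rows_by_id[0])
```

Every further category that points at a node would repeat the five extra columns in the composite form, but only one column in the node_id form.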
The hkl problem is a little different: the issue here is not data name proliferation, but that items with a physical meaning are used as identifiers, opening us up to possible duplication (i.e. not a key any more) when the science develops. The three lattice translations used to identify a node in the previous paragraph are also bad in this sense, as modulated structures need to specify lattice translations in a different way. Hmm.
I've been looking at category keys for various reasons, and have happened upon some questions:
DIFFRN_REFLN: Keyed on _diffrn_refln.hkl, which is a Matrix of hkl values. Other categories (eg DIFFRN_ORIENT_REFLN) are keyed on the three indices individually.
CHEMICAL_CONN_BOND: Keyed on .atom_1 and .atom_2, but also has id as a "Unique identifier for the bond.". The .id dataname isn't referred to anywhere else in core. The same with GEOM_ANGLE, GEOM_BOND, GEOM_CONTACT, GEOM_HBOND, GEOM_TORSION, and MODEL_SITE. Some of these are understandable, as there are many key datanames (looking at you GEOM_TORSION).