COMCIFS / MultiBlock_Dictionary

Definitions describing data stored in multiple containers

Repurpose the structure category #3

Closed (jamesrhester closed this 3 months ago)

jamesrhester commented 9 months ago

Please see discussion at https://github.com/COMCIFS/cif_core/pull/445#issue-1796323136 for background.

Original approach:

  1. All categories describing the space group are assigned a _space_group.id key data name or a pointer to it. This makes explicit the fact that they all relate to the same space group.

  2. The STRUCTURE category now holds a description of a structure, which consists of a pointer to the space group and a pointer to the experimental conditions (_diffrn.id) that it relates to.

  3. The ATOM_SITE and CELL categories now include a key data name pointing to _structure.id, identifying which structure and experimental conditions an atom or cell relates to.

  4. More controversially, _cell.diffrn_id has been replaced by _structure.diffrn_id. To strictly follow our rules, _cell.diffrn_id should be deprecated rather than replaced outright, but it is unlikely to have been used since the release.
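As a rough illustration of the pointer layout in the list above, the following Python sketch models each category as a collection of rows. It is only a sketch: all identifiers and values are invented, `space_group_id` on STRUCTURE and `structure_id` on ATOM_SITE are assumed attribute names, and `resolve` is a hypothetical helper rather than anything defined by the dictionary.

```python
# Hypothetical rows: every identifier and value below is invented for illustration.
space_group = {"sg1": {}}               # keyed by _space_group.id
diffrn = {"d_100K": {}, "d_300K": {}}   # keyed by _diffrn.id

# STRUCTURE: each row points to a space group and to a set of experimental conditions.
# "space_group_id" is an assumed attribute name; "diffrn_id" models _structure.diffrn_id.
structure = {
    "s_100K": {"space_group_id": "sg1", "diffrn_id": "d_100K"},
    "s_300K": {"space_group_id": "sg1", "diffrn_id": "d_300K"},
}

# CELL and ATOM_SITE rows carry a pointer to _structure.id as part of their key.
cell = [
    {"structure_id": "s_100K", "length_a": 5.40},
    {"structure_id": "s_300K", "length_a": 5.43},
]
atom_site = [
    {"structure_id": "s_100K", "label": "Si1"},
    {"structure_id": "s_300K", "label": "Si1"},
]


def resolve(row):
    """Follow a row's structure pointer to (space group id, diffrn id)."""
    s = structure[row["structure_id"]]
    # Both pointers must refer to rows that actually exist.
    assert s["space_group_id"] in space_group and s["diffrn_id"] in diffrn
    return s["space_group_id"], s["diffrn_id"]


print(resolve(cell[0]))
print(resolve(atom_site[1]))
```

The point of the indirection is that a cell or atom site never names its space group or experimental conditions directly; both are reached through the single structure pointer.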

If @vaitkus and @rowlesmr could confirm that the above satisfies their objections, I will recreate the pull request.

rowlesmr commented 9 months ago

I think they hit all of my concerns.

Just as long as we can differentiate between structure, specimen, and crystal, as they are sometimes used as synonyms in CORE, but mean very different things in powder.

rowlesmr commented 8 months ago

Just an observation: doesn't this new category supersede PD_PHASE?

jamesrhester commented 8 months ago

I'm going to say...that's up to us. Is there any sliver of light between "a component in a sample" and "a crystallographic structure"? I'd say yes, because there could be an amorphous phase. But it is certainly true that creating "_pd_phase.structure_id" would be a simple way of associating a structure with a phase.
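To make the distinction concrete, here is a minimal Python sketch of how a "_pd_phase.structure_id" pointer could associate phases with structures while still allowing an amorphous phase that has none. The phase names, the structure identifier, the use of `None` for a missing pointer, and the helper function are all assumptions for illustration.

```python
# Hypothetical PD_PHASE rows; "structure_id" models the suggested _pd_phase.structure_id.
pd_phase = [
    {"id": "quartz", "structure_id": "s1"},
    {"id": "glass", "structure_id": None},  # amorphous: no crystallographic structure
]


def phases_without_structure(phases):
    """Return the ids of phases that do not point to any structure."""
    return [p["id"] for p in phases if p["structure_id"] is None]


print(phases_without_structure(pd_phase))
```

Under this reading a phase remains a component of the sample, and a structure is an optional attribute of it, so the two categories do not collapse into one.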

jamesrhester commented 8 months ago

When thinking about this proposal, we should also ponder how it relates to other descriptions of structure, e.g. modulated structures and magnetic structures. The magnetic structure dictionary adds to ATOM_SITE, so under the present proposal the magnetic structure is included in the meaning of a structure. The modulated structure dictionary provides new categories describing Fourier components of modulation waves, which we essentially have to include as the magnetic dictionary uses them. So currently I'd be imagining us attaching a _structure.id child data name to the ATOM_SITE_MOMENT_FOURIER category and the analog from the modulated structures dictionary.

The modulated structure dictionary has the concept of "subsystems" for composite structures, which we'll need to study more closely to see how they interact with CELL.

rowlesmr commented 8 months ago

> amorphous phase.

Say no more: "_pd_phase.structure_id" it is.

rowlesmr commented 8 months ago

From https://github.com/COMCIFS/cif_core/issues/442#issuecomment-1630720938

> However, as I recall _cell.diffrn_id was mainly used to describe experiments in which the cell was measured under a different set of conditions than the atomic coordinates (see the cell-measurement-multi-block.cif and cell-measurement-single-block.cif files under examples/). How would this be represented under the _structure.id model?

jamesrhester commented 8 months ago

Good point. _cell.diffrn_id refers to the diffraction conditions used to measure the cell. _cell.structure_id points to a structure, which points to a _diffrn.id that is associated with the structure. These two _diffrn.id pointers should be the same, but as we discussed before, that was not necessarily the case historically.

Therefore, _cell.diffrn_id should stay, but it is no longer the key data name; that role is taken by _cell.structure_id. My mistake for suggesting it should go.
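A small Python sketch of the two pointers, assuming invented identifiers and a hypothetical helper name, shows how a cell measured under different conditions from its structure could be detected:

```python
# Hypothetical rows; identifiers are invented for illustration.
# "diffrn_id" on a structure models _structure.diffrn_id.
structure = {
    "s1": {"diffrn_id": "d_data_1"},
    "s2": {"diffrn_id": "d_data_2"},
}

# CELL is keyed by structure_id (_cell.structure_id); diffrn_id (_cell.diffrn_id)
# records the conditions under which the cell itself was measured.
cell = [
    {"structure_id": "s1", "diffrn_id": "d_data_1"},  # same conditions as the structure
    {"structure_id": "s2", "diffrn_id": "d_cell_2"},  # cell measured separately
]


def cells_measured_elsewhere(cells, structures):
    """Return structure ids whose cell was measured under other conditions."""
    return [
        c["structure_id"]
        for c in cells
        if c["diffrn_id"] != structures[c["structure_id"]]["diffrn_id"]
    ]


print(cells_measured_elsewhere(cell, structure))
```

Keeping both data names lets the historical case, where the two sets of conditions differ, be represented instead of being silently lost.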