Linking refinements to the data sets used

jamesrhester commented 1 year ago

(Issue created so as not to forget it - this was work done as part of thinking about where CELL belongs. I don't think this is an urgent task).

Up until now, the link between a refined structural model and the data it is based on was simply by virtue of being in the same data block. In multi-block scenarios we need to explicitly link a particular model with data. So I propose:

Create _refine.id to identify a refinement.
Create _refine_diffrn.refine_id and _refine_diffrn.diffrn_id: key data names of new REFINE_DIFFRN category, listing the datasets (_diffrn.id) used in the refinement e.g.
```
loop_
_refine_diffrn.refine_id
_refine_diffrn.diffrn_id
1 xray1
1 neutron1
2 xray1
2 xray2

loop_
_diffrn_radiation.diffrn_id
_diffrn_radiation.type
xray1    xray
xray2    xray
neutron1 neutron
```


describing two refinements: one that used a neutron and an X-ray data set, and one that used two X-ray data sets.
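To make the intended lookup concrete, here is a minimal Python sketch of how software might resolve the proposed REFINE_DIFFRN loop against the radiation loop. The data are the rows from the example above; the function name `datasets_by_refinement` and the plain list/dict representation are assumptions for illustration only, not part of any CIF library.

```python
from collections import defaultdict

# Rows of the proposed REFINE_DIFFRN loop: (refine_id, diffrn_id)
refine_diffrn = [
    ("1", "xray1"),
    ("1", "neutron1"),
    ("2", "xray1"),
    ("2", "xray2"),
]

# Rows of the DIFFRN_RADIATION loop: diffrn_id -> type
diffrn_radiation = {"xray1": "xray", "xray2": "xray", "neutron1": "neutron"}


def datasets_by_refinement(links, radiation):
    """Group data sets by refinement (hypothetical helper).

    Raises KeyError if a refinement points at a data set that is not
    described in the radiation loop, i.e. a dangling diffrn_id.
    """
    grouped = defaultdict(list)
    for refine_id, diffrn_id in links:
        if diffrn_id not in radiation:
            raise KeyError(
                f"refinement {refine_id} refers to unknown data set {diffrn_id}"
            )
        grouped[refine_id].append((diffrn_id, radiation[diffrn_id]))
    return dict(grouped)


result = datasets_by_refinement(refine_diffrn, diffrn_radiation)
```

Because the two data names together form the key, the same data set (`xray1` here) can appear in more than one refinement without ambiguity.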

Open question: how would this (if at all) interact with powder diffraction, where the data set is linked to `_diffractogram.id` rather than `_diffrn.id`.

rowlesmr commented 1 year ago

re powder, it also depends on how you define "refinement".

1. One diffraction pattern, one or more phases: this is one refinement.
2. More than one diffraction pattern, all referring to the same one or more phases: this is one refinement.
   - It doesn't matter if you mix X-rays and neutrons, or lab and national facility data.
3. More than one diffraction pattern, all referring to different one or more phases, with each diffraction pattern refined independently: this is many refinements.
   - This is just repeating 1. (or possibly 2.) many times.
4. More than one diffraction pattern, all referring to different one or more phases, refined parametrically: is this one refinement?
   - e.g. I refine a thermal expansion coefficient, and derive cell parameters from this coefficient, which is refined over all diffraction patterns.

jamesrhester commented 1 year ago

This is absolutely an important task.

There's one easy answer: one refinement must include everything that contributes to the calculation of chi^2 (or the quantity being minimised). So if multiple diffraction patterns and phases are involved, then that is one refinement.
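One way to read this rule operationally is sketched below in Python. It assumes that two diffraction patterns contribute to the same chi^2 whenever they share at least one refined parameter, and groups patterns accordingly with a small union-find. The pattern and parameter names (`p1`, `alpha`, `scale_1`, ...) and the function `group_into_refinements` are hypothetical; restraints and constraints, which can also couple patterns, are ignored here.

```python
def group_into_refinements(pattern_params):
    """Group patterns that share any refined parameter.

    pattern_params maps a pattern id to the set of refined parameter
    names that affect its calculated intensities. Each returned group
    is read as one refinement under the assumption stated above.
    """
    parent = {p: p for p in pattern_params}

    def find(p):
        # Follow parent links to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    first_user = {}  # parameter -> first pattern seen using it
    for pattern, params in pattern_params.items():
        for param in params:
            if param in first_user:
                # Shared parameter: merge the two patterns' groups.
                parent[find(pattern)] = find(first_user[param])
            else:
                first_user[param] = pattern

    groups = {}
    for pattern in pattern_params:
        groups.setdefault(find(pattern), []).append(pattern)
    return sorted(sorted(g) for g in groups.values())


# Each pattern refined on its own: no shared parameters.
independent = {"p1": {"cell_a_1"}, "p2": {"cell_a_2"}}

# Parametric case: one thermal expansion coefficient spans all patterns.
parametric = {
    "p1": {"alpha", "scale_1"},
    "p2": {"alpha", "scale_2"},
    "p3": {"alpha", "scale_3"},
}

independent_groups = group_into_refinements(independent)
parametric_groups = group_into_refinements(parametric)
```

Under this reading, the independent case yields one group per pattern (many refinements), while the parametric case collapses to a single group, matching the answer that it is one refinement.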

A refinement requires observations and a model. A refinement would be associated with an identifier, which would come from a Set category to enforce no more than one refinement per data block, as that is the current implicit treatment in single crystal/cif_core. One might list all observations (diffractograms, constraints) contributing to the refinement in a separate REFINE_OBS loop.

A model, on the other hand, is the result of a particular refinement. Perhaps we want a separate _model.id to group structure (via #442, perhaps) and restraints into a model, and then the REFINE category simply refers to the model.

Not forgetting restraints and constraints, covered by a separate dictionary but also relevant to particular refinements.

Note these thoughts are relative to cif_core, and haven't touched on powder, for which multiple structures are often refined simultaneously.

rowlesmr commented 8 months ago

> Open question: how would this (if at all) interact with powder diffraction, where the data set is linked to _diffractogram.id rather than _diffrn.id.

(coming from a powder point-of-view)

_diffrn.id is described as

Unique identifier for a diffraction data set collected under particular diffraction conditions.

It could be better described as

Unique identifier for a set of particular diffraction conditions.

You could then link _diffrn.id and _diffractogram.id through _diffractogram.diffrn_id. I think this way would be preferred, as it keeps the powder implementation in the powder dictionary, whereas the reverse link, _diffrn.diffractogram_id, starts to cross-pollinate the two dictionaries.

Other arguments: I feel that _diffrn.id is kind of higher up the food chain, and diffractogram should refer to it, rather than the other way around. Also, you can have many diffractograms taken at one set of experimental conditions, so this maintains the Setness of both categories.

data_diffraction_conditions
loop_ # just looping for conciseness. Pretend each row is in a different block
_diffrn.id
_diffrn.ambient_temperature
_diffrn.ambient_pressure
A 10 101.3
B 20 101.3
C 50 101.3
D 100 101.3
#...

data_pattern_1
_diffractogram.id 1
_diffractogram.diffrn_id A
#...

data_pattern_2
_diffractogram.id 2
_diffractogram.diffrn_id B
#...

data_pattern_3
_diffractogram.id 3
_diffractogram.diffrn_id B
#...

data_pattern_4
_diffractogram.id 4
_diffractogram.diffrn_id C
#...
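
As a rough illustration of the direction of the link, the following Python sketch resolves each diffractogram to its diffraction conditions through _diffractogram.diffrn_id, using the values from the blocks above. The dictionaries and the helper names `conditions_for` and `diffractograms_at` are assumptions made for this example; they do not correspond to any existing pdCIF tooling.

```python
# _diffrn.id -> conditions, one entry per (pretend) data block
diffrn = {
    "A": {"ambient_temperature": 10, "ambient_pressure": 101.3},
    "B": {"ambient_temperature": 20, "ambient_pressure": 101.3},
    "C": {"ambient_temperature": 50, "ambient_pressure": 101.3},
    "D": {"ambient_temperature": 100, "ambient_pressure": 101.3},
}

# _diffractogram.id -> _diffractogram.diffrn_id
diffractogram = {"1": "A", "2": "B", "3": "B", "4": "C"}


def conditions_for(diffractogram_id):
    """Follow the child-to-parent link from a pattern to its conditions."""
    return diffrn[diffractogram[diffractogram_id]]


def diffractograms_at(diffrn_id):
    """List every pattern collected under one set of conditions."""
    return sorted(
        pattern for pattern, cond in diffractogram.items() if cond == diffrn_id
    )
```

Patterns 2 and 3 both resolve to conditions B, and conditions D have no pattern yet; neither case requires the DIFFRN category to know anything about diffractograms.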

jamesrhester commented 8 months ago

That _diffractogram.id / _diffrn.id proposal sounds good. A little surprising this isn't in the pdCIF dictionary already.

COMCIFS / cif_core

Linking refinements to the data sets used #344