COMCIFS / cif_core

The IUCr CIF core dictionary

Our handling of multiple wavelengths is broken #505

Open jamesrhester opened 1 week ago

jamesrhester commented 1 week ago

Our dictionaries currently cope poorly with multiple wavelengths in the incident beam.

Currently wavelengths are specified in the DIFFRN_RADIATION_WAVELENGTH category, which is a loop keyed by _diffrn_radiation_wavelength.id. For synchrotrons or neutrons, this loop will typically have a single value, and for X-ray tube sources this loop would typically have a pair of values, or even three if the K-beta line is present. The wavelength identifier may be used in DIFFRN_REFLN to indicate which wavelength value a peak corresponds to. For a Laue experiment there could be a wavelength for every observed spot. Even for synchrotrons and neutrons, harmonic components may be present which, while typically modelled as distinct phases, are actually just extra wavelengths.

The problem: cannot associate a collection of wavelengths with anything

Category DIFFRN_RADIATION has a pointer to _diffrn_radiation_wavelength.id allowing a wavelength to be formally associated with experimental conditions. However, it is impossible to associate more than one wavelength with a single experiment in this way, ruling out using it for X-ray tube experiments.

Similar issues arise for any other category that refers to wavelength using a pointer to _diffrn_radiation_wavelength.id, with the exception of the REFLN type categories.

Solution

_diffrn_radiation.wavelength_id is deprecated, together with similar pointers, and instead _diffrn_radiation_wavelength.diffrn_id is added to DIFFRN_RADIATION_WAVELENGTH. This associates a wavelength with every experiment, which may be calculated as the weighted sum when more than one wavelength is present in the beam.

If everybody is happy with this solution, I'll put together a PR.

vaitkus commented 6 days ago

It would be interesting to see the suggested PR as it would automatically answer most of the questions I have (e.g. would the category keys change, etc.).

A cursory look at the COD revealed that the _*_wavelength_id data items are not widely used (900 entries or so out of 500 000) with the exception of _pd_refln_wavelength_id which is used quite extensively (50 000 entries or so). However, this is probably a non-issue since the PD dictionary is being reworked quite extensively.

rowlesmr commented 6 days ago

_diffrn_radiation_wavelength.diffrn_id is added to DIFFRN_RADIATION_WAVELENGTH. This associates a wavelength with every experiment, which may be calculated as the weighted sum when more than one wavelength is present in the beam.

I have a query about this.

You say a wavelength with each experiment. How does that gel with a standard X-ray tube powder diffraction pattern? I model my Co-tube data with 8 different wavelengths, and use the one with the largest area as "the" wavelength for the purposes of d-spacing calculations.

jamesrhester commented 5 days ago

Forgive the imprecise language. The exact reason for raising this issue is the situation to which you refer. I should have said 'the wavelength associated with the experiment will be the weighted sum of the wavelengths listed for that _diffrn.id in DIFFRN_RADIATION_WAVELENGTH'.

_diffrn.id Q
_diffrn_radiation.probe x-ray
_diffrn_radiation.diffrn_id  Q

loop_
      _diffrn_radiation_wavelength.diffrn_id
      _diffrn_radiation_wavelength.id
      _diffrn_radiation_wavelength.value
     Q 1   1.5405
     Q 2   1.5443

In the above example I've been explicit about including the values of the _diffrn.id child data names, but they could be dropped if each _diffrn.id goes in its own block.
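To make the proposed layout concrete, here is a minimal Python sketch that groups the rows of such a loop by _diffrn_radiation_wavelength.diffrn_id, so that one experiment can carry any number of wavelengths. The function and variable names are hypothetical and not part of any dictionary or PR; the rows are hard-coded from the example above rather than read with a CIF parser.

```python
from collections import defaultdict

# Rows of the DIFFRN_RADIATION_WAVELENGTH loop above:
# (diffrn_id, wavelength id, value in angstroms)
rows = [
    ("Q", "1", 1.5405),
    ("Q", "2", 1.5443),
]


def wavelengths_by_experiment(rows):
    """Map each diffrn_id to a {wavelength id: value} dictionary."""
    grouped = defaultdict(dict)
    for diffrn_id, wavelength_id, value in rows:
        grouped[diffrn_id][wavelength_id] = value
    return dict(grouped)


by_experiment = wavelengths_by_experiment(rows)
print(by_experiment["Q"])
```

With the existing pointer in DIFFRN_RADIATION, only one of these two rows could be linked to experiment Q; keying the association from the wavelength side removes that limit.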

jamesrhester commented 5 days ago

It would be interesting to see the suggested PR as it would automatically answer most of the questions I have (e.g. would the category keys change, etc.).

I agree, I will make one.

A cursory look at the COD revealed that the _*_wavelength_id data items are not widely used (900 entries or so out of 500 000) with the exception of _pd_refln_wavelength_id which is used quite extensively (50 000 entries or so). However, this is probably a non-issue since the PD dictionary is being reworked quite extensively.

It is also a non-issue as identifying individual peaks by the wavelength that produced them would not change and is the part of the current system that actually works.

rowlesmr commented 23 hours ago

'the wavelength associated with the experiment will be the weighted sum of the wavelengths listed for that _diffrn.id in DIFFRN_RADIATION_WAVELENGTH'.

That may not work in general. Consider the following:

_diffrn.id UUID
_diffrn_radiation.probe x-ray
_diffrn_radiation.diffrn_id  UUID

loop_
      _diffrn_radiation_wavelength.diffrn_id
      _diffrn_radiation_wavelength.id
      _diffrn_radiation_wavelength.value
      _diffrn_radiation_wavelength.wt
     UUID 1   1.534753   0.0159   # Cuka5
     UUID 2   1.540596   0.5791   # Cuka5
     UUID 3   1.541058   0.0762   # Cuka5
     UUID 4   1.54441    0.2417   # Cuka5
     UUID 5   1.544721   0.0871   # Cuka5
     UUID 6   0.7693222  0.02     # Cu ka half-wavelength
     UUID 7   1.4763742  0.00494  # W La
     UUID 8   1.3922160  0.00340  # Cu kB

The mean will be skewed by the presence of impurity wavelengths. They're necessary to correctly model spectral artefacts, but don't really contribute to anything else.
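The size of the skew can be checked with a short Python sketch using the values and weights from the loop above. It assumes that "weighted sum" means the weight-normalised mean, and the names are hypothetical. For these numbers the mean over all eight lines comes out at about 1.526 Å, against about 1.542 Å for the five Cu Kα components alone.

```python
# (value in angstroms, weight, label) copied from the loop above
lines = [
    (1.534753, 0.0159, "Cuka5"),
    (1.540596, 0.5791, "Cuka5"),
    (1.541058, 0.0762, "Cuka5"),
    (1.54441, 0.2417, "Cuka5"),
    (1.544721, 0.0871, "Cuka5"),
    (0.7693222, 0.02, "Cu ka half-wavelength"),
    (1.4763742, 0.00494, "W La"),
    (1.3922160, 0.00340, "Cu kB"),
]


def weighted_mean(lines):
    """Weight-normalised mean wavelength (an assumed reading of 'weighted sum')."""
    total_weight = sum(weight for _, weight, _ in lines)
    return sum(value * weight for value, weight, _ in lines) / total_weight


mean_all = weighted_mean(lines)
mean_ka = weighted_mean([line for line in lines if line[2] == "Cuka5"])
print(f"all lines: {mean_all:.4f}  Cu K-alpha only: {mean_ka:.4f}")
```

Although the three impurity lines carry under 3% of the total weight, the half-wavelength component sits far enough from the emission profile to pull the mean down by roughly 0.016 Å.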

jamesrhester commented 22 hours ago

Please consider pull requests 18 and 19 for the multi-block dictionary as two alternative solutions to the above problem. Note that I prefer 19, looking forward to the time when we have 100 temperature measurements using a single source and avoiding the repetition of the source characteristics.

jamesrhester commented 22 hours ago

The mean will be skewed by the presence of impurity wavelengths. They're necessary to correctly model spectral artefacts, but don't really contribute to anything else.

The two alternatives that I've just put forward don't specify how to deal with multiple wavelengths; they just give a way to provide the information. Down the track we will need to ponder which wavelength things like scattering factors and d-spacings use.

rowlesmr commented 22 hours ago

Please consider pull requests 18 and 19 for the multi-block dictionary as two alternative solutions to the above problem. Note that I prefer 19, looking forward to the time when we have 100 temperature measurements using a single source and avoiding the repetition of the source characteristics.

Just adding links

https://github.com/COMCIFS/MultiBlock_Dictionary/pull/18
https://github.com/COMCIFS/MultiBlock_Dictionary/pull/19

rowlesmr commented 22 hours ago

Down the track we will need to ponder which wavelength things like scattering factors and d-spacings use.

AFAIK, TOPAS uses the wavelength with the largest area as "the" wavelength.
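As a rough illustration of that rule (a sketch only, not TOPAS code, with hypothetical names), picking the line with the largest weight from the eight-line example earlier in this thread could look like this in Python:

```python
# (value in angstroms, weight) from the eight-line example in this thread
lines = [
    (1.534753, 0.0159),
    (1.540596, 0.5791),
    (1.541058, 0.0762),
    (1.54441, 0.2417),
    (1.544721, 0.0871),
    (0.7693222, 0.02),
    (1.4763742, 0.00494),
    (1.3922160, 0.00340),
]

# Select the entry whose weight (relative area) is largest.
strongest_value, strongest_weight = max(lines, key=lambda line: line[1])
print(strongest_value, strongest_weight)
```

Unlike a weighted mean, this choice is unaffected by low-weight impurity lines, since only the dominant component is used.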