COMCIFS / Powder_Dictionary

CIF definitions for powder diffraction

recording internal/external standard QPA #46

Closed rowlesmr closed 1 year ago

rowlesmr commented 1 year ago

A first pass at adding the capability to record quantitative phase analysis (QPA) by internal or external standard approaches.

See #12

jamesrhester commented 1 year ago

My knowledge of use of internal/external standards is insufficient to really judge this. I think a few sentences of summary of the meaning and use of internal/external standards would be good in the description of the PD_CALIB category, perhaps simply via references. Also, the definitions assume a single external/internal standard, is that a reasonable assumption? I know it is an assumption of the original dictionary, just wondering if it is fair?

rowlesmr commented 1 year ago

My knowledge of use of internal/external standards is insufficient to really judge this. I think a few sentences of summary of the meaning and use of internal/external standards would be good in the description of the PD_CALIB category, perhaps simply via references. Also, the definitions assume a single external/internal standard, is that a reasonable assumption? I know it is an assumption of the original dictionary, just wondering if it is fair?

Will add some words. A single internal/external std is a very reasonable assumption; I don't think you could make it work with multiple stds.

rowlesmr commented 1 year ago

I'm trying to figure out how to do less data duplication.

If I'm reporting several diffractograms in the same container, all calibrated against the same external standard dataset, it would be good just to have one value of _pd_calib_std.external_k_factor to which they can all point.

Is it legitimate to state in the description that it is preferred that _pd_calib_std.external_k_factor should only be used in the data block containing the external standard diffractogram? The diffractogram which is being calibrated would contain _pd_calib_std.external_block_id, which links the unknown specimen to the standard diffractogram. The presence of _pd_calib_std.external_block_id means that the values of _pd_phase_mass.percent reported for this diffractogram are on an absolute basis; go to that block id if you want the calibration value.

If the calibration data are not available, and all you have is the value of K, then just use _pd_calib_std.external_k_factor in the unknown sample data block. There won't be a _pd_calib_std.external_block_id link.

This means you'd write something like:

# Preferred

###
# Standard information
###

data_theStd
_pd_block.id                std
_pd_block_diffractogram.id  stdDiffPatt
_pd_phase.name              "NIST SRM676a Al2O3"
#crystal structure information

data_theStandardDiffractionPattern
_pd_block.id stdDiffPatt

loop_
_pd_phase.block_id
_pd_phase.mass_percent
std  99.02

_pd_calib_std.external_k_factor     456.789
_pd_char.mass_atten_coef_mu_calc       123
_pd_char.special_details
;mass_atten_coef calculated from crystal structure
assuming 100% density
;

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    4521    4524.212
5.10    4624    4524.212
#...

###
# Unknown information
###

data_thePyritePhase
_pd_block.id pyrite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_theAlbitePhase
_pd_block.id albite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_aDiffractionPattern
_pd_block.id diffpat

loop_
_pd_phase.block_id
_pd_phase.mass_percent
pyrite  10.54(12)  
albite  40.75(12)

_pd_calib_std.external_name     "NIST SRM676a Al2O3"
_pd_calib_std.external_block_id     stdDiffPatt
_pd_char.mass_atten_coef_mu_calc    157
_pd_char.special_details
;the specimen mass_atten_coef calculated from elemental
analysis by XRF.
;

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    1231    1024.212
5.10    1254    1024.212
#...

# Otherwise

###
# Unknown information
###

data_thePyritePhase
_pd_block.id pyrite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_theAlbitePhase
_pd_block.id albite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_aDiffractionPattern
_pd_block.id diffpat

loop_
_pd_phase.block_id
_pd_phase.mass_percent
pyrite  10.54(12)  
albite  40.75(12)

_pd_calib_std.external_name        "NIST SRM676a Al2O3"
_pd_calib_std.external_k_factor    456.789
_pd_char.mass_atten_coef_mu_calc   157
_pd_char.special_details
;the specimen mass_atten_coef calculated from elemental
analysis by XRF.
;

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    1231    1024.212
5.10    1254    1024.212
#...
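
The two layouts above imply a simple lookup rule for anyone reading the file: follow _pd_calib_std.external_block_id if it is present, otherwise take the K recorded in the pattern's own block. A minimal sketch of that rule follows. It assumes the data blocks have already been parsed into plain dictionaries keyed by _pd_block.id; the function name `resolve_k_factor` and the hand-built dictionaries are hypothetical and not part of any CIF library.

```python
def resolve_k_factor(pattern_id, blocks):
    """Return the external-standard K that applies to one diffractogram block.

    `blocks` is assumed to map each _pd_block.id to a dict of that block's
    data items (a hand-built stand-in for a parsed CIF file).
    """
    block = blocks[pattern_id]
    std_id = block.get("_pd_calib_std.external_block_id")
    if std_id is not None:
        # Preferred layout: K is stored once, with the standard's pattern.
        return blocks[std_id]["_pd_calib_std.external_k_factor"]
    # Fallback layout: only the value of K is known, so it is stored locally.
    return block["_pd_calib_std.external_k_factor"]


# Stand-ins for the "Preferred" and "Otherwise" examples above.
preferred = {
    "stdDiffPatt": {"_pd_calib_std.external_k_factor": 456.789},
    "diffpat": {"_pd_calib_std.external_block_id": "stdDiffPatt"},
}
otherwise = {
    "diffpat": {"_pd_calib_std.external_k_factor": 456.789},
}

print(resolve_k_factor("diffpat", preferred))
print(resolve_k_factor("diffpat", otherwise))
```

With this rule, several unknown patterns can point at the same standard block and share a single recorded K.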

jamesrhester commented 1 year ago

Is it legitimate to state in the description that it is preferred that _pd_calib_std.external_k_factor should only be used in the data block containing the external standard diffractogram?

If _pd_calib_std.external_k_factor could only have a single value for a complete data set, then it doesn't matter which data block it appears in and the "powder dictionary style guide" could certainly recommend doing this.

I'm more concerned about how we fix the whole calibration story in pdCIF. pd_calib_std is conceived as a catch-all list of all calibration datasets relevant to the diffractograms collected together in the data set. However, no attempt is made to describe in a machine-readable way the function of each data set and what it calibrates (angle/intensity/k-factor/etc.) so humans have to intervene to figure out what is going on. I think we can do better. Anyway, from this point of view the K-factor is better as a data name outside pd_calib_std, just as the calibrated angles and wavelengths appear elsewhere.

Regarding doing better, we have two options that I can see:

  1. A new _pd_calib_std.type data name that identifies the type of calibration, taking values like angle/intensity/k-factor/wavelength. The definitions for each alternative value would describe how to use the relevant diffractogram (or 2D image) to perform the calibration. It should then be notionally possible to write a dREL method that can use the appropriate calibration data set to e.g. correct raw intensity.
  2. For each type of calibration define a new category that allows us to provide parameters for that calibration and allowing for different approaches to the same type of calibration. The current pd_calib_std becomes surplus.

I prefer option (2) as it offers more flexibility. It would be interesting to see what a pd_calib_std_K category would look like - it would have to have detector_id and phase_id data names, which would be sufficient to find the diffractogram, but what else would be needed to reproduce the calculation?

rowlesmr commented 1 year ago

Is it legitimate to state in the description that it is preferred that _pd_calib_std.external_k_factor should only be used in the data block containing the external standard diffractogram?

If _pd_calib_std.external_k_factor could only have a single value for a complete data set, then it doesn't matter which data block it appears in and the "powder dictionary style guide" could certainly recommend doing this.

I could foresee a dataset where every diffraction pattern has its own K; I've done it this year when calibrating a secondary standard.

rowlesmr commented 1 year ago

I prefer option (2) as it offers more flexibility.

Me too.

For what do we use standard datasets? To calibrate angles and intensities.

Why do we calibrate angles? In my mind, to determine wavelength, or when used as an internal standard, to get precise cell edges for the other phases in the specimen.

Why do we calibrate intensities? Most likely to measure incident flux (probably in order to create a _pd_proc.intensity_total), or to calibrate external standard QPA. Did we also want the ability to record an RIR?

How fine-grained did you want to make the categories?

rowlesmr commented 1 year ago

I prefer option (2) as it offers more flexibility. It would be interesting to see what a pd_calib_std_K category would look like - it would have to have detector_id and phase_id data names, which would be sufficient to find the diffractogram, but what else would be needed to reproduce the calculation?

w_a = (sZMV)_a * MAC_specimen / K

To get the absolute weight fraction, you need to know the scale factor, unit cell mass, and unit cell volume for a phase, the MAC of the entire unknown specimen, and the diffractometer constant, K.

There already exist _cell.atomic_mass, _cell.volume, _pd_phase.mass_percent and _pd_char.mass_atten_coef_mu_calc or _pd_char.mass_atten_coef_mu_obs. There isn't a way to record the scale factor, and previous discussion has shed some doubt on its general applicability; i.e. a scale factor computed by GSAS may differ from one computed by TOPAS due to differences in how constants are allowed for. Relative values would be the same; absolute values, not necessarily.

You then just need K and a way to specify that a particular diffraction pattern is using a particular K from somewhere else (or just to list it in the data block), or that a particular diffraction pattern is the source of the value K.
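
To make the arithmetic concrete, here is a minimal sketch of the relation w_a = (sZMV)_a * MAC_specimen / K, used in both directions: first to get K from the standard's pattern, then to get an absolute weight fraction for a phase in the unknown. The function names and the input numbers are illustrative assumptions only. Weight fractions are taken as fractions (0 to 1) rather than percent, and the numerical value of K depends on that convention as well as on the program-specific scale factor.

```python
def k_factor(scale, cell_mass, cell_volume, mac_standard, weight_fraction=1.0):
    """Diffractometer constant K from the external standard's pattern.

    Rearranges w = s*ZM*V*MAC/K for K. `weight_fraction` is the known
    crystalline fraction of the standard (1.0 if not certified otherwise).
    """
    return scale * cell_mass * cell_volume * mac_standard / weight_fraction


def absolute_weight_fraction(scale, cell_mass, cell_volume, mac_specimen, k):
    """Absolute weight fraction of one phase: w_a = (s*ZM*V)_a * MAC_specimen / K."""
    return scale * cell_mass * cell_volume * mac_specimen / k


# Illustrative numbers only; the scale factors are made up.
k = k_factor(scale=0.005, cell_mass=611.768, cell_volume=259.861,
             mac_standard=31.59, weight_fraction=0.9902)
w = absolute_weight_fraction(scale=0.00002, cell_mass=3612.958,
                             cell_volume=1192.592, mac_specimen=99.85, k=k)
print(k, w)
```

Both scale factors must come from the same program and settings, for the reason given above: only then does K carry over from the standard to the unknown.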


PD_CALIB_EXT_STD would consist of:

In the data block of the external standard, to record all of the pertinent information:

data_extstd
    _pd_block.id    the_std
    _pd_phase.id    1
    _pd_diffractogram.id    dp1

    _pd_phase.name  SRM676a

    ###
    # unit cell prms, atom positions, and other stuff go here
    ### 

    _cell.atomic_mass               611.768
    _cell.volume                    259.861
    _pd_char.mass_atten_coef_mu_calc        31.5908044  
    _pd_phase.mass_percent              99.02  # this value is defined in the SRM documentation. If it isn't present, assume 100

    # we don't have scale factor in  pdCIF. 
    # It would depend on both phase and diffractogram _and_ analysis program, 
    # and potentially versions of the program, or even which macros/modifications
    # you were using
    # _pd_phase.scale_factor            0.0051354   

    # k_factor = scale_factor * atomic_mass * volume * mass_atten_coef_mu_calc / mass_percent
    _pd_calib_ext_std.k_factor          271.548

    loop_
        _pd_meas.2theta_scan
        _pd_proc.intensity_total
        _pd_proc.ls_weight
        _pd_calc.intensity_total
        _pd_proc.intensity_bkg_calc
        5.001000    43.364000     0.040297    25.994961    25.994961 
        # etc   

and unknown would be

data_unknown_1      
    _pd_block.id    the_first_unknown
    _pd_phase.id    2
    _pd_diffractogram.id    dp2

    _pd_phase.name  "The strange white powder"

    ###
    # unit cell prms and other stuff go here
    ### 

    _cell.atomic_mass           3612.958
    _cell.volume                1192.592
    _pd_char.mass_atten_coef_mu_calc    99.8489722  
    _pd_phase.mass_percent          87.98

    #In this instance, you only need one of the following three lines:
    _pd_calib_ext_std.phase_id      1
    #_pd_calib_ext_std.diffractogram_id dp1
    #_pd_calib_ext_std.block_id     the_std

    # we don't have scale factor in  pdCIF. 
    # It would depend on both phase and diffractogram _and_ analysis program, 
    # and potentially versions of the program, or even which macros/modifications
    # you were using
    # _pd_phase.scale_factor        0.00002025034   

    loop_
        _pd_meas.2theta_scan
        _pd_proc.intensity_total
        _pd_proc.ls_weight
        _pd_calc.intensity_total
        _pd_proc.intensity_bkg_calc
        5.001000    43.364000     0.040297    25.994961    25.994961 
        # etc

rowlesmr commented 1 year ago

it would have to have detector_id and phase_id data names

detector_id or diffractogram_id?

rowlesmr commented 1 year ago

How about this?

I think once this is fleshed out, it will deprecate PD_CALIB, PD_CALIB_STD, and potentially PD_CALIB_OFFSET.

QPA

Is it possible to force contraindicators, i.e. that you can only have one of the following three categories? I don't think it makes sense to use more than one QPA method on a single diffractogram.
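
Whatever the dictionary itself can express, a file-checking script could enforce this kind of mutual exclusion from the outside. The sketch below assumes the data names in one diffractogram block are available as a list of strings, and that the three categories use the prefixes `_pd_qpa_ext_std.`, `_pd_qpa_int_std.` and `_pd_qpa_rir.`; the last two prefixes and both function names are assumptions made for illustration.

```python
# Assumed data-name prefixes for the three mutually exclusive QPA categories.
QPA_PREFIXES = {
    "PD_QPA_EXT_STD": "_pd_qpa_ext_std.",
    "PD_QPA_INT_STD": "_pd_qpa_int_std.",
    "PD_QPA_RIR": "_pd_qpa_rir.",
}


def qpa_categories_used(data_names):
    """Return the sorted list of QPA categories that appear in one data block."""
    names = [name.lower() for name in data_names]  # compare case-insensitively
    return sorted(
        category
        for category, prefix in QPA_PREFIXES.items()
        if any(name.startswith(prefix) for name in names)
    )


def has_conflicting_qpa_methods(data_names):
    """True if more than one QPA method is recorded for the same diffractogram."""
    return len(qpa_categories_used(data_names)) > 1
```

A block containing only `_pd_qpa_ext_std.*` items passes; one mixing `_pd_qpa_ext_std.*` and `_pd_qpa_int_std.*` items would be flagged.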

Anyhoo:

External standard

PD_QPA_EXT_STD is a Set category for QPA by external standard. It's a Set as it only makes sense to have a single external standard when calibrating QPA of a diffractogram.

The category consists of:

Internal standard

PD_QPA_INT_STD is a Set category for QPA by internal standard. It's a Set as it only makes sense to have a single internal standard when calibrating QPA of a diffractogram.

The category consists of:

Edit: this next section is a bad idea.

Combined internal/external standard

PD_QPA is a Set category for QPA by internal/external standard. It's a Set as it only makes sense to have a single standard when calibrating QPA of a diffractogram.

This would replace PD_QPA_EXT_STD and PD_QPA_INT_STD. It's also a little presumptuous in its naming, implying you can only do QPA by in/external std. I don't think I like this category; it grates a little on me.

The category consists of:

RIR

PD_QPA_RIR is a Set category for QPA by reference intensity ratio (also potentially calculating RIRs?). It's a Set as a phase can (should?) only have a single RIR value. We will need another category to allow RIR values to be looped when reported in a diffractogram block. Or do we just make this a Loop category?

The category consists of:

Intensity

PD_CALIB_INTENSITY is a Loop category for intensity scaling. It's a Loop as you need to be able to loop over all detectors. It's keyed on _pd_calib_intensity.id and _pd_calib_intensity.detector_id.

The category consists of:

Wavelength

PD_CALIB_WAVELENGTH is a Loop category for denoting the reference material from which the wavelength was determined (see also _diffrn_radiation_wavelength_determination). It's a Loop as you may have multiple phases and diffractograms used in the calibration. It's keyed on _pd_calib_wavelength.id.

The category consists of:

Angle / x-ordinate

But how do we deal with angle? There currently exists PD_CALIB_OFFSET, but this can only record a constant offset of 2Th; you can't record TOF, energy, position etc. offsets, or offsets that vary with x-ordinate (e.g. specimen displacement).

PD_CALIB_ANGLE (a horrible name, as it should also work for TOF, ED, ...) is a Loop category for angle calibration. It's a Loop as you need to be able to loop over all detectors. It's keyed on _pd_calib_angle.id and _pd_calib_angle.detector_id.

The category consists of:

jamesrhester commented 1 year ago

The non-QPA parts of the comment at https://github.com/COMCIFS/Powder_Dictionary/pull/46#issuecomment-1367387642 should be a new issue along the lines of "Improve calibration metadata in pdCIF files". Just to keep this pull request discussion focused.

jamesrhester commented 1 year ago

it would have to have detector_id and phase_id data names

detector_id or diffractogram_id?

Well for a given detector and standard I'm assuming there could only be one diffractogram, so any 2 out of the three are sufficient. Detector and standard material strike me as more fundamental, that's all.

jamesrhester commented 1 year ago

Regarding the pd_qpa category described above, I also don't like it as the external and internal standards are fundamentally different in that the external standard characterises a detector, whereas the internal method is notionally at least detector independent. So I'd keep the external and internal categories separate.

I don't understand the comment above about the RIR phases needing a loop. As I understand it a particular compound is chosen as the RIR reference, the RIR value is looked up, and then the rest of the phases can be quantified using their own tabulated RIR values?

rowlesmr commented 1 year ago

Regarding the pd_qpa category described above, I also don't like it as the external and internal standards are fundamentally different in that the external standard characterises a detector, whereas the internal method is notionally at least detector independent. So I'd keep the external and internal categories separate.

That's why it felt icky. Also need to add a detector_id for external std.

I don't understand the comment above about the RIR phases needing a loop. As I understand it a particular compound is chosen as the RIR reference, the RIR value is looked up, and then the rest of the phases can be quantified using their own tabulated RIR values?

I don't know what I meant either. RIR is a phase property, and you only need an RIR per phase in a mixture and you can quant it - it's "standardless".

rowlesmr commented 1 year ago

it would have to have detector_id and phase_id data names

detector_id or diffractogram_id?

Well for a given detector and standard I'm assuming there could only be one diffractogram, so any 2 out of the three are sufficient. Detector and standard material strike me as more fundamental, that's all.

I don't know. Can't you then just change out all diffractogram_ids for detector_ids? You then need to make all categories loopable to loop over multiple detectors, even if the end product is a single diffractogram...

I've put in detector_id into the external std for the moment.

rowlesmr commented 1 year ago

I've created PD_QPA_EXT_STD and PD_QPA_INT_STD to hold all the data items associated with ext/int stds, and reverted all the changes I made to PD_CALIB and PD_CALIB_STD.

External standard

PD_QPA_EXT_STD is a Loop category for QPA by external standard. It's a Loop as it may need to loop over more than one detector id. Otherwise, it should be treated as a Set category, as it only makes sense to have a single external standard when calibrating QPA of a diffractogram. The loop keys are _pd_qpa_ext_std.detector_id and _pd_qpa_ext_std.block_id.

The category consists of:

Internal standard

PD_QPA_INT_STD is a Set category for QPA by internal standard. It's a Set as it only makes sense to have a single internal standard when calibrating QPA of a diffractogram.

The category consists of:

rowlesmr commented 1 year ago

RIR

PD_QPA_RIR is a Set category for QPA by reference intensity ratio (also potentially calculating RIRs?). It's a Set as a phase can (should?) only have a single RIR value.

The category consists of:

jamesrhester commented 1 year ago

Ideally this would be 3 different pull requests for external, internal and RIR as there is a lot to deal with.

I've looked through the definitions and can't see any obvious issues so have merged.

I believe the next step is to write up the way in which QPA results should be expressed in a pdCIF file in a way that is suitable for review by a QPA expert, and for inclusion in Vol G chapter on powder diffraction. The process of writing this up may expose any missing information as well and hopefully generate some examples.