recording internal/external standard QPA

A first pass at adding the capability to record quantitative phase analysis by internal or external standard approaches.

See #12

My knowledge of use of internal/external standards is insufficient to really judge this. I think a few sentences of summary of the meaning and use of internal/external standards would be good in the description of the PD_CALIB category, perhaps simply via references. Also, the definitions assume a single external/internal standard, is that a reasonable assumption? I know it is an assumption of the original dictionary, just wondering if it is fair?

Will add some words. A single internal/external std is a very reasonable assumption; I don't think you could make it work with multiple stds.

I'm trying to figure out how to do less data duplication.

If I'm reporting several diffractograms in the same container, all calibrated against the same external standard dataset, it would be good just to have one value of _pd_calib_std.external_k_factor to which they can all point.

Is it legitimate to state in the description that it is preferred that _pd_calib_std.external_k_factor should only be used in the data block containing the external standard diffractogram? The diffractogram being calibrated would contain _pd_calib_std.external_block_id*, which links the unknown specimen to the standard diffractogram. The presence of _pd_calib_std.external_block_id means that the values of _pd_phase_mass.percent reported for this diffractogram are on an absolute basis, and that the calibration value can be found in the block with that id.

If the calibration data are not available, and all you have is the value of K, then just use _pd_calib_std.external_k_factor in the unknown sample data block. There won't be a _pd_calib_std.external_block_id link.

This means you'd write something like:

# Preferred

###
# Standard information
###

data_theStd
_pd_block.id                std
_pd_block_diffractogram.id  stdDiffPatt
_pd_phase.name              "NIST SRM676a Al2O3"
#crystal structure information

data_theStandardDiffractionPattern
_pd_block.id stdDiffPatt

loop_
_pd_phase.block_id
_pd_phase.mass_percent
std  99.02

_pd_calib_std.external_k_factor     456.789
_pd_char.mass_atten_coef_mu_calc       123
_pd_char.special_details
;mass_atten_coef calculated from crystal structure
assuming 100% density
;

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    4521    4524.212
5.10    4624    4524.212
#...

###
# Unknown information
###

data_thePyritePhase
_pd_block.id pyrite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_theAlbitePhase
_pd_block.id albite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_aDiffractionPattern
_pd_block.id diffpat

loop_
_pd_phase.block_id
_pd_phase.mass_percent
pyrite  10.54(12)  
albite  40.75(12)

_pd_calib_std.external_name     "NIST SRM676a Al2O3"
_pd_calib_std.external_block_id     stdDiffPatt
_pd_char.mass_atten_coef_mu_calc    157
_pd_char.special_details
;the specimen mass_atten_coef calculated from elemental
analysis by XRF.
;

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    1231    1024.212
5.10    1254    1024.212
#...

# Otherwise

###
# Unknown information
###

data_thePyritePhase
_pd_block.id pyrite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_theAlbitePhase
_pd_block.id albite
_pd_block_diffractogram.id diffpat
#crystal structure information

data_aDiffractionPattern
_pd_block.id diffpat

loop_
_pd_phase.block_id
_pd_phase.mass_percent
pyrite  10.54(12)  
albite  40.75(12)

_pd_calib_std.external_name        "NIST SRM676a Al2O3"
_pd_calib_std.external_k_factor    456.789
_pd_char.mass_atten_coef_mu_calc   157
_pd_char.special_details
;the specimen mass_atten_coef calculated from elemental
analysis by XRF.
;

loop_
_pd_meas.2theta_scan
_pd_meas.counts_total
_pd_calc.intensity_net
5.00    1231    1024.212
5.10    1254    1024.212
#...

and/or _pd_calib_std.external_diffractogram_id, once that gets in.
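The lookup rule described above (use the K stored in the linked standard block if there is one, otherwise take a K given directly in the unknown's block) could be sketched roughly as follows. This is only an illustration: the dict-of-dicts view of the container and the function name `resolve_k_factor` are assumptions, not part of any existing pdCIF software.

```python
# Hypothetical in-memory view of a pdCIF container: block id -> {data name: value}.
# Only the data names discussed in this thread are used.

def resolve_k_factor(blocks, block_id):
    """Return the external-standard K that applies to a diffractogram block.

    Prefer the K stored in the block linked by _pd_calib_std.external_block_id;
    fall back to a K given directly in the diffractogram block itself.
    """
    block = blocks[block_id]
    linked_id = block.get("_pd_calib_std.external_block_id")
    if linked_id is not None:
        if linked_id not in blocks:
            raise KeyError(f"linked standard block {linked_id!r} not found")
        return blocks[linked_id]["_pd_calib_std.external_k_factor"]
    if "_pd_calib_std.external_k_factor" in block:
        return block["_pd_calib_std.external_k_factor"]
    raise KeyError(f"no external-standard K available for block {block_id!r}")


blocks = {
    "stdDiffPatt": {"_pd_calib_std.external_k_factor": 456.789},
    "diffpat": {"_pd_calib_std.external_block_id": "stdDiffPatt"},
    "diffpat_no_link": {"_pd_calib_std.external_k_factor": 456.789},
}
k_linked = resolve_k_factor(blocks, "diffpat")          # "Preferred" layout
k_direct = resolve_k_factor(blocks, "diffpat_no_link")  # "Otherwise" layout
```

Several diffractograms pointing at the same standard block would then all resolve to the single stored K, which is the de-duplication being asked for.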

Is it legitimate to state in the description that it is preferred that _pd_calib_std.external_k_factor should only be used in the data block containing the external standard diffractogram?

If _pd_calib_std.external_k_factor could only have a single value for a complete data set, then it doesn't matter which data block it appears in and the "powder dictionary style guide" could certainly recommend doing this.

I'm more concerned about how we fix the whole calibration story in pdCIF. pd_calib_std is conceived as a catch-all list of all calibration datasets relevant to the diffractograms collected together in the data set. However, no attempt is made to describe in a machine-readable way the function of each data set and what it calibrates (angle/intensity/k-factor/etc.) so humans have to intervene to figure out what is going on. I think we can do better. Anyway, from this point of view the K-factor is better as a data name outside pd_calib_std, just as the calibrated angles and wavelengths appear elsewhere.

Regarding doing better, we have two options that I can see:

1. A new _pd_calib_std.type data name that identifies the type of calibration, taking values like angle/intensity/k-factor/wavelength. The definitions for each alternative value would describe how to use the relevant diffractogram (or 2D image) to perform the calibration. It should then be notionally possible to write a dREL method that can use the appropriate calibration data set to e.g. correct raw intensity.
2. For each type of calibration, define a new category that allows us to provide parameters for that calibration and allows for different approaches to the same type of calibration. The current pd_calib_std becomes surplus.

I prefer option (2) as it offers more flexibility. It would be interesting to see what a pd_calib_std_K category would look like - it would have to have detector_id and phase_id data names, which would be sufficient to find the diffractogram, but what else would be needed to reproduce the calculation?

Is it legitimate to state in the description that it is preferred that _pd_calib_std.external_k_factor should only be used in the data block containing the external standard diffractogram?

If _pd_calib_std.external_k_factor could only have a single value for a complete data set, then it doesn't matter which data block it appears in and the "powder dictionary style guide" could certainly recommend doing this.

I could foresee a dataset where every diffraction pattern has its own K; I've done it this year when calibrating a secondary standard.

I prefer option (2) as it offers more flexibility.

Me too.

For what do we use standard datasets? To calibrate angles and intensities.

Why do we calibrate angles? In my mind, to determine wavelength, or when used as an internal standard, to get precise cell edges for the other phases in the specimen.

Why do we calibrate intensities? Most likely to measure incident flux (probably in order to create a _pd_proc.intensity_total), or to calibrate external standard QPA. Did we also want the ability to record an RIR?

How fine-grained did you want to make the categories?

I prefer option (2) as it offers more flexibility. It would be interesting to see what a pd_calib_std_K category would look like - it would have to have detector_id and phase_id data names, which would be sufficient to find the diffractogram, but what else would be needed to reproduce the calculation?

w_a = (sZMV)_a * MAC_specimen / K

To get the absolute weight fraction, you need to know the scale factor, unit cell mass, and unit cell volume for a phase, the MAC of the entire unknown specimen, and the diffractometer constant, K.

There already exist _cell.atomic_mass, _cell.volume, _pd_phase.mass_percent and _pd_char.mass_atten_coef_mu_calc, or _pd_char.mass_atten_coef_mu_obs. There isn't a way to record the scale factor, and previous discussion has shed some doubt on its general applicability; i.e. a scale factor computed by GSAS may be different to that computed by TOPAS due to differences in how constants are allowed for. Relative values would be the same; absolute values, not necessarily.

You then just need K and a way to specify that a particular diffraction pattern is using a particular K from somewhere else (or just to list it in the data block), or that a particular diffraction pattern is the source of the value K.
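As a rough illustration of the expression above, here is a minimal sketch of both directions of the calculation: getting K from the standard pattern, and then using it for a phase in the unknown. The function names and the numbers are made up for the example, and the weight fraction is assumed to be expressed on the same basis (here a fraction, not a percent) in both steps.

```python
def k_factor(scale, cell_mass, cell_volume, mac_std, weight_fraction_std):
    """K from the standard pattern: K = (sZMV)_std * MAC_std / w_std."""
    return scale * cell_mass * cell_volume * mac_std / weight_fraction_std


def absolute_weight_fraction(scale, cell_mass, cell_volume, mac_specimen, k):
    """w_a = (sZMV)_a * MAC_specimen / K for one phase in the unknown."""
    return scale * cell_mass * cell_volume * mac_specimen / k


# Hypothetical standard: refined scale, cell mass, cell volume, MAC, certified purity.
K = k_factor(scale=0.005, cell_mass=600.0, cell_volume=250.0,
             mac_std=30.0, weight_fraction_std=0.99)

# Hypothetical phase in an unknown specimen measured under the same conditions.
w_unknown = absolute_weight_fraction(scale=2.0e-5, cell_mass=3600.0,
                                     cell_volume=1200.0, mac_specimen=100.0, k=K)
```

Because the scale factor is program dependent, as noted above, K and the scale factors would have to come from the same analysis software for the result to mean anything.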

PD_CALIB_EXT_STD would consist of:

_pd_calib_ext_std.k_factor: (and *_su) the numerical value of K
_pd_calib_ext_std.phase_id: the phase_id of the phase used as the external std. It is assumed that the std is single-phase, which isn't a silly thing to do.
_pd_calib_ext_std.diffractogram_id: the diffractogram_id of the diffractogram collected from the external std
_pd_calib_ext_std.block_id: the block_id of the data block containing the diffractogram and phase of the external std. It is assumed that both phase and diffractogram information is present in the single data block, which isn't a bad assumption, as stds should be (?) single-phase.
_pd_calib_ext_std.special_details: 'cause sometimes you need the space to give some info

In the data block of the external standard, to record all of the pertinent information:

data_extstd
    _pd_block.id    the_std
    _pd_phase.id    1
    _pd_diffractogram.id    dp1

    _pd_phase.name  SRM676a

    ###
    # unit cell prms, atom positions, and other stuff go here
    ### 

    _cell.atomic_mass               611.768
    _cell.volume                    259.861
    _pd_char.mass_atten_coef_mu_calc        31.5908044  
    _pd_phase.mass_percent              99.02  # this value is defined in the SRM documentation. If it isn't present, assume 100

    # we don't have scale factor in  pdCIF. 
    # It would depend on both phase and diffractogram _and_ analysis program, 
    # and potentially versions of the program, or even which macros/modifications
    # you were using
    # _pd_phase.scale_factor            0.0051354   

    # k_factor = scale_factor * atomic_mass * volume * mass_atten_coef_mu_calc / mass_percent
    _pd_calib_ext_std.k_factor          271.548

    loop_
        _pd_meas.2theta_scan
        _pd_proc.intensity_total
        _pd_proc.ls_weight
        _pd_calc.intensity_total
        _pd_proc.intensity_bkg_calc
        5.001000    43.364000     0.040297    25.994961    25.994961 
        # etc

and the unknown would be

data_unknown_1      
    _pd_block.id    the_first_unknown
    _pd_phase.id    2
    _pd_diffractogram.id    dp2

    _pd_phase.name  "The strange white powder"

    ###
    # unit cell prms and other stuff go here
    ### 

    _cell.atomic_mass           3612.958
    _cell.volume                1192.592
    _pd_char.mass_atten_coef_mu_calc    99.8489722  
    _pd_phase.mass_percent          87.98

    #In this instance, you only need one of the following three lines:
    _pd_calib_ext_std.phase_id      1
    #_pd_calib_ext_std.diffractogram_id dp1
    #_pd_calib_ext_std.block_id     the_std

    # we don't have scale factor in  pdCIF. 
    # It would depend on both phase and diffractogram _and_ analysis program, 
    # and potentially versions of the program, or even which macros/modifications
    # you were using
    # _pd_phase.scale_factor        0.00002025034   

    loop_
        _pd_meas.2theta_scan
        _pd_proc.intensity_total
        _pd_proc.ls_weight
        _pd_calc.intensity_total
        _pd_proc.intensity_bkg_calc
        5.001000    43.364000     0.040297    25.994961    25.994961 
        # etc

it would have to have detector_id and phase_id data names

detector_id or diffractogram_id?

How about this?

I think once this is fleshed out, it will deprecate PD_CALIB, PD_CALIB_STD, and potentially PD_CALIB_OFFSET

QPA

Is it possible to force contraindicators? ie you can only have one of the following three categories? I don't think it makes sense to use more than one QPA method on a single diffractogram.
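Whether or not the dictionary itself can express this, a validator could enforce it outside the dictionary. A minimal sketch, assuming the three QPA categories proposed below keep the data-name prefixes shown, and with hypothetical function names:

```python
# Prefixes of the three QPA categories proposed in this thread (assumed final).
QPA_PREFIXES = ("_pd_qpa_ext_std.", "_pd_qpa_int_std.", "_pd_qpa_rir.")


def qpa_categories_present(data_names):
    """Return the set of QPA category prefixes used by the given data names."""
    return {p for p in QPA_PREFIXES
            for name in data_names if name.lower().startswith(p)}


def check_single_qpa_method(data_names):
    """Raise ValueError if more than one QPA category appears in a block."""
    present = qpa_categories_present(data_names)
    if len(present) > 1:
        raise ValueError(f"more than one QPA method in block: {sorted(present)}")
    return present
```

This is a check applied by software reading the file, not a dictionary feature.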

Anyhoo:

External standard

PD_QPA_EXT_STD is a Set category for QPA by external standard. It's a Set as it only makes sense to have a single external standard when calibrating QPA of a diffractogram.

The category consists of:

_pd_qpa_ext_std.k_factor: (and *_su) the numerical value of K
_pd_qpa_ext_std.phase_id: the phase_id of the phase used as the external std. It is assumed that the std is single-phase, which isn't a silly thing to do.
_pd_qpa_ext_std.phase_name: Identity of material used as an external standard.
_pd_qpa_ext_std.diffractogram_id: the diffractogram_id of the diffractogram collected from the external std
_pd_qpa_ext_std.block_id: the block_id of the data block containing the diffractogram and phase of the external std. It is assumed that both phase and diffractogram information is present in the single data block, which isn't a bad assumption, as stds should be (?) single-phase.
_pd_qpa_ext_std.special_details: 'cause sometimes you need the space to give some info

Internal standard

PD_QPA_INT_STD is a Set category for QPA by internal standard. It's a Set as it only makes sense to have a single internal standard when calibrating QPA of a diffractogram.

The category consists of:

_pd_qpa_int_std.mass_percent: (and *_su) the mass percent of internal std added to the specimen.
_pd_qpa_int_std.phase_id: the phase_id of the phase used as the internal std.
_pd_qpa_int_std.phase_name: Identity of material used as a standard.
_pd_qpa_int_std.block_id: the block_id of the data block containing the phase of the internal std.
_pd_qpa_int_std.special_details: 'cause sometimes you need the space to give some info

Edit: this next section is a bad idea.

Combined internal/external standard

PD_QPA is a Set category for QPA by internal/external standard. It's a Set as it only makes sense to have a single standard when calibrating QPA of a diffractogram.

This would replace PD_QPA_EXT_STD and PD_QPA_INT_STD. It's also a little presumptuous in its naming, implying you can only do QPA by in/external std. I don't think I like this category; it grates a little on me.

The category consists of:

_pd_qpa.external_k_factor: (and *_su) the numerical value of K

_pd_qpa.internal_mass_percent: (and *_su) the mass percent of internal std added to the specimen.

_pd_qpa.phase_name: Identity of material used as a standard.

_pd_qpa.phase_id: the phase_id of the phase used as the in/external std. It is assumed that the std is single-phase, which isn't a silly thing to do.

_pd_qpa.diffractogram_id: the diffractogram_id of the diffractogram collected from the external std

_pd_qpa.block_id: the block_id of the data block containing the diffractogram and phase of the external std. It is assumed that both phase and diffractogram information is present in the single data block, which isn't a bad assumption, as stds should be (?) single-phase.

~~_pd_qpa.special_details: 'cause sometimes you need the space to give some info~~

RIR

PD_QPA_RIR is a Set category for QPA by reference intensity ratio (also potentially calculating RIRs?). It's a Set as a phase can (should?) only have a single RIR value. We will need another category to allow RIR values to be looped when reported in a diffractogram block. Or do we just make this a Loop category?

The category consists of:

To do QPA:
- _pd_qpa_rir.value: (and *_su) the RIR value used to do QPA
To report a calculated RIR measurement:
- _pd_qpa_rir.value: (and *_su) the calculated RIR value
- _pd_qpa_rir.std_phase_id: the phase_id of the phase used as the internal std.
- _pd_qpa_rir.phase_id: the phase_id of the unknown phase having its RIR determined.
- _pd_qpa_rir.diffractogram_id: the diffractogram_id of the diffractogram collected from the mixture
_pd_qpa_rir.special_details: 'cause sometimes you need the space to give some info

Intensity

PD_CALIB_INTENSITY is a Loop category for intensity scaling. It's a Loop as you need to be able to loop over all detectors. It's keyed on _pd_calib_intensity.id and _pd_calib_intensity.detector_id.

The category consists of:

_pd_calib_intensity.scalar: (and *_su) the value by which _pd_meas.intensity_total values are multiplied to get _pd_proc.intensity_total. The source of this value could be a calibration sample, some beam intensity monitor, or something else - see special_details.
_pd_calib_intensity.detector_id: To identify a detector in a multi-detector system - taken from _pd_calib.detector_id
_pd_calib_intensity.detector_response: (and *_su) relative sensitivity of a detector in a multi-detector system - taken from _pd_calib.detector_response
_pd_calib_intensity.id: To identify a particular intensity calibration.
_pd_calib_intensity.phase_id: the phase_id of the phase used as the intensity std (if it was a sample).
_pd_calib_intensity.phase_name: Identity of material used as a standard.
_pd_calib_intensity.diffractogram_id: the diffractogram_id of the diffractogram collected from the intensity std (if it was a sample).
_pd_calib_intensity.block_id: the block_id of the data block containing the diffractogram and phase of the intensity std (if it was a sample).
_pd_calib_intensity.special_details: 'cause sometimes you need the space to give some info
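To make the intended use of the scalar concrete, here is a minimal sketch of applying one scalar per detector to measured intensities. The dict layout and the function name are assumptions for illustration; detector_response is left out.

```python
def apply_intensity_scalars(meas_by_detector, scalar_by_detector):
    """Multiply each detector's measured intensities by its calibration scalar.

    meas_by_detector:   detector_id -> list of _pd_meas.intensity_total values
    scalar_by_detector: detector_id -> _pd_calib_intensity.scalar
    Returns detector_id -> list of _pd_proc.intensity_total values.
    """
    proc = {}
    for detector_id, intensities in meas_by_detector.items():
        scalar = scalar_by_detector[detector_id]  # KeyError if a detector has no scalar
        proc[detector_id] = [scalar * value for value in intensities]
    return proc


# Made-up values for two detectors.
meas = {"det1": [100.0, 200.0], "det2": [50.0, 80.0]}
scalars = {"det1": 2.0, "det2": 0.5}
proc = apply_intensity_scalars(meas, scalars)
```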

Wavelength

PD_CALIB_WAVELENGTH is a Loop category for denoting the reference material from which the wavelength was determined (see also _diffrn_radiation_wavelength_determination). It's a Loop as you may have multiple phases and diffractograms used in the calibration. It's keyed on _pd_calib_wavelength.id.

The category consists of:

_pd_calib_wavelength.______: (and *_su) Is there a thing to go here? Or do we just want to link to the calibrant?
_pd_calib_wavelength.id: To identify a particular wavelength calibration.
_pd_calib_wavelength.phase_id: the phase_id of the phase used as the wavelength std.
_pd_calib_wavelength.phase_name: Identity of material used as a standard.
_pd_calib_wavelength.diffractogram_id: the diffractogram_id of the diffractogram collected from the wavelength std.
_pd_calib_wavelength.block_id: the block_id of the data block containing the diffractogram and phase of the wavelength std.
_pd_calib_wavelength.special_details: 'cause sometimes you need the space to give some info

Angle / x-ordinate

But how do we deal with angle? There currently exists PD_CALIB_OFFSET, but this can only record a constant offset of 2Th; you can't record TOF, energy, position etc offsets, or offsets that vary with x-ordinate (eg specimen displacement)

PD_CALIB_ANGLE (a horrible name, as it should also work for TOF, ED, ...) is a Loop category for angle calibration. It's a Loop as you need to be able to loop over all detectors. It's keyed on _pd_calib_angle.id and _pd_calib_angle.detector_id.

The category consists of:

_pd_calib_angle.______: (and *_su) Is there a thing to go here? Or do you just give _pd_meas.2theta and _pd_proc.2theta and be done with it?
_pd_calib_angle.detector_id: To identify a detector in a multi-detector system - taken from _pd_calib.detector_id
_pd_calib_angle.id: To identify a particular angle calibration.
_pd_calib_angle.phase_id: the phase_id of the phase used as the angle std.
_pd_calib_angle.phase_name: Identity of material used as a standard.
_pd_calib_angle.diffractogram_id: the diffractogram_id of the diffractogram collected from the angle std.
_pd_calib_angle.block_id: the block_id of the data block containing the diffractogram and phase of the angle std.
_pd_calib_angle.special_details: 'cause sometimes you need the space to give some info
more data names relating to offsets and other variations in the various X-coordinates, ...

The non-QPA parts of the comment at https://github.com/COMCIFS/Powder_Dictionary/pull/46#issuecomment-1367387642 should be a new issue along the lines of "Improve calibration metadata in pdCIF files". Just to keep this pull request discussion focused.

it would have to have detector_id and phase_id data names

detector_id or diffractogram_id?

Well for a given detector and standard I'm assuming there could only be one diffractogram, so any 2 out of the three are sufficient. Detector and standard material strike me as more fundamental, that's all.

Regarding the pd_qpa category described above, I also don't like it as the external and internal standards are fundamentally different in that the external standard characterises a detector, whereas the internal method is notionally at least detector independent. So I'd keep the external and internal categories separate.

I don't understand the comment above about the RIR phases needing a loop. As I understand it a particular compound is chosen as the RIR reference, the RIR value is looked up, and then the rest of the phases can be quantified using their own tabulated RIR values?

Regarding the pd_qpa category described above, I also don't like it as the external and internal standards are fundamentally different in that the external standard characterises a detector, whereas the internal method is notionally at least detector independent. So I'd keep the external and internal categories separate.

That's why it felt icky. Also need to add a detector_id for external std.

I don't understand the comment above about the RIR phases needing a loop. As I understand it a particular compound is chosen as the RIR reference, the RIR value is looked up, and then the rest of the phases can be quantified using their own tabulated RIR values?

I don't know what I meant either. RIR is a phase property; you only need an RIR per phase in a mixture and you can quant it - it's "standardless".
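For reference, the "standardless" calculation is usually written as the normalised RIR method, w_i = (I_i / RIR_i) / sum_j (I_j / RIR_j). A minimal sketch follows, assuming that every phase in the mixture is crystalline and identified, that all RIR values are relative to the same reference material, and that I_i is the intensity of each phase's reference line. The function name and numbers are made up.

```python
def rir_weight_fractions(intensities, rirs):
    """Normalised RIR method: w_i = (I_i / RIR_i) / sum_j (I_j / RIR_j).

    intensities: phase id -> intensity of that phase's reference line
    rirs:        phase id -> RIR value for that phase
    """
    ratios = {phase: intensities[phase] / rirs[phase] for phase in intensities}
    total = sum(ratios.values())
    return {phase: ratio / total for phase, ratio in ratios.items()}


# Two hypothetical phases with equal line intensity but different RIRs.
fractions = rir_weight_fractions({"a": 100.0, "b": 100.0}, {"a": 1.0, "b": 3.0})
```

Only one RIR value per phase is needed, which is consistent with treating it as a phase property rather than something that needs its own loop.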

it would have to have detector_id and phase_id data names

detector_id or diffractogram_id?

Well for a given detector and standard I'm assuming there could only be one diffractogram, so any 2 out of the three are sufficient. Detector and standard material strike me as more fundamental, that's all.

I don't know. Can't you then just change out all diffractogram_ids for detector_ids? You then need to make all categories loopable to loop over multiple detectors, even if the end product is a single diffractogram...

I've put in detector_id into the external std for the moment.

I've created PD_QPA_EXT_STD and PD_QPA_INT_STD to hold all the data items associated with ext/int stds, and put back all the changes I made to PD_CALIB and PD_CALIB_STD.

External standard

PD_QPA_EXT_STD is a Loop category for QPA by external standard. It's a Loop as it may need to loop over more than one detector id. Otherwise, it should be treated as a Set category, as it only makes sense to have a single external standard when calibrating QPA of a diffractogram. The loop keys are _pd_qpa_ext_std.detector_id and _pd_qpa_ext_std.block_id.

The category consists of:

_pd_qpa_ext_std.block_id: the block_id of the data block containing the diffractogram (and probably phase) of the external std. If the phase information isn't present, it should be linked from the diffractogram block; there should only be one phase, as stds should be (?) single-phase.
_pd_qpa_ext_std.detector_id: the detector_id of the detector to be calibrated by the external std. It can be used to find the diffractogram.
~~_pd_qpa_ext_std.diffractogram_id: the diffractogram_id of the diffractogram collected from the external std~~ not yet - will come.
_pd_qpa_ext_std.k_factor: (and *_su) the numerical value of K
_pd_qpa_ext_std.phase_id: the phase_id of the phase used as the external std. It is assumed that the std is single-phase, which isn't a silly thing to do.
_pd_qpa_ext_std.phase_name: Identity of material used as an external standard.
_pd_qpa_ext_std.special_details: 'cause sometimes you need the space to give some info

Internal standard

PD_QPA_INT_STD is a Set category for QPA by internal standard. It's a Set as it only makes sense to have a single internal standard when calibrating QPA of a diffractogram.

The category consists of:

_pd_qpa_int_std.block_id: the block_id of the data block containing the phase of the internal std.
_pd_qpa_int_std.mass_percent: (and *_su) the mass percent of internal std added to the specimen.
_pd_qpa_int_std.phase_id: the phase_id of the phase used as the internal std.
_pd_qpa_int_std.phase_name: Identity of material used as a standard.
_pd_qpa_int_std.special_details: 'cause sometimes you need the space to give some info
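To show how _pd_qpa_int_std.mass_percent would typically be used, here is a minimal sketch of the usual internal-standard correction: the refined (relative) mass percents are rescaled so that the standard matches its weighed-in value, and whatever is left over is attributed to amorphous or unidentified material. The function name and numbers are hypothetical, and all values are on the basis of the spiked specimen.

```python
def internal_standard_correction(refined_percent, std_phase, std_known_percent):
    """Rescale refined mass percents using a known internal-standard addition.

    refined_percent:   phase id -> refined mass percent (sums to about 100)
    std_phase:         phase id of the internal standard
    std_known_percent: weighed-in mass percent of the standard
    Returns (absolute mass percents, unaccounted mass percent).
    """
    factor = std_known_percent / refined_percent[std_phase]
    absolute = {phase: w * factor for phase, w in refined_percent.items()}
    unaccounted = 100.0 - sum(absolute.values())
    return absolute, unaccounted


# Hypothetical case: 20 % standard added, but the refinement reports 25 %.
absolute, unaccounted = internal_standard_correction(
    {"std": 25.0, "a": 75.0}, std_phase="std", std_known_percent=20.0)
```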

RIR

PD_QPA_RIR is a Set category for QPA by reference intensity ratio (also potentially calculating RIRs?). It's a Set as a phase can (should?) only have a single RIR value.

The category consists of:

_pd_qpa_rir.special_details: 'cause sometimes you need the space to give some info
_pd_qpa_rir.std_block_id: the block_id of the phase used as the internal std in the determination of the RIR value.
~~_pd_qpa_rir.std_diffractogram_id: the diffractogram_id of the diffractogram collected from the mixture used to determine the RIR.~~ Not yet!
_pd_qpa_rir.std_phase_id: the phase_id of the phase used as the internal std in the determination of the RIR value.
_pd_qpa_rir.value: (and *_su) the calculated RIR value

Ideally this would be 3 different pull requests for external, internal and RIR as there is a lot to deal with.

I've looked through the definitions and can't see any obvious issues so have merged.

I believe the next step is to write up the way in which QPA results should be expressed in a pdCIF file in a way that is suitable for review by a QPA expert, and for inclusion in Vol G chapter on powder diffraction. The process of writing this up may expose any missing information as well and hopefully generate some examples.
