update the QPA categories

I started writing up how to report QPA, and realised I had overthought things.

Key changes:

Removed PD_QPA_RIR.
- I think this category was too narrowly focussed, and only served to specify that you did QPA by RIR. This was the only sort of QPA that was given its own category.
Created PD_QPA_CALIB_FACTOR (Set, keyed on _pd_qpa_calib_factor.phase_id)
- This category takes over from PD_QPA_RIR inasmuch as it allows a phase to be given a _pd_qpa_calib_factor.value. For RIR, this data value is the RIR.
- In general, this category allows a phase to be given a calibration constant that allows the calculated intensities or scale factor of that phase (for any diffractogram) to be converted into a mass percentage via the appropriate algorithm, whatever that may be.
Created PD_QPA_OVERALL (Set, keyed on _pd_qpa_overall.diffractogram_id)
- The main data item here is _pd_qpa_overall.method, which is an enumerated list of how the diffractogram was quantified. It can be used just as an information item, but can also be used to provide context to any value associated with _pd_qpa_calib_factor.value. The enumerations were taken from Chapter 3.9 of ITC H.
Renamed PD_QPA_EXT_STD and PD_QPA_INT_STD to PD_QPA_EXTERNAL_STD and PD_QPA_INTERNAL_STD to remove one level of abbreviation.
Rewrote category description of PD_QPA_INTERNAL_STD to say that internal std can be used with a variety of quantification methods in order to give absolute phase quantification (not just Rietveld).

Possible other considerations for the future:

Create PD_QPA_INTENSITY_FACTOR (Loop, keyed on _pd_qpa_intensity_factor.phase_id and _pd_qpa_intensity_factor.diffractogram_id)
- Contains _pd_qpa_intensity_factor.value. This is the value which is divided by _pd_qpa_calib_factor.value in order to give the value which is used in quantification.
  - e.g. this would be the Rietveld scale factor for the ZMV method, or a peak intensity for the RIR method.
- This would allow for a verifiable phase quantification, which currently isn't possible for any technique.

Something that we endeavour to do in CIF is to make sure that the way in which a value is used is not changed by the values of other data names. An extreme example would be a data name called _pd.value and another data name _pd.flag with alternative values like absorption, monitor or version, with the interpretation of _pd.value depending on the value of _pd.flag - it could be the absorption coefficient, the monitor counts to scale to, or the version of the software used.

What is OK is for the derivation of the value to be different: so _refln.F_meas for powder is derived in a completely different way to _refln.F_meas for single crystal, and that is OK, as the values are used in the same way e.g. to create a difference density map.

So I'm concerned that the PD_QPA_CALIB_FACTOR category has one of these magic values, but if the data are treated in identical fashion regardless of the origin of the value then it is not a problem. Can you explain @rowlesmr ?

Probably not directly answering your question, but I'm addressing what I'm trying to do.

As you say here, I'd like to include information that allows the quantification to be rederived from values in the collection.

Looking at the ZMV formalism, this requires the scale factor. This isn't currently recorded anywhere, and depends on phase, diffractogram, and the exact analysis program and version used. In the above nomenclature, the inverse of a phase's ZMV is _pd_qpa_calib_factor.value, the scale factor is _pd_qpa_intensity_factor.value, and the algorithm is given by _pd_qpa_overall.method.

For RIR, this requires a peak intensity, sum of peak intensities, or a scale-type factor. Again, this isn't currently recorded anywhere, and depends on phase, diffractogram, and the exact analysis program and version used. In the above nomenclature, a phase's RIR is _pd_qpa_calib_factor.value, the peak intensity is _pd_qpa_intensity_factor.value, and the algorithm is given by _pd_qpa_overall.method.

For PONKCS, this is a little different. The peak intensities assigned to the phase, when combined with a 'synthetic' ZMV-like value and a scale factor, can give you quant. In this case, the peak intensities have a meaningful value only when combined with the synthetic ZMV - but they aren't F_squared_meas values, or counts, or intensities. They can also exist as reflections or arbitrary peaks. In the above nomenclature, the inverse of a phase's 'synthetic' ZMV is _pd_qpa_calib_factor.value, the scale factor is _pd_qpa_intensity_factor.value, and the algorithm is given by _pd_qpa_overall.method.

The same goes with absorption-diffraction; the calibration value depends on the exact intensities used, and so we need to ensure that the properly calibrated intensities are either recorded, or some _pd_qpa_intensity_factor.value-like data item can capture the actual value used in the quantification.

It is trivial to make _pd_qpa_calib_factor.____ data items for the various methods, which then breaks the link on how a data item is interpreted. I think that _pd_qpa_intensity_factor.value can be defined in such a way that it is always treated the same way with respect to any _pd_qpa_calib_factor.____ data item. This just leaves _pd_qpa_overall.method to define how to do the calculation.

Here's a summary of how it currently works in this PR:

Categories and data items

PD_QPA_CALIB_FACTOR (Set, keyed on _pd_qpa_calib_factor.phase_id). The relevant data item is _pd_qpa_calib_factor.value. (The .value acts like the RIR value.)

PD_QPA_INTENSITY_FACTOR (Loop, keyed on _pd_qpa_intensity_factor.phase_id and _pd_qpa_intensity_factor.diffractogram_id). The relevant data item is _pd_qpa_intensity_factor.value. The .value acts like the peak intensity, or the Rietveld scale factor; the 'thing' which is acted on by the calibration factor.

PD_QPA_OVERALL (Set, keyed on _pd_qpa_overall.diffractogram_id). The relevant data item is _pd_qpa_overall.method.
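
As a rough illustration of how the three categories key together, here is a minimal Python sketch. The class names, the phase and diffractogram ids, the numbers, and the method string "RIR" are all placeholders I've made up for the example; only the data-name structure (what each category is keyed on, and which item carries the value) comes from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibFactor:
    """One row of PD_QPA_CALIB_FACTOR: keyed on phase_id."""
    phase_id: str
    value: float


@dataclass(frozen=True)
class IntensityFactor:
    """One row of PD_QPA_INTENSITY_FACTOR: keyed on (phase_id, diffractogram_id)."""
    phase_id: str
    diffractogram_id: str
    value: float


@dataclass(frozen=True)
class Overall:
    """One row of PD_QPA_OVERALL: keyed on diffractogram_id."""
    diffractogram_id: str
    method: str


# Hypothetical rows: ids, numbers and the method string are placeholders.
calib = {c.phase_id: c for c in [CalibFactor("quartz", 2.0), CalibFactor("corundum", 4.0)]}
intensity = {
    (i.phase_id, i.diffractogram_id): i
    for i in [
        IntensityFactor("quartz", "scan_1", 6.0),
        IntensityFactor("corundum", "scan_1", 4.0),
    ]
}
overall = {o.diffractogram_id: o for o in [Overall("scan_1", "RIR")]}


def scaled_factor(phase_id: str, diffractogram_id: str) -> float:
    """Intensity factor divided by the calibration factor of the same phase."""
    return intensity[(phase_id, diffractogram_id)].value / calib[phase_id].value
```

The point of the sketch is only that the calibration factor is looked up by phase, while the intensity factor needs both the phase and the diffractogram.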

How it's supposed to work

A given diffractogram is marked as being quantified by having _pd_phase_mass.percent values. This is enough to say it has been quantified, but gives no indication as to how this was done. A value can be assigned to _pd_qpa_overall.method, saying how it was quantified. The enumeration was taken from §3.9, Vol H.

The various methods are:

RIR or I/Ic
- W_p = [(I_p/I_p,rel) / C_p] / Sum[(I_k/I_k,rel) / C_k, k=1:P]
- or, if you just worry about the intensities after normalisation by the most intense peak of a phase:
- W_p = [I_p / C_p] / Sum[I_k / C_k, k=1:P]
DDM
- W_p = [I_p / C_p] / Sum[I_k / C_k, k=1:P]
ZMV
- W_p = [S_p / C_p] / Sum[S_k / C_k, k=1:P]
PONKCS
- W_p = [S_p / C_p] / Sum[S_k / C_k, k=1:P]
Absorption-diffraction
- W_p = [I_p / C_p] μ_m
External standard
- W_p = [I_p / C_p] μ_m

where W is the weight fraction, p represents the pth phase, I or S is the intensity or scale factor used to quantify that phase, P is the total number of phases, and μ_m is the mass absorption coefficient of the entire specimen. C_p is the calibration factor which puts the intensities/scale factors of the constituent phases onto a common scale to allow for quantification.
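
The formulas above fall into two shapes: a normalised sum (RIR, DDM, ZMV, PONKCS) and a product with the specimen mass absorption coefficient (absorption-diffraction, external standard). A minimal sketch of both, assuming the intensity or scale factors and calibration factors are already in hand; the function names, phase labels and numbers are hypothetical:

```python
def normalised_weight_fractions(factors, calib):
    """W_p = (X_p / C_p) / sum_k (X_k / C_k), where X is I or S.

    `factors` and `calib` are dicts keyed on the same phase ids.
    """
    scaled = {p: factors[p] / calib[p] for p in factors}
    total = sum(scaled.values())
    return {p: s / total for p, s in scaled.items()}


def absorption_weight_fraction(factor, calib, mu_m):
    """W_p = (X_p / C_p) * mu_m, with mu_m the specimen mass absorption coefficient."""
    return (factor / calib) * mu_m


# Hypothetical two-phase example (numbers are placeholders, not real data).
w = normalised_weight_fractions({"A": 2.0, "B": 1.0}, {"A": 1.0, "B": 0.5})
w_abs = absorption_weight_fraction(3.0, 6.0, 0.5)
```

In both shapes the first step is the same division of the intensity-like value by the calibration factor; only what happens afterwards differs.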

I_p and S_p would be recorded using _pd_qpa_intensity_factor.value.

The definition of C_p changes, depending on the _pd_qpa_overall.method.

The various definitions are:

RIR or I/Ic
- C_p = (W_s/W_p) * [(I_p/I_p,rel) / (I_s/I_s,rel)]
- from when the phase was originally calibrated.
DDM
- C_p = (1/M_p) * Sum(n_i,p², i=1:N_p)
- where M_p is the chemical formula weight of phase p, n_i,p is the number of electrons belonging to the i^th atom of phase p, and N_p is the number of atoms in the formula unit of phase p.
ZMV
- C_p = 1/(Z M V)_p
- where Z is the number of formula units per unit cell, M is the chemical formula weight, and V is the volume of the unit cell, all of phase p.
- This one, alone, is dREL-able. You can calculate ZMV from crystal structure information.
PONKCS
- C_p = (W_s/W_p) (S_p/S_s) (1/(ZMV)_s)
- where W is the weight fraction, S is the Rietveld scale factor, Z is the number of formula units per unit cell, M is the chemical formula weight, and V is the volume of the unit cell. The subscript p denotes the analyte phase, and s denotes the standard phase used in the calibration of the PONKCS phase. This value is a "synthetic" ZMV, and is linked to the intensities used to fit the calibration pattern - the intensities used in the PONKCS phase peaks after calibration can be considered to be pseudo-F_squared values (much like a Pawley fit).
Absorption-diffraction
- C_p is made through a calibration sample. Make up a mixture of known composition, and out pops C_p.
External standard
- C_p = K /(Z M V)_p
- where Z is the number of formula units per unit cell, M is the chemical formula weight, and V is the volume of the unit cell, all of phase p, and K is the previously determined diffractometer constant.
- The need for a diffractometer constant is why PD_QPA_EXTERNAL_STD exists; you also need to link to the source of K.
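
For the definitions that have a closed form, the calibration factors can be sketched as plain functions. This is only a transcription of the expressions listed above into Python; the function and parameter names are my own, and the absorption-diffraction case is left out because its C_p comes from a calibration sample rather than a formula.

```python
def calib_rir(w_s, w_p, i_p_norm, i_s_norm):
    """RIR: C_p = (W_s/W_p) * (I_p/I_p,rel) / (I_s/I_s,rel).

    i_p_norm and i_s_norm are the already-normalised intensities I/I_rel.
    """
    return (w_s / w_p) * (i_p_norm / i_s_norm)


def calib_ddm(m_p, electrons):
    """DDM: C_p = (1/M_p) * sum of n_i^2 over the atoms in the formula unit."""
    return sum(n * n for n in electrons) / m_p


def calib_zmv(z, m, v):
    """ZMV: C_p = 1 / (Z * M * V)."""
    return 1.0 / (z * m * v)


def calib_ponkcs(w_s, w_p, s_p, s_s, zmv_s):
    """PONKCS: C_p = (W_s/W_p) * (S_p/S_s) * 1/(ZMV)_s."""
    return (w_s / w_p) * (s_p / s_s) / zmv_s


def calib_external_std(k, z, m, v):
    """External standard: C_p = K / (Z * M * V), K being the diffractometer constant."""
    return k / (z * m * v)
```

Each function returns a value that is then used in the same way: it divides the intensity or scale factor of that phase.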

Conclusions

In all cases, _pd_qpa_intensity_factor.value is divided by _pd_qpa_calib_factor.value. This result is then treated in varying ways according to the QPA method in order to arrive at the final quant answer.

If you want _pd_qpa_calib_factor.value to have a unique definition (i.e. as given in the second list, which I think is what you're after), then we'll need one data item per method. If you want _pd_qpa_calib_factor.value to have a unique way of being used, then I've shown that that is the case.

Epilogue

But when doing quant on a diffractogram, the _pd_qpa_calib_factor.value for each phase must be of the same type (except that you can mix ZMV and PONKCS).
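
That constraint is easy to express as a check. A minimal sketch, assuming each phase's calibration factor is tagged with a method label; the labels and the function name are hypothetical, not dictionary enumeration values:

```python
# ZMV and PONKCS calibration factors may be combined in one quantification.
MIXABLE = frozenset({"ZMV", "PONKCS"})


def calib_types_compatible(types):
    """True if all phases share one calibration type, or only mix ZMV and PONKCS."""
    kinds = set(types)
    return len(kinds) <= 1 or kinds <= MIXABLE
```

For example, `calib_types_compatible(["ZMV", "PONKCS", "ZMV"])` passes, while `calib_types_compatible(["RIR", "ZMV"])` does not.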

So, after writing all of that out, I think that there should be many data items in PD_QPA_CALIB_FACTOR, one for each QPA method. _pd_qpa_overall.method still informs the user as to how the QPA was done, and how to combine the values, but not what the individual values mean (hopefully, you can't confuse, for example, RIR values with PONKCS values). I still think that a single _pd_qpa_intensity_factor.value data item is OK, as this value will be quite dependent on the analysis program used and what normalisation constants and such are used. As long as they are consistent, you're OK, but transferring values between TOPAS, GSASII, FullProf, Rietan... will end in tears, if trying to directly recalculate the fit.

I like your analysis above, which shows very clearly that the values are used in the same way, even if they are derived in different ways, and so I agree that a single data name _pd_qpa_intensity_factor.value is appropriate.

Happy for this to be merged after that single query on "CIF container" is sorted.

COMCIFS / Powder_Dictionary