thomas-muench closed this issue 4 years ago.
This issue is related to:
EarthSystemDiagnostics/cpt-picarr#39 EarthSystemDiagnostics/cpt-picarr#13 EarthSystemDiagnostics/cpt-picarr#12 EarthSystemDiagnostics/cpt-picarr#11
@thomas-muench Thank you for putting together this detailed summary.
It would be great if you could be involved in the implementation. Maybe you can take care of the piccr side of things (making sure that piccr outputs all required information). Then I could work on integrating the output into cpt-picarr. Does that work for you?
We decided to first hold a short meeting to set up a general piccr output structure for the quality control information. Then I will work on the piccr implementation of this output, and @twollnik will prepare cpt-picarr to handle it.
We decided on the following general output structure:
The measurement data output will include the following components:

- `name`: the file name of the data set [character vector];
- `raw`: the original measurement data before any processing was done [data frame];
- `memoryCorrected`: the data after applying the memory correction [data frame];
- `calibrated`: the data after applying only a calibration using first-block standards [data frame];
- `calibratedAndDriftCorrected`: the data after applying a linear drift correction and a calibration using first-block standards, or after applying a double calibration (with inherent drift correction) [data frame];
- `processed`: the final data from averaging across `n` injections [data frame].

The quality control output, as outlined in the issue description, splits into information delivered along with the raw or processed data, and into separate information:
- `deviationsFromTrue`: data frame with the deviations from the true values for all measured standards; columns `Identifier 1`, `block`, `d18OMeasured`, `d18OTrue`, `d18ODeviation`, and the same last three columns for dD;
- `deviationOfControlStandard`: named list (components `d18O` and `dD`) with the deviation of the quality control standard from its true value for d18O and dD;
- `rmsdDeviationsFromTrue`: named list (components `d18O` and `dD`) with the RMSD of `deviationsFromTrue` for d18O and dD;
- `pooledSD`: named list (components `d18O` and `dD`) with the pooled standard deviation of the data set for d18O and dD;
- `memoryCoefficients`: data frame with the memory coefficients, i.e. the mean and the individual values for each analysed standard; columns `Inj No`, `mean`, `<standard-name1>`, ..., each for d18O and dD;
- `calibrationParams`: data frame with columns `block` = standard block used for calibration, `pValue` = p-value of the calibration regression, `d18ORMSDOfResiduals`, `dDRMSDOfResiduals` = RMSD of the calibration regression residuals, `d18OSlope`, `dDSlope`, `d18OIntercept`, `dDIntercept` = slope and intercept of the calibration for both isotope species, `timeMean` = mean measurement time since start for this block [… `calibration_method=1`];
- `driftParams`: data frame for the drift parameters with columns `variable` = name of the standard used for the estimate, or the mean estimate, `d18OAlpha`, `dDAlpha` = estimated drift rates, `pValue` = p-value of the linear drift regression, `d18ORMSDOfResiduals`, `dDRMSDOfResiduals` = RMSD of the drift regression residuals (p-value and RMSD values are NA for the mean estimate).

The output for `M` processed data sets is a list of length `M`, where each list element `i` is a list containing all of the above output.
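The `calibrationParams` and `driftParams` entries both summarise simple linear regressions. The following is a minimal Python sketch of how such numbers could be obtained for one isotope species. It assumes (hypothetically) that the calibration regresses the true standard values on the measured ones, that a drift rate is the slope of a standard's measured values against measurement time, and that the RMSD of residuals is the square root of the mean squared residual. p-values are left out because they need a t-distribution, and the function and key names (`linear_fit`, `calibration_row`, `drift_rows`, `slope`, `alpha`, ...) are made up for illustration; they are not piccr's actual code.

```python
import math
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, rmsd_of_residuals), where the RMSD is
    sqrt(mean(residual**2))."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    rmsd = math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))
    return slope, intercept, rmsd


def calibration_row(block, measured, true):
    """Hypothetical calibrationParams-like row for ONE isotope species:
    the true standard values are regressed on the measured ones."""
    slope, intercept, rmsd = linear_fit(measured, true)
    return {"block": block, "slope": slope, "intercept": intercept,
            "rmsdOfResiduals": rmsd}


def drift_rows(series):
    """Hypothetical driftParams-like rows for ONE isotope species.

    `series` maps a standard name to (times, measured values). Each
    standard's drift rate alpha is the slope of a linear fit against time;
    a final "mean" row averages the alphas and has no RMSD (NA -> None)."""
    rows = []
    for name, (times, values) in series.items():
        alpha, _, rmsd = linear_fit(times, values)
        rows.append({"variable": name, "alpha": alpha, "rmsdOfResiduals": rmsd})
    rows.append({"variable": "mean",
                 "alpha": mean(row["alpha"] for row in rows),
                 "rmsdOfResiduals": None})
    return rows
```

In the agreed structure each numeric column exists once per isotope species (`d18OSlope`, `dDSlope`, ...), so a sketch like this would be run once for d18O and once for dD.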
@twollnik Please have a look over this proposed structure and tell me if you are fine with it or if any changes (e.g. variable names) are necessary from your point of view.
@thomas-muench Thank you for taking the time to write this. I think that you have captured everything that we talked about (and more). I have one question and some remarks.
The question: Under quality control output >> separate information you mention named vectors a few times. What names would you choose? (e.g. `names(qualityControl)` equals what?)
The remarks:

- … `name` that contains the file name of the raw dataset. That way we can output the actual names of the input datasets and not just file numbers, which makes the output more understandable.
- … `deviationsFromTrue`, `rmsd.DeviationsFromTrue`, `calibrationParams`, and `driftParams`.
- `writeDataToFile(..)` needs to be updated to work with the new format.
- `outputSummaryFile(..)` needs to be updated to work with the new format and to include more quality control information.

@twollnik Thanks for your feedback.
I have updated the comment to adopt a consistent camelCase naming convention and included the `name` parameter.
Regarding your question:
What I meant was `d18O` and `dD` as names for the vectors in each case, since each of these quantities is a single numeric value per isotope species. Or should we rather use named lists to be more consistent?
Thanks for integrating my feedback.

> Or should we rather use named lists to be more consistent?

Yes, good idea.
Ok, thanks, I will edit the structure to also use lists for the respective quantities.
@thomas-muench I suggest renaming the component `qualityControl` to `controlStandardDeviation` to be more precise.
@thomas-muench I added the file `piccrMockOutput` to the cpt-picarr repository. You can download the file and then execute `load("path/to/piccrMockOutput")` to load the variable `piccrMockOutput` into your workspace. It contains example output in the format that cpt-picarr expects. (Note that some values are `NULL`.)
@twollnik Thanks for the mock variable; this is helpful.
> @thomas-muench I suggest renaming the component `qualityControl` to `controlStandardDeviation` to be more precise.
I would not use this name, since it sounds like the "control standard deviation (SD)" value. Instead, how about `deviationControlStandard` or `deviationOfControlStandard`?
I like `deviationOfControlStandard` best.
> I like `deviationOfControlStandard` best.
Updated accordingly.
@thomas-muench

> the mean water vapour level and its standard deviation for this sample/standard;

At the moment this information is not included in the processed data. Would you prefer to include two columns, `H2O_Mean` and `H2O_SD`, in the processed data, or should this information be calculated by cpt-picarr?
Good point; I would have cpt-picarr calculate this.
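The per-sample statistics discussed here, a mean and a standard deviation across the injections of one vial, are straightforward to compute on the cpt-picarr side. Below is a minimal Python sketch, assuming (hypothetically) that the statistics are taken over the same injections that enter the averaged isotope value, modelled here as the last `n` injections of the vial; `sample_mean_and_sd` is an illustrative name, not an existing function in piccr or cpt-picarr.

```python
from statistics import mean, stdev


def sample_mean_and_sd(injection_values, n):
    """Mean and sample standard deviation over the last `n` injections of one vial.

    Works for isotope values as well as for the water vapour level (H2O).
    Taking the *last* n injections is an assumption of this sketch.
    Returns (mean, sd, number_of_injections_used)."""
    if n < 2:
        raise ValueError("n must be at least 2 to compute a standard deviation")
    used = injection_values[-n:]
    if len(used) < n:
        raise ValueError("the vial has fewer than n injections")
    return mean(used), stdev(used), len(used)
```

Returning the number of injections used alongside the standard deviation keeps the `n_i` needed later for the pooled standard deviation.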
@twollnik Should I initialize the output structure such that it always contains all possible elements, but certain elements might be `NULL` when a specific processing step was not switched on (e.g. `memoryCoefficients`, ...)?
I would also suggest renaming `pooledStdDev` to `pooledSD`, since SD is the common abbreviation for standard deviation.
> @twollnik Should I initialize the output structure such that it always contains all possible elements, but certain elements might be `NULL` when a specific processing step was not switched on (e.g. `memoryCoefficients`, ...)?
Yes, please.
> I would also suggest renaming `pooledStdDev` to `pooledSD`, since SD is the common abbreviation for standard deviation.
I agree.
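The agreed initialization (every component always present, `NULL` until its processing step fills it) can be illustrated with a small Python sketch, in which dicts stand in for R named lists and `None` for R's `NULL`. The component names are those agreed in this thread; the helper names `empty_output` and `empty_outputs` are hypothetical.

```python
from typing import Any, Optional

# Component names as agreed in this thread (hypothetical Python mirror of
# the R named list; None plays the role of R's NULL).
DATA_COMPONENTS = ["raw", "memoryCorrected", "calibrated",
                   "calibratedAndDriftCorrected", "processed"]
QC_COMPONENTS = ["deviationsFromTrue", "deviationOfControlStandard",
                 "rmsdDeviationsFromTrue", "pooledSD", "memoryCoefficients",
                 "calibrationParams", "driftParams"]


def empty_output(name: str) -> dict[str, Optional[Any]]:
    """Output for one data set: every component is present and set to None
    until the corresponding processing step fills it in."""
    output: dict[str, Optional[Any]] = {"name": name}
    for component in DATA_COMPONENTS + QC_COMPONENTS:
        output[component] = None
    return output


def empty_outputs(file_names: list[str]) -> list[dict[str, Optional[Any]]]:
    """Output for M data sets: a list of length M, one entry per file."""
    return [empty_output(file_name) for file_name in file_names]
```

Filling `pooledSD`, for example, would replace its `None` with a `{"d18O": ..., "dD": ...}` dict, while components of switched-off steps such as `memoryCoefficients` simply stay `None`, so a consumer like cpt-picarr can test for presence without checking which keys exist.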
Left to do here:

- `calibrationParams`
- `driftParams`
Finally completed and closed by #53.
This is a summary issue listing the quality control information that piccr is supposed to deliver, both in the stand-alone version and when used along with Cpt-Picarr.
This information, along with more detail, is also available in `dataProcessingGuide.Rmd`.

Quality control of the measured and processed data should be possible with piccr on three levels: (1) per injection, (2) per sample, and (3) per file.
Per-injection basis
On this level, the following quality control information is obtained directly from the raw Picarro measurement files:
Per-sample basis
The final isotopic value of a sample or standard is obtained from averaging across a certain (settable) number of injections with the following quality control information:
Per-file basis
On the per-file basis, piccr should provide the following quality control information:
Definitions
The root mean square deviation is defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\Delta\delta_i\right)^2},$$

where $\Delta\delta_i$ is the deviation of standard $i$ from its true value and $k$ is the number of standards.

The pooled standard deviation is the square root of the pooled variance, which is defined as

$$\sigma_\mathrm{pooled}^2 = \frac{\sum_{i=1}^{N}(n_i - 1)\,\sigma_i^2}{\sum_{i=1}^{N}(n_i - 1)},$$

where $\sigma_i$ is the standard deviation of the mean isotope value for vial $i$ from averaging across $n_i$ injections, and $N$ is the total number of analysed vials.

Measurement uncertainty for project
The overall measurement uncertainty for a given measurement project, which contains $M$ single measurements (i.e. Picarro files), can then be assessed with the root mean square deviation of the quality control standards,

$$\mathrm{RMSD}_\mathrm{project} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(\Delta\delta_i\right)^2},$$

where $\Delta\delta_i$ is the deviation of the quality control standard in measurement file $i$ from its true value. (Note: if each measurement file has several control standards, one can use each file's root mean square deviation from these control standards here.)
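The definitions in this section can be checked numerically with a short Python sketch. It assumes that a deviation is "measured minus true" and that the pooled variance uses the usual $(n_i - 1)$ weights; the function names are illustrative only and do not come from piccr.

```python
import math


def deviations_from_true(measured, true):
    """Deviation of each standard from its true value (measured - true)."""
    return [m - t for m, t in zip(measured, true)]


def rmsd(deviations):
    """Root mean square deviation: sqrt((1/k) * sum of squared deviations)."""
    if not deviations:
        raise ValueError("need at least one deviation")
    return math.sqrt(sum(d ** 2 for d in deviations) / len(deviations))


def pooled_sd(sds, ns):
    """Pooled standard deviation from per-vial SDs sigma_i and injection
    counts n_i: sqrt(sum((n_i - 1) * sigma_i**2) / sum(n_i - 1))."""
    numerator = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)
```

The same `rmsd` serves both levels: applied to the deviations of the `k` standards within one file it gives the per-file value, and applied to the `M` per-file control-standard deviations (or per-file RMSDs, if a file has several control standards) it gives the project-level uncertainty.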