EarthSystemDiagnostics / piccr

A bundle of R functions to correct and calibrate raw Picarro cavity ring down spectroscopy stable isotope data.
MIT License

Overview of needed quality control information #17

Closed thomas-muench closed 4 years ago

thomas-muench commented 5 years ago

This is a summary issue listing the quality control information that piccr is supposed to deliver, both in the stand-alone version and when used together with Cpt-Picarr.

This information, in more detail, is also available in dataProcessingGuide.Rmd.

piccr should support quality control of the measured and processed data on three levels: (1) per injection, (2) per sample, and (3) per file.

Per-injection basis

On this level, the following quality control information is obtained directly from the raw Picarro measurement files:

Per-sample basis

The final isotopic value of a sample or standard is obtained from averaging across a certain (settable) number of injections with the following quality control information:

Per-file basis

On the per-file basis, piccr should provide the following quality control information:

Definitions

The root mean square deviation is defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \Delta_i^2}$$

where \Delta_i is the deviation of standard i from its true value and k is the number of standards.
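
As a rough illustration of the root mean square deviation defined here, a minimal R sketch could look as follows. The function name `rmsd`, its arguments, and the numbers are assumptions made for this example; they are not taken from the piccr code base.

```r
# Root mean square deviation of measured standards from their true values.
# `measured` and `true` are hypothetical numeric vectors of equal length k.
rmsd <- function(measured, true) {
  stopifnot(length(measured) == length(true))
  deviation <- measured - true   # deviation of each standard i
  sqrt(mean(deviation^2))        # sqrt of (1/k) * sum of squared deviations
}

# Illustrative d18O values for three standards (made-up numbers).
measured <- c(-10.1, -20.2, -29.9)
true     <- c(-10.0, -20.0, -30.0)
rmsdValue <- rmsd(measured, true)
```

The same function could be applied separately to each isotope species (d18O and dD).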

The pooled standard deviation is the square root of the pooled variance which is defined as

$$\sigma_p^2 = \frac{\sum_{i=1}^{N} (n_i - 1)\,\sigma_i^2}{\sum_{i=1}^{N} (n_i - 1)}$$

where \sigma_i is the standard deviation of the mean isotope value for vial i from averaging across n_i injections, and N is the total number of analysed vials.
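
A short R sketch of the pooled standard deviation, assuming the usual definition of the pooled variance in which each vial's variance is weighted by its degrees of freedom n_i - 1. The function name `calcPooledSD`, its arguments, and the numbers are hypothetical and only serve this example.

```r
# Pooled standard deviation across N vials.
# sdVial: per-vial standard deviations across injections (hypothetical input)
# nInj:   number of injections n_i averaged for each vial
calcPooledSD <- function(sdVial, nInj) {
  stopifnot(length(sdVial) == length(nInj), all(nInj > 1))
  pooledVariance <- sum((nInj - 1) * sdVial^2) / sum(nInj - 1)
  sqrt(pooledVariance)
}

# Made-up values for three vials with 3, 3 and 4 injections.
sdVial <- c(0.05, 0.08, 0.06)
nInj   <- c(3, 3, 4)
pooled <- calcPooledSD(sdVial, nInj)
```

If all vials have the same standard deviation, the pooled value equals that standard deviation regardless of the injection counts, which is a quick sanity check for any implementation.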

Measurement uncertainty for project

The overall measurement uncertainty for a given measurement project, which contains M single measurements (i.e. Picarro files), can then be assessed with the root mean square deviation of the quality control standards,

$$\mathrm{RMSD} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \Delta_i^2}$$

where the delta term is the deviation for measurement file i of the quality control standard from its true value (Note: if each measurement file has several control standards, one can use each file's root mean square deviation from these control standards here).
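
For illustration, the project-level estimate could be computed in R as sketched below. All variable names and numbers are made up for this example. Using per-file RMSD values in place of single deviations follows the note above; it matches an RMSD over all individual control standards only when every file contains the same number of control standards.

```r
# Deviation of the quality control standard from its true value in each of
# M = 4 hypothetical measurement files (illustrative numbers).
deviationPerFile <- c(0.05, -0.08, 0.03, 0.10)
projectUncertainty <- sqrt(mean(deviationPerFile^2))

# If each file holds several control standards, the per-file RMSD values can
# take the place of the single deviations.
rmsdPerFile <- c(0.06, 0.09, 0.04, 0.07)
projectUncertaintyFromRmsd <- sqrt(mean(rmsdPerFile^2))
```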

thomas-muench commented 5 years ago

This issue is related to:

#15
EarthSystemDiagnostics/cpt-picarr#39
EarthSystemDiagnostics/cpt-picarr#13
EarthSystemDiagnostics/cpt-picarr#12
EarthSystemDiagnostics/cpt-picarr#11

twollnik commented 5 years ago

@thomas-muench Thank you for putting together this detailed summary.

It would be great if you could be involved in the implementation. Maybe you can take care of the piccr side of things (making sure that piccr outputs all required information). Then I could work on integrating the output into cpt-picarr. Does that work for you?

thomas-muench commented 5 years ago

We decided to first hold a short meeting to agree on a general piccr output structure for the quality control information. Then I will work on the piccr implementation of this output, and @twollnik will prepare cpt-picarr to handle it.

thomas-muench commented 5 years ago

We decided on the following general output structure:

Output for individual data set

Data output

The measurement data output will include the following components:

Quality control output

The quality control output, as outlined above, splits into information delivered along with the raw or processed data, and into separate information:

Output for M data sets

The output for M processed data sets is a list of length M, where each list element i is a list containing all of the above output.
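
A minimal sketch of this list-of-lists layout in R. The helper `makeFileOutput` and the file names are hypothetical; the component names `name`, `pooledSD` and `deviationOfControlStandard` are the ones discussed in this thread, and the real output contains more components than shown here.

```r
# Build the output for one data set (only a few placeholder components).
makeFileOutput <- function(fileName) {
  list(
    name = fileName,
    pooledSD = list(d18O = NA_real_, dD = NA_real_),
    deviationOfControlStandard = list(d18O = NA_real_, dD = NA_real_)
  )
}

# Output for M = 3 hypothetical data sets: a list of length M, where each
# element is itself a list holding the output for one data set.
fileNames <- c("fileA.csv", "fileB.csv", "fileC.csv")
output <- lapply(fileNames, makeFileOutput)

length(output)       # number of data sets M
output[[2]]$name     # components of data set i are reached via output[[i]]
```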

@twollnik Please have a look over this proposed structure and tell me if you are fine with it or if any changes (e.g. variable names) are necessary from your point of view.

twollnik commented 5 years ago

@thomas-muench Thank you for taking the time to write this. I think that you have captured everything that we talked about (and more). I have one question and some remarks.

The question: Under quality control output >> separate information you mention named vectors a few times. What names would you choose? (E.g., what does names(qualityControl) return?)

The remarks:

  1. I suggest adding a component name that contains the file name of the raw dataset. That way we can output the actual names of the input datasets, not just file numbers, which makes the output easier to understand.
  2. I suggest sticking to our camelCase naming convention and not using underscores or points in the component names. (This applies to deviationsFromTrue, rmsd.DeviationsFromTrue, calibrationParams, and driftParams.)
twollnik commented 5 years ago

Things we should not forget

thomas-muench commented 5 years ago

@twollnik Thanks for your feedback.

I have updated the comment to adopt a consistent camelCase naming convention and included the name parameter.

Regarding your question: In each case I meant d18O and dD as the names of the vector elements, since each of these quantities is a single numeric value per isotope species. Or should we rather instead use named lists to be more consistent?

twollnik commented 5 years ago

Thanks for integrating my feedback.

> Or should we rather instead use named lists to be more consistent?

Yes, good idea.

thomas-muench commented 5 years ago

Ok, thanks, I will edit the structure to also use lists for the respective quantities.
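
To make the difference concrete, here is a small R sketch comparing a named numeric vector with a named list for one such quantity. The variable names and values are illustrative only.

```r
# One numeric value per isotope species, stored in two ways.
asVector <- c(d18O = 0.07, dD = 0.5)
asList   <- list(d18O = 0.07, dD = 0.5)

names(asVector)   # "d18O" "dD"
names(asList)     # "d18O" "dD"

# Both support [[ ]] access by name.
asVector[["dD"]]
asList[["dD"]]

# The $ operator only works on lists; on an atomic vector it raises an error.
asList$dD
dollarOnVector <- tryCatch(asVector$dD, error = function(e) "error")
```

With lists, all components of the output structure can be accessed in the same way with `$`, which is the consistency argument made above.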

twollnik commented 5 years ago

@thomas-muench I suggest renaming the component qualityControl to controlStandardDeviation to be more precise.

twollnik commented 5 years ago

@thomas-muench I added the file piccrMockOutput to the cpt-picarr repository. You can download the file and then execute load("path/to/piccrMockOutput") to load the variable piccrMockOutput into your workspace. It contains example output in the format that cpt-picarr expects. (Note that some values are NULL.)
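
For readers unfamiliar with load(), the following sketch shows the round trip with a stand-in object written to a temporary file. The content of the object here is made up and far smaller than the actual mock output; only the variable name follows the comment above.

```r
# Create a stand-in object and save it under its variable name.
piccrMockOutput <- list(list(name = "example.csv", pooledSD = NULL))
path <- file.path(tempdir(), "piccrMockOutput")
save(piccrMockOutput, file = path)

# Remove it from the workspace, then restore it from the file.
rm(piccrMockOutput)
loadedNames <- load(path)   # load() returns the names of the restored objects

loadedNames                 # "piccrMockOutput"
piccrMockOutput[[1]]$name
```

Note that load() restores objects under the names they were saved with, so no assignment is needed.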

thomas-muench commented 5 years ago

@twollnik Thanks for the mock variable; this is helpful.

> @thomas-muench I suggest renaming the component qualityControl to controlStandardDeviation to be more precise.

I would not use this name since it sounds like the "control standard deviation (SD)" value. Instead, how about deviationControlStandard or deviationOfControlStandard?

twollnik commented 5 years ago

I like deviationOfControlStandard best.

thomas-muench commented 5 years ago

> I like deviationOfControlStandard best.

Updated accordingly.

twollnik commented 5 years ago

@thomas-muench

> the mean water vapour level and its standard deviation for this sample/standard;

At the moment this information is not included in the processed data. Would you prefer to include two columns H2O_Mean and H2O_SD in the processed data, or should this information be calculated by cpt-picarr?

thomas-muench commented 5 years ago

Good point; I would have cpt-picarr calculate this.
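
One way such a calculation could look in R, as a sketch only: the input data frame `injections` and its columns `sample` and `H2O` are assumptions for this example and do not reflect the actual piccr or cpt-picarr data layout. The output column names H2O_Mean and H2O_SD follow the suggestion above.

```r
# Hypothetical per-injection data: water vapour level for two samples.
injections <- data.frame(
  sample = c("A", "A", "A", "B", "B", "B"),
  H2O    = c(20000, 20100, 19900, 19500, 19600, 19400)
)

# Mean and standard deviation of the water vapour level per sample.
h2oMean <- tapply(injections$H2O, injections$sample, mean)
h2oSD   <- tapply(injections$H2O, injections$sample, sd)

h2oStats <- data.frame(
  sample   = names(h2oMean),
  H2O_Mean = as.numeric(h2oMean),
  H2O_SD   = as.numeric(h2oSD)
)
```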

thomas-muench commented 5 years ago

@twollnik Should I initialize the output structure such that it always contains all possible elements but certain elements might be NULL when a specific processing step was not switched on (e.g. memoryCoefficients, ...)?

I also would suggest to rename pooledStdDev to pooledSD, since SD is the common abbreviation for the standard deviation.

twollnik commented 5 years ago

> @twollnik Should I initialize the output structure such that it always contains all possible elements but certain elements might be NULL when a specific processing step was not switched on (e.g. memoryCoefficients, ...)?

Yes, please.

> I also would suggest to rename pooledStdDev to pooledSD, since SD is the common abbreviation for the standard deviation.

I agree.
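
A note on keeping NULL elements in an R list, since this is easy to get wrong: assigning NULL with `$<-` removes an element instead of setting it to NULL. The sketch below uses a reduced, illustrative structure; only the component names come from this thread.

```r
# list() keeps components that are explicitly set to NULL.
output <- list(
  name = "example.csv",
  memoryCoefficients = NULL,   # processing step not switched on
  pooledSD = list(d18O = 0.07, dD = 0.5)
)
lengthInitial <- length(output)             # 3
is.null(output$memoryCoefficients)          # TRUE

# Assigning NULL with $<- deletes the component.
output$memoryCoefficients <- NULL
lengthAfterRemoval <- length(output)        # 2

# To store NULL while keeping the component, assign list(NULL) with [ ].
output["memoryCoefficients"] <- list(NULL)
lengthAfterRestore <- length(output)        # 3
```

This way the consumer can rely on a fixed set of component names and test each one with is.null().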

thomas-muench commented 5 years ago

Left to do here:

thomas-muench commented 4 years ago

Finally completed and closed by #53.