CIRDLES / Squid

Squid3 is being developed by the Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) at the College of Charleston, Charleston, SC, and Geoscience Australia, as a re-implementation in Java of Ken Ludwig's Squid 2.5. Please contribute your expertise!
http://cirdles.org/projects/squid/
Apache License 2.0

Reformulation of SQUID 2.50's error propagation related to overcount correction #348

Closed: sbodorkos closed this issue 5 years ago

sbodorkos commented 5 years ago

Synopsis

The expression coded by Ludwig in SQUID 2.50 to deal with propagation of the uncertainty associated with the 204Pb overcount-correction looks foreign, and I am pretty sure it is mathematically incorrect.

This document has NO direct impact on the Squid3 code; the relevant code for implementation will be documented (and 'debugged') elsewhere. I am using this Issue purely to capture notes and calculations that are 'new' to me, and to make those materials available to anyone wishing to examine my work. (I decided to use an Issue rather than a wiki page because it is much easier to embed files and images directly in an Issue; the wiki pages seem to require URLs, which is a bit tedious.)

Introduction

Count-rates at 204Pb are very low in many ion probe U-Th-Pb applications, which means measurements made at that peak are vulnerable to systematic sources of inaccuracy. In SHRIMP vernacular, 204Pb can be:

  1. "Overcounted" (e.g. when the true 204Pb signal is spuriously augmented by an isobaric interference). This leads to the proportion of non-radiogenic Pb being overestimated (and the proportion of radiogenic Pb underestimated), resulting in 'corrected' ratios and dates that are "too young", or
  2. "Undercounted" (e.g. when the true 204Pb signal is spuriously depressed by being measured in an incorrect ("off-shoulder") position in the mass spectrum). This leads to the proportion of non-radiogenic Pb being underestimated (and the proportion of radiogenic Pb overestimated), resulting in 'corrected' ratios and dates that are "too old".

For simplicity, the following is confined to the 206Pb/238U vs 207Pb/235U systems, and the concept of "204Pb overcounts from 207Pb". (Analogous arguments are easily extended to the 206Pb/238U vs 208Pb/232Th systems, and the concept of "204Pb overcounts from 208Pb".)

One method for monitoring the accuracy of 204Pb measurements is via the use of an isotopically homogeneous reference material (Black, 2005: https://doi.org/10.1111/j.1751-908X.2005.tb00890.x, PDF Black-2005-GGR_Overcounts.pdf) which has:

  1. Good IDTIMS control on its reference 206Pb/238U AND 207Pb/206Pb values (ideally these would be concordant)
  2. Low radiogenic Pb content (owing to low U content, young age, or both)
  3. Negligible initial Pb (i.e. non-radiogenic Pb incorporated into the lattice at the time of initial crystallisation, as opposed to non-radiogenic Pb of 'environmental' origin, from the mount surface)

The reference material has a constant, IDTIMS-determined [207/206]RM value, and we assume a constant composition for the common Pb components [206/204]c and [207/204]c. Then each spot analysis of the RM yields measured values of [207/206]m and [204/206]m.

In simplest terms, it is possible to calculate a "204Pb/206Pb from 207Pb" value ([204/206]7), which is essentially the [204/206] value needed for that spot in order to generate [207/206]RM from [207/206]m, for specified values of [206/204]c and [207/204]c:

[204/206]7 = ( [207/206]m - [207/206]RM ) / 
  ( [207/204]c - ( [207/206]RM / [206/204]c ) ) -----(A)

This can in turn be used to calculate, for each spot, a "204 overcounts/second (from 207Pb)" value, which is essentially the excess count-rate measured at mass 204 (negative values represent count-rate deficits), relative to the 'perfect' count-rate dictated by [204/206]7. The equation is:

204ovCps (from207) = [Tot204Cps] - [BgdCps] - 
  ( [204/206]7 * ( [Tot206Cps] - [BgdCps] ) ) -----(B)
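For anyone wanting to experiment with these numbers, here is a minimal Java sketch of equations (A) and (B). The class and method names are mine (illustrative only), not the Squid3 API:

```java
/** Illustrative sketch of equations (A) and (B); not the Squid3 implementation. */
public class OvercountSketch {

    /** Equation (A): [204/206]7, the [204/206] value needed for a spot to
     *  generate [207/206]RM from measured [207/206]m, for specified common-Pb
     *  compositions [206/204]c and [207/204]c. */
    public static double r204206From207(double r207206meas, double r207206rm,
                                        double r206204common, double r207204common) {
        return (r207206meas - r207206rm)
                / (r207204common - (r207206rm / r206204common));
    }

    /** Equation (B): "204 overcounts/second (from 207Pb)" for one spot;
     *  negative values represent count-rate deficits. */
    public static double ovCps204From207(double tot204Cps, double tot206Cps,
                                         double bgdCps, double r204206via207) {
        return tot204Cps - bgdCps - r204206via207 * (tot206Cps - bgdCps);
    }
}
```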

At the scale of individual spots, these calculations don't look very significant. But during a normal analytical session, an analyst would typically collect dozens of analyses of their 'main' isotopic RM, and even if individual 204ovCps determinations on a low-Pb RM have poor precision, there should still be statistical merit in the central 204ovCps value measured in a population of isotopically homogeneous RM analyses.

So, if there is no systematic inaccuracy in measured 204, and the RM is homogeneous, then the spot-specific 204ovCps values should be distributed approximately symmetrically about a central value of zero, with the extent of dispersion governed primarily by the 'prevailing' counting statistics during the session. In SQUID 2.50 and Squid3, the central 204ovCps value of the RM population is defined as the Tukey's biweight mean (tuning 9), and herein labelled XS204Cps (recalling that a negative value would be a 'negative excess', i.e. a deficit). The uncertainty associated with XS204Cps is defined as the 95% confidence interval of the biweight mean, and is herein labelled DelXS204Cps.
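For reference, a bare-bones Tukey's biweight location estimate (tuning constant 9) looks like the sketch below. This is the textbook iteration (median start, MAD scale), not necessarily the exact SQUID 2.50/Squid3 algorithm, and it omits the 95%-confidence-interval calculation used for DelXS204Cps:

```java
import java.util.Arrays;

/** Textbook Tukey's biweight mean; illustrative only. */
public class BiweightSketch {

    public static double biweightMean(double[] x, double tuning) {
        double center = median(x);
        for (int iter = 0; iter < 100; iter++) {
            double[] absDev = new double[x.length];
            for (int i = 0; i < x.length; i++) absDev[i] = Math.abs(x[i] - center);
            double mad = median(absDev);        // median absolute deviation
            if (mad == 0.0) return center;      // degenerate case: no spread
            double num = 0.0, den = 0.0;
            for (double xi : x) {
                double u = (xi - center) / (tuning * mad);
                if (Math.abs(u) < 1.0) {        // points beyond tuning*MAD get zero weight
                    double w = (1.0 - u * u) * (1.0 - u * u);
                    num += w * xi;
                    den += w;
                }
            }
            double next = num / den;
            if (Math.abs(next - center) < 1e-12) return next;
            center = next;
        }
        return center;
    }

    private static double median(double[] values) {
        double[] v = values.clone();
        Arrays.sort(v);
        int n = v.length;
        return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
    }
}
```

In this scheme, `biweightMean(ovCpsValues, 9.0)` over the RM population would give XS204Cps.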

Why does it matter?

If the range defined by XS204Cps ± DelXS204Cps does not encompass zero, then an analyst might infer a systematic inaccuracy in their measurements at mass 204. A distinctly positive value can be taken to mean that some of the measured counts reflect an isobaric interference (rather than 'true' Pb); a negative value might mean that the measurement at 204 is not large enough to explain the full quantity of non-radiogenic Pb present.

In general terms, applying an appropriate correction to the measured [204/206]u of the unknowns on a spot-by-spot basis involves:

  1. Subtracting the (single, constant) value of XS204Cps (defined by the biweight mean of the RM population of 204ovCps values) from the numerator of each [204/206]u value, and
  2. Augmenting [Del204/206]u by the (single, constant) value of DelXS204Cps.

Context: Calculation of [204/206] and [Del204/206]

Mass 204 is commonly characterised by very low count-rates, which means that the usual method of spot-ratio calculation in SQUID 2.50 and Squid3 (which involves inter-scan Dodson interpolation, followed by either linear regression to burn-midtime, or time-invariant weighted mean) is eschewed.

Instead, for ratios involving a numerator or denominator for which the total number of counts (across all scans) is less than 32, both SQUID 2.50 and Squid3 employ a method based on the "total counts" (which really means background-corrected counts) for the numerator and denominator. I have hand-worked (204206_measured_viaVBACode.pdf) the mathematical relationships described in the SQUID 2.50 VBA code, in order to derive and verify the expression implemented for [204/206] and [Del204/206]. The key equation is:

[204/206] = ( [Tot204Counts] / [CT204] ) / ( [Tot206Counts] / [CT206] ) -----(C)

where CT204 and CT206 are the per-scan counting times (in seconds) for masses 204 and 206 respectively. For the value of [204/206], this equation can be reduced to the SQUID 2.50-testable equation governing spot-by-spot [204/206]m and [204/206]u:

[204/206] = ( [Tot204Cps] - [BgdCps] ) / ( [Tot206Cps] - [BgdCps] ) -----(D)

This is intuitively sensible, and relatively easily derived from the 'first-principles' arithmetic documented in the SQUID 2.50 code. (Equation (D) corresponds to my hand-worked RatioVal equation 6, near the top of p. 3 of the above PDF.)
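For completeness, the reduction from (C) to (D) is a one-line substitution (anticipating equation (F) below, and writing N for Nscans), with the scan count and count times cancelling:

```latex
\left[\frac{204}{206}\right]
  = \frac{\mathrm{Tot204Counts}/CT_{204}}{\mathrm{Tot206Counts}/CT_{206}}
  = \frac{N \, CT_{204} \, (\mathrm{Tot204Cps}-\mathrm{BgdCps}) / CT_{204}}
         {N \, CT_{206} \, (\mathrm{Tot206Cps}-\mathrm{BgdCps}) / CT_{206}}
  = \frac{\mathrm{Tot204Cps}-\mathrm{BgdCps}}{\mathrm{Tot206Cps}-\mathrm{BgdCps}}
```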

Determining the associated uncertainty is a bit more complicated, not least because SQUID 2.50 doesn't include any working; it simply specifies the expression used for the fractional error:

[Del204/206] / [204/206] = SQRT( ( 1 / ABS([Tot204Counts]) ) + ( 1 / ABS([Tot206Counts]) ) )
  -----(E)

SQUID 2.50 does not supply total-count data on the StandardData or SampleData sheets, but on pages 3 and 4 of the above PDF, I have reverse-engineered the assumptions inherent in equation (E), and re-expressed it in a form that can be tested directly in a SQUID workbook (hand-worked RatioFractErr equation 12, at the base of p. 4). Defining Nscans as the number of scans in the analysis, and recalling that the general expression governing 'TotXXXCounts' for any background-corrected mass-station XXX is:

TotXXXCounts = Nscans * CTXXX * ( [XXXCps] - [BgdCps] ) -----(F)

then the fractional error for [204/206] in equation (E) can be written as:

[Del204/206] / [204/206] = SQRT( ( 1 / Nscans ) * ( 
  ( 1 / ( CT204 * ABS( [Tot204Cps] - [BgdCps] ) ) ) +
  ( 1 / ( CT206 * ABS( [Tot206Cps] - [BgdCps] ) ) )
  ) ) -----(G)
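Equation (G) translates directly into code; here is a sketch (same illustrative class as above, with names of my choosing):

```java
/** Equation (G): fractional error of [204/206], from background-corrected
 *  count rates, per-scan count times, and the number of scans. */
public static double fractErr204206(int nScans, double ct204, double ct206,
                                    double tot204Cps, double tot206Cps,
                                    double bgdCps) {
    double net204 = Math.abs(tot204Cps - bgdCps);
    double net206 = Math.abs(tot206Cps - bgdCps);
    return Math.sqrt((1.0 / nScans)
            * (1.0 / (ct204 * net204) + 1.0 / (ct206 * net206)));
}
```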

Applying the overcount-correction to [204/206]u

Having derived one value for XS204Cps from the population of spot-by-spot measurements of RM [204ovCps (from207)] via equation (B), the next step is to correct each (spot-by-spot) measurement of [204/206]u for XS204Cps. The corrected ratio for each unknown is herein labelled [OC204/206]u, and SQUID 2.50 calculates the value directly from the source data (i.e. without direct reference to [204/206]u; I think this approach has been chosen to avoid the possibility of circular references):

[OC204/206]u = ( [Tot204Cps] - [BgdCps] - XS204Cps ) / ( [Tot206Cps] - [BgdCps] )
  -----(H)

This equation makes sense, and it satisfies the boundary condition. When XS204Cps = 0, equation (H) becomes:

[OC204/206]u = ( [Tot204Cps] - [BgdCps]  ) / ( [Tot206Cps] - [BgdCps] ) -----(J)
  = [204/206]u from equation (D)
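In code, the correction itself is a one-liner (sketch, same caveats as above):

```java
/** Equation (H): overcount-corrected [204/206] for an unknown.
 *  With xs204Cps = 0 this reduces to equation (D), per equation (J). */
public static double oc204206(double tot204Cps, double tot206Cps,
                              double bgdCps, double xs204Cps) {
    return (tot204Cps - bgdCps - xs204Cps) / (tot206Cps - bgdCps);
}
```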

Propagating the overcount-related uncertainty: the problem

This is where things get difficult, because the SQUID 2.50 code does not contain any working. Instead, it simply specifies an equation:

[DelOC204/206]u = SQRT(
  ( [Tot204Cps] / ( Nscans * CT204 ) ) +
  ( DelXS204Cps^2 ) +
  ( ( 1 - [OC204/206] )^2 * [BgdCps] / ( Nscans * CTBgd ) )
  ) /
  ( [Tot206Cps] - [BgdCps] ) -----(K)

Equation (K) looks completely foreign! Without knowing where it originates, it is very hard to tell whether it is correct or not. So I needed to determine whether this equation is correct (and its derivation simply beyond my understanding), or whether it is incorrect. In the latter case, I needed to find and explain the error, and develop a 'corrected' alternative that is both conceptually justifiable and arithmetically testable.

In this endeavour, I was greatly hampered by my own lack of understanding of the relevant considerations. I did a lot of algebra (and differential calculus!) that turned out to be mathematically correct in terms of its substituted variables, but which simply wouldn't give sensible answers when applied to the isotopic data.

In retrospect, there is a strong hint in equation (E): uncertainties on [204/206] must be calculated on the basis of total-count data; their Cps 'equivalents' are not an acceptable proxy. In reality, this should have been obvious: of course the number of scans and the count-times at the peaks are important factors in quantifying the uncertainty of an isotopic ratio, so it follows that these factors must be explicitly included in the uncertainty-related mathematics.

Importantly, this means that even though SQUID 2.50 presents Cps-based equations (D) and (H) to calculate the values of [204/206] and [OC204/206] respectively, those equations are not viable starting points for determining the uncertainties associated with those values. Upon closer inspection of [204/206] and [Del204/206], it is clear that equation (E) has been (correctly) derived from equation (C), not equation (D). Sounds simple, but it took me a full week, and 50 pages of botched math, to work it out...

How did Ludwig propagate the overcount-related uncertainty?

I have taken a little comfort from the fact that Ludwig appears to have (at least momentarily) forgotten this too. His original code (verbatim from the Excel 2003 VBA Editor, with yellow highlight added by me) for calculating [DelOC204/206] is:

[image: Ludwig's VBA code for [DelOC204/206], comprising Term1 through Term5, with the two Cps-related comments highlighted in yellow]

Term1–Term4 correspond sequentially to the indented lines of equation (K) above, with Term5 converting the absolute [DelOC204/206] to a fractional value. But the two highlighted Comments refer specifically to the variances of Cps values, which are not a substitute for total-count values. Furthermore, the contribution of background to the uncertainty of an isotopic ratio should not be considered explicitly in isolation; it should instead form part of the contribution of the background-corrected 'total-counts' measured at other peaks, as specified by equation (F). Note also that Ludwig's code-snippet makes no reference to "var tot206cts" or any nominal equivalent.
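To support the numerical comparisons later in this note, here is a Java transcription of equation (K), mirroring Term1 through Term5 (my names; this exists only to reproduce Ludwig's numbers, not as an endorsement of the expression):

```java
/** Ludwig's equation (K), transcribed term-by-term for comparison only;
 *  as argued below, I believe this expression is incorrect. */
public static double delOc204206Ludwig(int nScans, double ct204, double ctBgd,
                                       double tot204Cps, double tot206Cps,
                                       double bgdCps, double oc204206,
                                       double delXs204Cps) {
    double term1 = tot204Cps / (nScans * ct204);
    double term2 = delXs204Cps * delXs204Cps;
    double term3 = Math.pow(1.0 - oc204206, 2) * bgdCps / (nScans * ctBgd);
    // Term4 is the denominator below; Term5 (not shown) would divide the
    // result by [OC204/206] to convert the absolute error to a fractional one.
    return Math.sqrt(term1 + term2 + term3) / (tot206Cps - bgdCps);
}
```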

Before I really understood any of this, I did try to hand-calculate [DelOC204/206] starting (incorrectly) from equation (H), and determining partial derivatives from each of its four nominal components: [Tot204Cps], [BgdCps], [XS204Cps], and [Tot206Cps]. The hand-working is here: DelOC204206_CPSbasedAttempt_perLudwig.pdf. I have included it because I did end up with an equation quite similar to equation (K) as specified by Ludwig. Red ink in the final equation in the PDF highlights the differences between equation (K) and my hand-worked equivalent (which appears in the line above).

I am quite sure I have captured the substance of Ludwig's working, because I cannot find any other way to reproduce the distinctive coefficient of [BgdCps] (i.e. ( 1 - [OC204/206] )^2 ) in equation (K). I suspect Ludwig's omission of the 'Tot206Cps' term was deliberate: I guess he considered its coefficient (i.e. [OC204/206]^2 ) to be small enough to neglect. In practice, this is not a universally safe assumption, given that Tot206Cps is usually by far the largest number in this calculation, and also because in real datasets, it is not uncommon to see ABS([OC204/206]) > ABS([204/206]) at the level of individual spots.

Irrespective of that, it looks like the wrong approach, both because it deals in units of [XXXCps] rather than total counts, and because it treats [BgdCps] as an independent variable. But the real kicker with Ludwig's equation (K) is that it does not satisfy the boundary condition. If XS204Cps = DelXS204Cps = 0, then in theory, [OC204/206]u = [204/206]u and [DelOC204/206]u = [Del204/206]u. For the latter, the equation that would need to be satisfied is:

[Del204/206]u = SQRT( 
  ( [Tot204Cps] / ( Nscans * CT204 ) ) +
  ( ( 1 - [204/206]u )^2 * [BgdCps] / ( Nscans * CTBgd ) ) ) /
  ( [Tot206Cps] - [BgdCps] ) -----(L)

Dividing both sides of equation (L) by [204/206]u and factorising ( 1 / Nscans ) in an effort to make equation (L) look more like equation (G) gives:

[Del204/206]u / [204/206]u = SQRT( ( 1 / Nscans ) * (
  ( [Tot204Cps] / CT204 ) +
  ( ( 1 - [204/206]u )^2 * [BgdCps] / CTBgd )
  ) ) /
  ( [Tot204Cps] - [BgdCps] ) -----(M)

It is clear that the right-hand side of equation (M) is not equivalent to that of equation (G), regardless of algebraic manipulation, because equations (K) and (L) employ CTBgd but not CT206. So in summary, I think the expression coded by Ludwig is wrong.

Propagating the overcount-related uncertainty: the solution

I am convinced that the solution lies in treating the three (not four) components of [OC204/206] in terms of their total counts, and incorporating background implicitly. I hand-worked this (FINAL_DelOC204206_CORRECT.pdf) in two parts:

  1. Re-derivation of [204/206] and [Del204/206], based on total counts at peak and using partial derivatives, to demonstrate that this approach yields the correct answer, and
  2. Extension of that same, partial derivative-based approach to [OC204/206] and [DelOC204/206], on the basis of its three components, two of which are background-corrected.

The second part starts by defining these three components in 'total counts at peak' terms, following equation (F):

Tot204Counts = Nscans * CT204 * ( [Tot204Cps] - [BgdCps] ) -----(N1)
TotXS204Counts = Nscans * CT204 * XS204Cps -----(N2)
Tot206Counts = Nscans * CT206 * ( [Tot206Cps] - [BgdCps] ) -----(N3)

Note that TotXS204Counts does not contain a background-related term, because it is a 'mass-balancing' quantity that corrects an excess/deficit in background-corrected Tot204Counts. Equations (N1)–(N3) can be rearranged to generate terms that can be substituted into equation (H), which governs [OC204/206] i.e.

[Tot204Cps] - [BgdCps] = Tot204Counts / ( Nscans * CT204 ) -----(P)
XS204Cps = TotXS204Counts / ( Nscans * CT204 ) -----(Q)
[Tot206Cps] - [BgdCps] = Tot206Counts / ( Nscans * CT206 ) -----(R)

In terms of the left-hand sides of these last three equations, the general form of equation (H) is:

[OC204/206] = ( P - Q ) / R -----(S)

However, it is critical to use the right-hand sides of equations (P)–(R) in order to obtain the correct form. Setting P = Tot204Counts, Q = TotXS204Counts, and R = Tot206Counts:

[OC204/206] = ( P / ( Nscans * CT204 ) - Q / ( Nscans * CT204 ) ) /
  ( R / ( Nscans * CT206 ) ) -----(T1)

which simplifies to a critically modified version of equation (S):

[OC204/206] = ( CT206 / CT204 ) * ( P - Q ) / R -----(T2)

From there, partial derivatives can be taken with respect to P, Q and R. Recalling that DelP = SQRT( ABS( P ) ), DelQ = Nscans * CT204 * DelXS204Cps, and DelR = SQRT( ABS( R ) ), the resulting fractional error can be written as:

[DelOC204/206] / [OC204/206] = SQRT( ABS( P ) + (DelQ)^2 + ( ( P - Q )^2 / ABS( R ) ) ) /
  ABS( P - Q ) -----(U)
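Spelling that step out (my working; k = CT206/CT204 is the constant factor from (T2), and sigma denotes a 1-sigma absolute uncertainty):

```latex
% From (T2): [OC204/206] = k (P - Q) / R, with k = CT206/CT204.
% First-order propagation over the independent variables P, Q and R:
\sigma_{OC}^{2}
  = \left(\frac{k}{R}\right)^{2}\sigma_{P}^{2}
  + \left(\frac{k}{R}\right)^{2}\sigma_{Q}^{2}
  + \left(\frac{k\,(P-Q)}{R^{2}}\right)^{2}\sigma_{R}^{2}
\qquad\Longrightarrow\qquad
\frac{\sigma_{OC}^{2}}{\mathrm{[OC204/206]}^{2}}
  = \frac{\lvert P \rvert + (\mathrm{Del}Q)^{2} + (P-Q)^{2}/\lvert R \rvert}{(P-Q)^{2}}
```

where the second form divides through by the square of (T2) and substitutes DelP^2 = ABS( P ), DelQ for sigma_Q, and DelR^2 = ABS( R ); taking the square root gives equation (U).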

The fractional error specified by equation (U) can be written in verbose (SQUID 2.50-testable) form as:

[DelOC204/206] / [OC204/206] = ( SQRT(
  ( ABS( [Tot204Cps] - [BgdCps] ) / ( Nscans * CT204 ) ) +
  ( DelXS204Cps )^2 +
  (
  ( [Tot204Cps] - [BgdCps] - [XS204Cps] )^2 /
  ( Nscans * CT206 * ABS( [Tot206Cps] - [BgdCps] ) )
  )
  ) ) /
  ( ABS( [Tot204Cps] - [BgdCps] - [XS204Cps] ) ) -----(V)
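The same in sketch form (my names again; the documented Squid3 implementation will follow elsewhere):

```java
/** Equation (V): revised fractional error of [OC204/206], with the three
 *  components expressed in total-count terms and background folded in. */
public static double fractErrOc204206(int nScans, double ct204, double ct206,
                                      double tot204Cps, double tot206Cps,
                                      double bgdCps, double xs204Cps,
                                      double delXs204Cps) {
    double netOc204 = tot204Cps - bgdCps - xs204Cps;   // (P - Q), in cps terms
    double net204   = Math.abs(tot204Cps - bgdCps);
    double net206   = Math.abs(tot206Cps - bgdCps);
    double variance = net204 / (nScans * ct204)
                    + delXs204Cps * delXs204Cps
                    + netOc204 * netOc204 / (nScans * ct206 * net206);
    return Math.sqrt(variance) / Math.abs(netOc204);
}
```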

One way to check this equation is to apply the boundary condition: when XS204Cps = DelXS204Cps = 0, [DelOC204/206] / [OC204/206] = [Del204/206] / [204/206]. So, setting XS204Cps = DelXS204Cps = 0 in equation (V):

[DelOC204/206] / [OC204/206] = ( SQRT(
  ( ABS( [Tot204Cps] - [BgdCps] ) / ( Nscans * CT204 ) ) +
  ( ( [Tot204Cps] - [BgdCps] )^2 / ( Nscans * CT206 * ABS( [Tot206Cps] - [BgdCps] ) ) )
  ) ) /
  ( ABS( [Tot204Cps] - [BgdCps] ) ) -----(W)

Factorising ABS( [Tot204Cps] - [BgdCps] ) out of the SQRT in the numerator:

[DelOC204/206] / [OC204/206] = ABS( [Tot204Cps] - [BgdCps] ) *
  ( SQRT(
  ( 1 / ( Nscans * CT204 * ABS( [Tot204Cps] - [BgdCps] ) ) ) +
  ( 1 / ( Nscans * CT206 * ABS( [Tot206Cps] - [BgdCps] ) ) )
  ) ) /
  ( ABS( [Tot204Cps] - [BgdCps] ) ) -----(X)

Cancelling the first and last lines of equation (X), and factorising ( 1 / Nscans ) from the remaining terms yields:

[DelOC204/206] / [OC204/206] = SQRT( ( 1 / Nscans ) * ( 
  ( 1 / ( CT204 * ABS( [Tot204Cps] - [BgdCps] ) ) ) +
  ( 1 / ( CT206 * ABS( [Tot206Cps] - [BgdCps] ) ) )
  ) ) -----(Y)

and equation (Y) is identical to equation (G) governing [Del204/206] / [204/206], as required by the boundary condition. On this basis, I intend to supplant Ludwig's expression with my own, in both the 'reference' version of SQUID 2.50, and in Squid3.
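The boundary condition can also be exercised numerically with the sketches above; the input values below are invented purely for the check:

```java
// Invented inputs, solely to exercise the boundary condition.
// Assumes the fractErr204206 and fractErrOc204206 sketches above are in scope.
int nScans = 6;
double ct204 = 10.0, ct206 = 2.0;                 // per-scan count times (s)
double tot204Cps = 0.8, tot206Cps = 900.0, bgdCps = 0.2;

double g = fractErr204206(nScans, ct204, ct206, tot204Cps, tot206Cps, bgdCps);
double v = fractErrOc204206(nScans, ct204, ct206, tot204Cps, tot206Cps,
                            bgdCps, 0.0, 0.0);    // XS204Cps = DelXS204Cps = 0

// The two agree (to floating-point rounding), per equations (G) and (Y).
System.out.printf("G = %.12f   V = %.12f%n", g, v);
```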

What effect does it have on the calculations?

Below are the results from a random unknown (GA session 180050), for which the co-analysed RM defines XS204Cps = +0.0055 and DelXS204Cps = 0.0381. There is nothing special about this dataset; it is just one I often use in SQUID 2.50 code documentation, because the Prawn XML file is relatively small and rapidly processed. The value of [OC204/206] can be orders of magnitude different to [204/206], so their fractional errors are similarly afflicted. I have therefore chosen to portray the absolute errors [DelOC204/206], as per Ludwig (red) and my revision (blue). Note that the y-axis is logarithmic.

[image: log-scale comparison of absolute [DelOC204/206] per spot, Ludwig (red) vs my revision (blue), with [OC204/206] values labelled along the x-axis]

Note that spots 6 and 13 are not part of this comparison, as their [204/206]m is sufficiently high that the values are calculated by inter-scan Dodson interpolation, rather than the low-count-rate arithmetic described herein. For context, the values of [OC204/206] to which these [DelOC204/206] values apply are labelled along the x-axis.

It is clear that my revision yields smaller [DelOC204/206] values across the board, but the magnitude of the difference is not obvious on the logarithmic scale. So I have re-expressed the revised [DelOC204/206] in terms of its percentage reduction, relative to Ludwig's [DelOC204/206]:

[image: percentage reduction of my revised [DelOC204/206] relative to Ludwig's, per spot]

So in general, most of my revised [DelOC204/206] values are 30–50% lower than those calculated by Ludwig from the same input data, which is probably significant in terms of error propagation to 204Pb-corrected radiogenic ratios, especially 207Pb/206Pb values in the second half of the Proterozoic. This will need further investigation in datasets containing Harvard 91500 as a secondary RM (at some future time!).

These calculations can also be considered in terms of the increase in the size of the uncertainty associated with the overcount-correction, relative to the uncorrected [Del204/206] value. Below is a comparison of the 'inflation factors' for [DelOC204/206] according to Ludwig (red) and my revision (blue), normalised in each case to measured [Del204/206] in the overcount-free scenario (black line at AbsErr = 1).

[image: 'inflation factors' for [DelOC204/206], Ludwig (red) vs my revision (blue), normalised to measured [Del204/206] (black line at AbsErr = 1)]

Most of my revised [DelOC204/206] values lie between 150% and 250% of [Del204/206], whereas the corresponding figures for Ludwig are probably 250–500%. The latter look high, but that doesn't necessarily make them wrong.

However, we can assess the merit of the two sets of arithmetic by ZEROing both XS204Cps and DelXS204Cps in SQUID 2.50. This is achieved by manually overwriting each "204 overcts/sec fr. 207" value in the StandardData sheet with a zero, and pressing F9 to refresh the calculations, whereupon XS204Cps = DelXS204Cps = 0 for the purposes of the overcount-correction calculation imposed on the co-analysed unknowns (by selecting "Force 207Pb/235U–206Pb/238U concordance" in the Group Me dialog box).

As described above, it is intuitively reasonable to expect [OC204/206] to simplify to [204/206], and [DelOC204/206] to simplify to [Del204/206] in this situation. Below is the result of the corresponding [DelOC204/206] calculation for Ludwig (red), and my revision (blue), noting that in this case, my revision is identical to overcount-free [Del204/206].

[image: [DelOC204/206] with XS204Cps = DelXS204Cps = 0, Ludwig (red) vs my revision (blue, identical to overcount-free [Del204/206])]

Even on the logarithmic scale, it is clear that Ludwig's [DelOC204/206] arithmetic is adding a non-negligible component to [Del204/206], rather than matching overcount-free [Del204/206] as demanded by the boundary condition. The difference is even starker when considered in terms of the inflation factors (below): even when XS204Cps = DelXS204Cps = 0, Ludwig's arithmetic routinely results in [DelOC204/206] values that are 200–600% of [Del204/206], which surely can't be right.

[image: 'inflation factors' for [DelOC204/206] with XS204Cps = DelXS204Cps = 0, Ludwig (red) vs my revision (blue)]

ryanickert commented 5 years ago

Simon, this is really good work, and IMO, looks like it would be worth publishing so that the rest of the community has a published touchstone on 204-overcounts. At any rate, it's super that you've taken the time to write this out so clearly and put it here. Thanks heaps for this.

Only obliquely related, but do you think that there are any issues associated with either (1) a breakdown of the Poisson ≈ Gaussian assumption at low count rates (and the non-physicality of negative count rates), or (2) the "ratio problem" illustrated by the Hawai'i group (e.g. Ogliore et al., 2011)?

cwmagee commented 5 years ago

Hi Ryan, thanks for that. Simon may reply in more detail later. In the meantime, to give you some background as to why we're so hung up on overcounts here at GA: in my first 18 months on the job, our SHRIMP lab had first a Pb contamination problem, and then a 204Pb overcount problem which became apparent once we figured the contamination problem out.

As a result, we have over 100 SHRIMP analytical sessions which show a range of overcount behaviors. We don't currently have an official position on how to handle low count problems. However, we do have substantial amounts of high quality reference material data which could be used to test various hypotheses which people might wish to formulate on this topic. So if you or @noahmclean or anyone else mathematically minded wants access to a large compilation of data which was collected to monitor our overcount situation, we would be happy to provide it along with supporting contextual information.


sbodorkos commented 5 years ago

@ryanickert I do have some (non-authoritative) ideas and opinions on some of those things, which I will write about this evening, if I get a chance...

For overcount-fans playing along at home, here is a "frozen" SQUID 2.50 Excel-book that shows the effect of the revised arithmetic. 180050_OvCtSensitivity_frozen.zip

The RM is TEMORA2, and its "204 overcts/sec from 207" is negligible (from the population, XS204Cps = +0.005, DelXS204Cps = 0.038, see yellow highlight at foot of StandardData rows).

The "unknowns" are 91500 and OG1, and the ratio of interest is 204corrected 207Pb/206Pb (most easily assessed by their Ages). I have Grouped each dataset 3 times in SQUID 2.50 (having implemented my revised math in parallel with Ludwig's, so I can switch easily between them, for the sake of interest). Sheets with suffix ..._NONE have no overcount-correction, those suffixed ..._Revised are overcount-corrected from 207Pb using my new expressions, and those suffixed "..._Ludwig" are overcount-corrected from 207Pb using the established SQUID 2.50 math.

The effect is muted, probably owing to the very small XS204Cps value, but it is still quite discernible in the 91500 datasets, especially in terms of the uncertainties of individual analyses (although I have highlighted the WtdAv summaries yellow for ease of comparison at sheet-to-sheet scale). This is understandable in terms of 91500 zircons containing relatively little radiogenic Pb, especially in comparison to OG1, where overcount-related effects are completely negligible, as expected.

@cwmagee I think it would be interesting to try out a couple of those 91500-bearing datasets with large XS204Cps values, but also some datasets where the DelXS204Cps value is quite large, as I don't have much feel for whether the value or the error exerts more influence on the calculations. I intend to get on with documenting the revised code for incorporation in Squid3.

ryanickert commented 5 years ago

Hi both, thanks for the context. It sounds like a really challenging (read: fun) problem, but it's too bad that it's affecting data. :( These low-count-rate issues are really hard, and I'm interested to see what you come up with!

cwmagee commented 5 years ago

Ryan, it was a nightmare, especially the three-month period where we knew how to create overcounts (e.g. clean an epoxy mount with a series of solutions ending in Milli-Q water, then immediately coat and run it) but not how to make them go away.

Overcount examples.zip

Simon, attached is a zipped file with the 91500 overcount summary table, and three example sessions which illustrate overcount issues of various magnitude and precision. If the summary leads you to want a different file, everything can be found in Geochron\Session data backup. The overcount summaries for OG1 and Temora are in Geochron\Overcounts_FY1617.

cwmagee commented 5 years ago

BTW, Ryan, do you have any opinion or suggestions on the applicability of Coath et al. (2013)'s use of the Kummer confluent hypergeometric function? Or is that our winter school break reading assignment?

sbodorkos commented 5 years ago

Thanks Ryan, it was important to take the time to write it out, because it's all new to me, and I'm pretty sure that when it's time to justify it all, I will have forgotten most of it! I had been suspicious of the Ludwig expression for a long time, but couldn't think how to prove it was wrong, formulate an alternative, or even conceive of a test for 'correctness'. So it is good to have it sorted out.

In terms of non-physicality, I am a big believer in separating the measurements from the calculations. I certainly believe that actual counts (ions and SBM, as written to Prawn XML) should be cardinal numbers befitting their physical quantities. But our isotopic ratios are calculated values, and I don't think those calculations should be compromised by the notion of non-physicality (at least, not as a general principle). My own feeling is that arithmetical consistency is more important, and that there can be value in negative 204/206 ratios, for example.

For one thing, I think there is a tendency for humans to look too closely at the value and not enough at the uncertainty. In any given zircon, your analyses will sample a distribution of count-rates at mass 204 and mass 204.1, and in a clean zircon, those distributions might look like 0.20 ± 0.10 (2s) at mass 204, and 0.18 ± 0.10 (2s) at background. Recalling that conceptually, [204/206] = ( [Tot204Cps] - [BgdCps] ) / ( [Tot206Cps] - [BgdCps] ), it follows that sampling the two distributions above is going to give negative apparent [204/206] fairly often, just through random chance. But frequently the uncertainty will exceed 100%, and positive values will be 'permitted' by their span. I think it's better to present the results of that calculation in a completely consistent way.

From an operator's point of view, this also helps you pick up data-collection errors: e.g. if you accidentally measure background at mass 205 rather than 204.1, pervasive negative 204/206 is a useful red flag for something that can otherwise slip through the cracks. (The same applies when the 204Pb measurement position is misconfigured, and it turns out you're measuring counts off the shoulder of the peak.)

But I think there are also important applications relating to the presence of isobaric interferences at 204 and/or background, where it would be unhelpful to enforce a lower limit of zero for [204/206]; I have attached an example of a dataset where consistent calculation and portrayal of [204/206] over its range of computed values helps to properly illustrate counting issues affecting ( [Tot204Cps] - [BgdCps] ): see Figure 2 of this PDF from the Geological Survey of Western Australia's compilation: 180256.1.pdf.

In terms of ratio estimation, I'm not really qualified to comment, other than absolutely acknowledging the problem as outlined by Ogliore et al 2011 Ratio estimation in SIMS analysis.pdf. In SQUID 2.50 and Squid3, the majority of our [204/206] are "ratio of means", whereas everything else in SQUID 2.50 is generally "mean of ratios", and the implications of that are not known. At the moment, the only "evidence" we have that the overall effect (at the scale of weighted mean dates for samples) is not major is the promising results of inter-technique comparisons ("natural" SHRIMP vs CA-IDTIMS), which @cwmagee is currently documenting for publication.

Mathematical "renovation" of Squid3 should probably look at Noah McLean's log-ratio work (https://doi.org/10.1016/j.gca.2013.08.035 and elsewhere) as part of the solution to this problem. It won't be completely straightforward (especially as I have just finished saying I favour negative isotopic ratios as calculated, irrespective of non-physicality!), but I am sure clever people could find a workable transformation, especially as log-ratios solve the inverted-ratio problem highlighted by Ogliore et al. (2011). Maybe it's possible to express all ratios as abs( ratio ) < 1, add 1 to all of them, do the log-ratio math, and then exponentiate and subtract 1. But maybe you can't do that – I don't know.

ryanickert commented 5 years ago

> BTW, Ryan, do you have any opinion or suggestions on the applicability of Coath et al. (2013)'s use of the Kummer confluent hypergeometric function? Or is that our winter school break reading assignment?

Not really - I always used equation 45 in Ogliore et al. to work out whether there is a problem or not. And I don't know how to intercompare the Hawai'i and Bristol approaches, unfortunately.

ryanickert commented 5 years ago

w/r/t the negative numbers, I guess I think that because they arise out of the affirmative choice to use Gaussian distribution parameters to characterise measurements drawn from Poisson distributions, if a substantive fraction of the distribution is negative, that might be telling me that the Gaussian assumption is breaking down and may be introducing bias. But that's just a gut feeling; it may not actually make a difference, which is what should matter.