CIRDLES / Tripoli

Tripoli imports raw mass spectrometer data files and supports interactive review and archiving of isotopic data. Tripoli facilitates visualization of temporal trends and scatter during measurement, statistically rigorous filtering of data, and calculation of statistical parameters.
http://cirdles.org/Tripoli/
Apache License 2.0

Deal with Case1 data where some ratio data are <= 0 #214

Open noahmclean opened 7 months ago

noahmclean commented 7 months ago

Case1 (the OG Tripoli functionality) detects isotope ratio data. For isotope ratio time series, Tripoli should take the logarithm of the ratios, evaluate statistics, then exponentiate. This is the additive log-ratio transform of Aitchison's compositional data approach. The problem arises when an inaccurate baseline or dead time measurement yields measured isotope ratios that are less than or equal to zero: you can't take the logarithm of a number that is zero or negative.
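
To make the failure concrete (the values below are made up, and this is not Tripoli code): in Java, `Math.log` doesn't even throw on non-positive input, it quietly returns `-Infinity` or `NaN`, and either of those then poisons any mean or standard error computed from the log-ratios.

```java
public class LogOfNonPositiveRatio {
    public static void main(String[] args) {
        // Illustrative values only: a plausible measured ratio, an exact zero,
        // and a slightly negative ratio from an over-subtracted baseline.
        System.out.println(Math.log(0.0512));   // finite, fine
        System.out.println(Math.log(0.0));      // -Infinity
        System.out.println(Math.log(-0.0003));  // NaN -- corrupts all downstream statistics
    }
}
```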

noahmclean commented 7 months ago

The optimal solution, I think, is to use the measured intensities of the baselines and isotopes inside a model that is parameterized for relative abundances that are greater than zero. This is what we're doing in the MCMC part of the code.
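
A bare-bones illustration of the positivity idea only (a generic sketch, not the MCMC model in Tripoli's code, with hypothetical names): if the ratio is written as the exponential of an unconstrained parameter, the model can be compared against measured, baseline-corrected intensities directly, and the modeled ratio can never reach zero or go negative even when individual corrected intensities do.

```java
/**
 * Generic sketch of a positivity-preserving parameterization; class and
 * method names are hypothetical and this is not Tripoli's MCMC model.
 * The free parameter theta lives on the whole real line, so a sampler or
 * optimizer can move it freely, while exp(theta) keeps the modeled ratio
 * strictly positive.
 */
public class PositiveRatioParameterization {

    // Modeled isotope ratio (e.g., 206/204) as a strictly positive quantity.
    static double modeledRatio(double theta) {
        return Math.exp(theta);
    }

    // Predicted denominator-isotope intensity, to be compared against the
    // measured baseline-corrected intensity, which may legitimately scatter
    // below zero for near-detection-limit beams.
    static double predictedDenominatorIntensity(double numeratorIntensity, double theta) {
        return numeratorIntensity / modeledRatio(theta);
    }
}
```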

However, many datasets don't contain the intensities and integration times you need to make these calculations. Legacy datasets, especially, may consist of just an isotope ratio name (e.g., 206/204) and a set of (e.g., 100) measured isotope ratio values.

noahmclean commented 7 months ago

Non-optimal solutions include the following (a rough code sketch of all three follows the list):

  1. Take the mean and standard error of all the isotope ratios as-is (no log transform, no rejection). Downsides to this approach include (a) problem isotope ratios often include low-abundance isotopes with low count rates, and it's precisely these data that benefit most from the log-ratio transform, (b) relatedly, the mean will be biased towards a higher relative abundance of the low-abundance isotope, and (c) it's possible to end up with a negative mean result, even though you checked your mass spectrometer for antimatter leaks.

  2. Reject the negative ratios, then evaluate the mean (or log-mean) of the ratios greater than zero. Downsides to this approach include (a) and (b) above, but to a greater extent. However, we avoid (c).

  3. Add a small constant to each measured ratio so that every shifted ratio is $> 0$. Downsides to this approach include (a) if the problem is a baseline or dead time subtraction, that subtraction will affect a collection of measured intensities equally but not necessarily a collection of measured ratios, each of which will include a different intensity, and (b) the potential to bias the ratio of interest by an amount up to that small constant (for the arithmetic mean), or in an unpredictable way (for the geometric mean), depending in the latter case on the new minimum ratio value.
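
For concreteness, here is a rough Java sketch of what each of the three fallbacks amounts to numerically. The class and method names, the epsilon, and the example data are mine, not Tripoli's, and option 3 is shown in its geometric (log-mean) variant.

```java
import java.util.Arrays;

public class NonOptimalOptions {

    // Option 1: arithmetic mean and standard error, no log transform, no rejection.
    static double[] meanAndStdErr(double[] ratios) {
        int n = ratios.length;
        double mean = Arrays.stream(ratios).average().orElse(Double.NaN);
        double variance = Arrays.stream(ratios).map(r -> (r - mean) * (r - mean)).sum() / (n - 1);
        return new double[]{mean, Math.sqrt(variance / n)};
    }

    // Option 2: drop ratios <= 0, then take the log-mean of the survivors.
    static double logMeanOfPositives(double[] ratios) {
        double[] positives = Arrays.stream(ratios).filter(r -> r > 0.0).toArray();
        double meanLog = Arrays.stream(positives).map(Math::log).average().orElse(Double.NaN);
        return Math.exp(meanLog);
    }

    // Option 3: shift every ratio by a constant so all are > 0, take the
    // log-mean, then shift back. The epsilon is arbitrary, which is exactly
    // what makes the resulting bias unpredictable.
    static double shiftedLogMean(double[] ratios) {
        double min = Arrays.stream(ratios).min().orElse(0.0);
        double shift = (min <= 0.0) ? -min + 1e-6 : 0.0;
        double meanLog = Arrays.stream(ratios).map(r -> Math.log(r + shift)).average().orElse(Double.NaN);
        return Math.exp(meanLog) - shift;
    }

    public static void main(String[] args) {
        // Made-up measured ratios with one negative value.
        double[] ratios = {0.0051, -0.0007, 0.0048, 0.0002, 0.0060};
        System.out.println(Arrays.toString(meanAndStdErr(ratios)));
        System.out.println(logMeanOfPositives(ratios));
        System.out.println(shiftedLogMean(ratios));
    }
}
```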

ryanickert commented 7 months ago

ngl i've not checked for antimatter leaks lately.

ryanickert commented 7 months ago

Q: If this is designed to accommodate inaccurate data, is it worth the effort? Seems like suboptimal approaches might be fine if the data is not accurate in the first place.

Comment: There may be cases where the data are accurate and still negative. Very low precision (but accurate) analyses of baselines/backgrounds and very low precision (but accurate) analyses of ion beam intensities can yield negative values.

noahmclean commented 6 months ago

A: You're right: we don't want to spend much time handling this specific scenario for legacy data. Our new algorithms should improve the accuracy and precision of near-detection-limit measurements. But we still want to be able to handle older data files and formats, in part because we said we were going to.

So, non-optimal solution 1 above seems to be the thing to do. This is (to my knowledge) the traditional way of handling this sort of data in the absence of direct intensity measurements. The negative (but large uncertainty) ratios can then go off to data reduction software and be handled from there -- e.g., as an isobaric interference correction with a small magnitude but large uncertainty.

noahmclean commented 5 months ago

Jim: The resolution to this question is the following: Before calculating the statistics for an isotope ratio in Case1, check and see if any of its data points are $\leq 0$. If so, then treat the data like a 'User Function' -- take the mean, standard deviation, and standard error of the data without a log transform. Report the statistics and shade the plot just like you would for a user function. If the user rejects some data in 'sculpt mode', then check again to see if any data points are $\leq 0$ before re-calculating statistics.
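
As a sketch of that branch logic (illustrative names, not the code Tripoli ships):

```java
import java.util.Arrays;

public class Case1RatioSummary {

    /** Plain mean, standard deviation, and standard error of the mean. */
    static double[] plainStats(double[] values) {
        int n = values.length;
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double variance = Arrays.stream(values).map(v -> (v - mean) * (v - mean)).sum() / (n - 1);
        double stdDev = Math.sqrt(variance);
        return new double[]{mean, stdDev, stdDev / Math.sqrt(n)};
    }

    /**
     * If any included ratio is <= 0, fall back to plain statistics (the
     * 'User Function' treatment); otherwise summarize in log space and
     * back-transform the mean. Call this again after the user rejects
     * points in sculpt mode, since the surviving points may all be
     * positive and the log path then takes over.
     */
    static double[] summarize(double[] includedRatios) {
        if (Arrays.stream(includedRatios).anyMatch(r -> r <= 0.0)) {
            return plainStats(includedRatios);
        }
        double[] logStats = plainStats(Arrays.stream(includedRatios).map(Math::log).toArray());
        // The mean back-transforms cleanly (a geometric mean); the dispersion
        // terms are left in log space here, i.e., as relative uncertainties.
        return new double[]{Math.exp(logStats[0]), logStats[1], logStats[2]};
    }
}
```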

bowring commented 4 months ago

As of v0.5.2, the solution is to reject the negative ratios out of hand as per non-optimal item 2 above. Will provide the "resolution" technique in a future release.