CIRDLES / Squid

Squid3 is being developed by the Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) at the College of Charleston, Charleston, SC, and Geoscience Australia, as a re-implementation in Java of Ken Ludwig's Squid 2.5. Please contribute your expertise!
http://cirdles.org/projects/squid/
Apache License 2.0

Simple advisory (and setting-switch) for invalid SBM #340

Closed: sbodorkos closed this issue 4 years ago

sbodorkos commented 5 years ago

I have extracted this aspect from closed issue #337 because it just keeps on tripping me up, even though I "know" it's a problem. It follows that it's going to be a real obstacle for people less familiar with the quirks of the data-reduction!

Recall that each analysis (i.e. run[i]) comprises Nscans[i] * Npeaks[i] mass-stations, and has ONE SBM_zero value, SBM_zero[i]. Each mass-station has an SBMCps value calculated very early in the data-reduction, as documented at the top of https://github.com/CIRDLES/ET_Redux/wiki/SHRIMP:-Step-2 and highlighted in the screenshot below.

[screenshot: the SBMCps calculation highlighted in the SHRIMP: Step 2 wiki]
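
Schematically, the relationship being computed there is something like the following. This is a sketch only; the names are placeholders (not Squid3's actual fields), and the real code may track integrations and count times differently:

```java
// Conceptual sketch only; names are placeholders, not Squid3's actual fields.
// rawSbmCounts are accumulated over sbmCountTimeSec seconds; sbmZeroCps is the
// single analysis-wide SBM_zero value, already expressed in cps.
static double sbmCps(double rawSbmCounts, double sbmCountTimeSec, double sbmZeroCps) {
    double sbmCountRateCps = rawSbmCounts / sbmCountTimeSec;
    // the 'invalid SBM' case discussed below is a return value <= 0, i.e. the
    // measured SBM count-rate did not exceed the SBM_zero baseline
    return sbmCountRateCps - sbmZeroCps;
}
```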

At present we have no meaningful error-trapping for 'invalid' SBM data (defined narrowly here as a mass-station whose measured SBM count-rate does not exceed SBM_zero_cps, i.e. SBMCps[j, k] <= 0). Essentially what we should do is log/count the number of mass-stations for which the highlighted line returns SBMCps[j, k] <= 0, as early as possible in the data reduction. In the event that one or more such mass-stations are found during processing of the Prawn XML file, the user should receive a dialog-box-style notification (with an "OK" button) that reads something like:

"This Prawn XML dataset contains invalid (sub-zero) SBM readings (X occurrences, out of Y mass-station measurements in total). SBM normalisation is not available for reduction of this dataset."

In this dialog, X is a count of the number of mass-stations yielding non-positive SBMCps values across the entire dataset, and Y is a count of the total number of mass-stations collected across the entire dataset (could be approximated as Nruns * Nscans * Npeaks, assuming all runs have the same Nscans and Npeaks; should be more rigorously calculated as Nscans[i] * Npeaks[i] for each run[i], summed over all runs). Note that Allen's dataset is an extreme case; without doing the calculations, I suspect X = Y there. Most other datasets we have tested are at the opposite extreme: X = 0.
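
A minimal sketch of the proposed audit, with hypothetical container/getter names (the real hook would be wherever SBMCps is first computed from the Prawn XML):

```java
// Hypothetical audit of non-positive SBMCps values across a whole Prawn XML file.
// PrawnRun, getNscans(), getNpeaks() and getSbmCps() are placeholders for
// whatever Squid3's real model classes expose.
static int[] auditSbm(List<PrawnRun> prawnRuns) {
    int invalidCount = 0;   // X in the dialog text above
    int totalCount = 0;     // Y: the sum of Nscans[i] * Npeaks[i] over all runs
    for (PrawnRun run : prawnRuns) {
        for (int scan = 0; scan < run.getNscans(); scan++) {
            for (int peak = 0; peak < run.getNpeaks(); peak++) {
                totalCount++;
                if (run.getSbmCps(scan, peak) <= 0.0) {
                    invalidCount++;
                }
            }
        }
    }
    return new int[] { invalidCount, totalCount };   // {X, Y} for the dialog
}
```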

What to do when X > 0? To begin with, we should simply implement the Ludwig-style "nuclear option" regarding SBM normalisation: If X > 0, SBM-normalisation should be switched OFF in the data-reduction arithmetic for the entire processing of the Prawn XML file, and the "Normalise to SBM?" radio-button should be set to "No" and greyed-out to prevent user interaction. This control resides on the Manage Current Task screen, even though SBM-normalisation (and Ratio Calculation Method) is Task-independent, as per #291.
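
In code terms, something along these lines (the control and flag names are placeholders; I have not checked the actual Manage Current Task controller):

```java
// Hypothetical 'nuclear option': if any invalid SBM was found, force SBM
// normalisation OFF and lock the radio buttons so the user cannot re-enable it.
if (invalidSbmCount > 0) {
    task.setUseSBM(false);                       // placeholder task-level flag
    sbmNormalisationNoRadioButton.setSelected(true);
    sbmNormalisationYesRadioButton.setDisable(true);
    sbmNormalisationNoRadioButton.setDisable(true);
}
```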

Down the track, we could modify the above "mass-station-by-mass-station" approach to instead audit SBMCps on an analysis-by-analysis basis, advise the user how many analyses contain one or more mass-stations with non-positive SBMCps values (compared to the total number of analyses in the Prawn XML file), and perhaps give the user two options (see the sketch after this list):

  1. Use SBM-normalisation where possible (i.e. SBM normalisation ON for all analyses with X[i] = 0, otherwise OFF).
  2. No SBM-normalisation at all (as per the "nuclear option" above; this option treats all analyses the same way).
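
A sketch of option 1 (option 2 is the same loop with the flag forced to false for every run); again, the names are hypothetical rather than Squid3's real API:

```java
// Hypothetical per-analysis variant: SBM normalisation stays ON only for runs
// in which every mass-station returned a positive SBMCps value.
for (PrawnRun run : prawnRuns) {
    boolean runHasInvalidSbm = false;
    for (int scan = 0; scan < run.getNscans(); scan++) {
        for (int peak = 0; peak < run.getNpeaks(); peak++) {
            if (run.getSbmCps(scan, peak) <= 0.0) {
                runHasInvalidSbm = true;
            }
        }
    }
    run.setUseSbmNormalisation(!runHasInvalidSbm);  // placeholder setter
}
```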
cwmagee commented 5 years ago

I suggest a third option: "Discard spot, use SBM for remainder". From a hardware point of view, the most common way to generate an SBM below background is to have the ion source and/or accelerating voltages fail during an analysis. Usually this is followed by accumulation of zero (or very low) count data (and SBM near background) through the end of the analysis, followed by the automatic termination of the autorun when the next spot(s) fail their secondary tuning. That generally is followed by user intervention to fix the fault, and continuation of the dataset. So being able to automatically cut spots with invalid SBM (which may also have analytically useless data) would be user-friendly.

NicoleRayner commented 5 years ago

Maybe I am missing something here, but this has never been a problem for me with SQUID3, and I certainly have instances where SBM cps go below the SBM Zero setpoint. As @cwmagee points out, this is almost always when the duo or HV go out. My SQUID3 data-reduction habits (such as they are) have always organically included looking at the Audit Raw Data / Primary beam intensity, and in cases where it drops to 0, I know the duo flamed out and I remove the analysis from the XML. Now I suppose this wouldn't reveal HV tripping. Perhaps a simple solution would just be to add SBM data plots to the Audit Raw Data window so users can do a quick scan. To me, the audit to remove those oddballs is pretty intuitive, so I don't think it is too much to ask of users.

Here is a squid file where analysis 12399-064.1 has SBM/signal drop out and SQUID copes with it fine (the data is not usable but it doesn't bog anything down).

IP943 to test SBM fail.zip

cwmagee commented 5 years ago

Adding SBM plots to the audit is a good idea, as is a summary as Simon suggested. Has the SQUID 1.5 scan rejection logic been ported? That might be useful for data-transfer zeros like the IPCS2 dataset.

sbodorkos commented 5 years ago

@NicoleRayner yes, that is fair enough when 'invalid SBM' is an intermittent/infrequent problem (i.e. instrument fault), and when you are familiar with the analytical metadata, having acquired the analyses yourself.

But there are other ways 'invalid SBM' can arise, the simplest being the acquisition of an SBM_zero value on one (smaller) SBM range, followed by data acquisition using a different (larger) SBM range, while forgetting to re-zero. I suspect this is pretty common, especially among 'occasional' SHRIMP users. And the problem is exacerbated when you are being asked to assess/troubleshoot someone else's data, and that analytical session history is not available to you (most of my own work falls into this category; I spend very little time on my own data, compared to everyone else's!). The need to examine .squid files 'cold' will be a problem for Squid3 mentors everywhere, as the software is taken up by the community, and people want to know why 'their' data-reduction doesn't work.

I attach Allen Kennedy's Squid3 file as an example of the combination of all these factors (I was asked to look at this 'cold', just as I am asking you to do now).

AKK_XENO_Squid3.zip

Point is, Squid3 has thrown no error messages on this data-reduction, Audit Raw Data doesn't turn up anything unusual, and there are no Unhealthy expressions in the Expression Manager. It's only when you begin digging around the individual expressions that you see persistently strange outputs in the Peek window... and for most people, the problem will still be cryptic. (If you want to crash Squid3, try Visualisations... Reference Materials... Weighted Means.)

For me, the penny didn't drop until I looked at the Ratios with Uncertainties, and realised that every Value, for every ratio in every spot, is -9.87654321012, which I was also able to recognise as the 'SQUID_Error_Value' (because I documented the code; I'm not sure how many other SQUID users in the world could have made that diagnosis). This means data-reduction has failed at a very fundamental (pre-ratio) level.
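
As an aside, a crude programmatic version of that check might look like this. The sentinel value is quoted exactly as I saw it in the Peek window, and the spot/ratio accessors are placeholders, not Squid3's actual classes:

```java
// Hypothetical check: if every ratio value equals the SQUID error sentinel,
// data reduction has failed before the ratio stage.
static final double SQUID_ERROR_VALUE = -9.87654321012;

static boolean reductionFailed(List<Spot> spots) {
    return spots.stream()
            .flatMap(spot -> spot.getRatioValues().stream())
            .allMatch(value -> value == SQUID_ERROR_VALUE);
}
```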

Even with that knowledge, it took me a while to work out the next step (which is Report Tables... Produce Sanity Check Reports). Having found the resulting data-folder and moved it to my Desktop (if anyone else is getting 'File not found' errors when trying to open those Sanity Check CSVs in Windows, it's because the filename and path are too long; move the folder so it is less deeply buried!), I opened the ...Check_01... file and it looked OK. But even when I opened the ...Check_02... file, I still needed to do the mental arithmetic (converting count time/10 to cps) for myself to realise that the SBM readings at integration-scale were obviously too low relative to the SBM_zero, and that the problem was universal.

Anyway, it was hard, and it took a couple of hours to work my way through it (because of course I tried a lot of things that were completely irrelevant, before I realised that data-reduction had essentially not happened). And Squid3 itself is certainly not to blame, because I was the one who skipped the error-handling when reading the fundamental input data. Given our universal agreement that negative SBM counts are never OK, it makes sense for Squid3 to explicitly flag the issue (the same argument could be made for negative ion-counts potentially arising from counting-system errors or software issues).

But on reflection, I agree with you @NicoleRayner that SQUID 2.50-style automated handling of 'invalid SBM' occurrences is premature at this stage, because Squid3 error-trapping of the input Prawn XML file is something we ought to look at holistically (there are quite a few other potential problems, unrelated to SBM, that should be 'trapped' at the same time). So Jim @bowring, I suggest you skip the detailed diagnostics I originally suggested, and just implement a simple check to see if there are any non-positive SBMCps values in a given Prawn XML. If one is found, just a static dialog-box (with an OK button) saying:

"Invalid SBM counts detected. Squid3 recommends switching SBM normalisation OFF until you diagnose the problem."

will suffice, as an interim measure.
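
For instance, assuming plain JavaFX (Squid3 may already have its own dialog helper that should be preferred over a bare Alert):

```java
import javafx.scene.control.Alert;
import javafx.scene.control.ButtonType;

// Minimal interim advisory; shown once per Prawn XML load if any non-positive
// SBMCps value was detected.
void warnInvalidSbm() {
    Alert alert = new Alert(Alert.AlertType.WARNING,
            "Invalid SBM counts detected. Squid3 recommends switching SBM "
                    + "normalisation OFF until you diagnose the problem.",
            ButtonType.OK);
    alert.showAndWait();
}
```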

@cwmagee proper 'robustification' of Squid3 relative to its input data will need to look at quite a lot of things: non-negative integers for ion counts, SBM checks as outlined here, and also the handling of 'incomplete' analyses written to file (whether aborted mid-scan or at the end of a scan), because as far as I know, that sort of thing is completely untested. But as a body of work, it's a job for another day.

cwmagee commented 5 years ago

Aren't OP file counts non-integer, due to deadtime correction?

sbodorkos commented 5 years ago

No. Each 'total counts at peak' value is indeed deadtime-corrected, but after correction the answer is (symmetrically and arithmetically) rounded to an integer. In a way, OP files are best of all, because all their numerical data are cardinals: you can search/replace hyphens and decimal points knowing they will only occur in spot-labels. No such luxury in PD/XML!
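
For anyone wondering what that correction-then-rounding looks like, here is a sketch assuming the standard non-paralysable deadtime model; the exact formulation the SHRIMP software uses may differ in detail, and the names are mine:

```java
// Illustrative only: deadtime-correct a measured peak count, then round the
// result back to a whole number of counts, as the OP-file values are.
static long deadtimeCorrectedCounts(long measuredCounts, double countTimeSec,
                                    double deadtimeNs) {
    double measuredRate = measuredCounts / countTimeSec;
    // non-paralysable model: trueRate = measuredRate / (1 - measuredRate * tau)
    double trueRate = measuredRate / (1.0 - measuredRate * deadtimeNs * 1.0e-9);
    // symmetric arithmetic rounding back to integer counts
    return Math.round(trueRate * countTimeSec);
}
```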