CIRDLES / Squid

Squid3 is being developed by the Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) at the College of Charleston, Charleston, SC and Geoscience Australia as a re-implementation in Java of Ken Ludwig's Squid 2.5. Please contribute your expertise!
http://cirdles.org/projects/squid/
Apache License 2.0

Capturing membership of weighted means (via Squid3 checkbox) in Report CSVs #733

Open sbodorkos opened 1 year ago

sbodorkos commented 1 year ago

We have talked previously about ways to rigorously link Report CSV-rows to calculated weighted means (as appended to the WM stats file). The Report CSV (whether Sample or Reference Material) ought to directly incorporate information about which rows were included in the mean and which were not (as indicated by the checkboxes for individual analyses).

For Reference Materials, the columns of primary interest are the Calibration Constants, and the options are related to the index-isotope for common Pb. So:

For Perm1: (4corr 206/238 calib constant) OR (7corr 206/238 calib constant) OR (8corr 206/238 calib constant)

For Perm3: (4corr 208/232 calib constant) OR (7corr 208/232 calib constant)

For Perms2+4: (4corr 206/238 calib constant AND 4corr 208/232 calib constant) OR (7corr 206/238 calib constant AND 7corr 208/232 calib constant)

Probably the most rigorous way to handle this is to add an extra auto-column (e.g. PB204C_PB206_U238_CALIB_CONST_MEAN) to the CSV report for each calib constant within each of the possible options, dedicated to reporting the status of the checkbox for that row. Each field would take one of three statuses: "included" for a spot calib-constant value that was included in the mean, "excluded" for a spot that wasn't, and "NA", which would be applied at column scale to every row of all unused column-options.

Using Perm1 as an example, if the user chose 204Pb as their preferred index-isotope for common Pb, then the auto-column for 4corr 206/238 calib const would comprise solely "included" or "excluded" according to the checkboxes, whereas ALL rows in the 7corr 206/238 calib const and 8corr 206/238 calib const auto-columns would be labelled "NA". But those latter columns need to be there, and theoretically populatable, because it is so easy for users to simply switch the radio-button controlling the applicable common-Pb index isotope.

Remember also that in Perms2+4, calib consts function as "locked pairs", so if the user chooses 204Pb as the index isotope, then both the 4corr 206/238 calib const and the 4corr 208/232 calib const would be populated with either "included" or "excluded" according to their (independent sets of) checkboxes, whereas both 7corr 206/238 calib const and 7corr 208/232 calib const would be populated with NA throughout.
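To make the proposed rule concrete, here is a minimal sketch (in Python for brevity, not Squid3's Java) of how the status columns could be filled for Reference Materials. The function name, the `(correction, ratio)` column keys and the shape of the `checkboxes` argument are all assumptions made for illustration; only the Perm options and the three status strings come from the description above.

```python
# Calibration-constant options per Perm, as listed in this issue.
# The (correction label, ratio) tuple keys are a hypothetical layout.
PAIRED = [("4corr", "206/238"), ("4corr", "208/232"),
          ("7corr", "206/238"), ("7corr", "208/232")]
OPTIONS_BY_PERM = {
    1: [("4corr", "206/238"), ("7corr", "206/238"), ("8corr", "206/238")],
    2: PAIRED,
    3: [("4corr", "208/232"), ("7corr", "208/232")],
    4: PAIRED,
}


def calib_const_membership(perm, chosen_corr, checkboxes):
    """Return one status column per calib-constant option for the given Perm.

    checkboxes maps each ratio used by the Perm (e.g. "206/238") to a list
    of booleans, one per spot, giving the checkbox state for that ratio.
    """
    options = OPTIONS_BY_PERM[perm]
    if chosen_corr not in {corr for corr, _ in options}:
        raise ValueError(f"{chosen_corr} is not an option for Perm{perm}")
    n_spots = len(next(iter(checkboxes.values())))
    columns = {}
    for corr, ratio in options:
        if corr == chosen_corr:
            # Active option: mirror the per-spot checkboxes for this ratio.
            columns[(corr, ratio)] = [
                "included" if ticked else "excluded"
                for ticked in checkboxes[ratio]
            ]
        else:
            # Unused option: "NA" for every row of the column.
            columns[(corr, ratio)] = ["NA"] * n_spots
    return columns


# Perm2 with 204Pb chosen: both 4corr columns follow their own checkboxes,
# and both 7corr columns are "NA" throughout (the "locked pair" behaviour).
cols = calib_const_membership(
    2, "4corr", {"206/238": [True, False], "208/232": [True, True]}
)
```

Keeping every option as a key, even when it is all "NA", reflects the point that the unused columns still need to exist so that switching the radio-button only changes their contents, not the report layout.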

For Samples, the situation would be simpler. We would need 7 columns, covering the 7 different age-types (e.g. PB204C_PB206_U238_MEAN), but all 7 are always available for population, irrespective of the Perm used for the Reference Material. So there would always be 6 full sets of "NA" for all Samples at all times, and one set of "included" vs "excluded".
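As a rough illustration of the Sample case, the sketch below (Python, not Squid3 code) builds CSV rows in which one age-type column carries "included"/"excluded" and all others carry "NA". The helper names and the "Spot" column are hypothetical, and the caller supplies the list of age-type column names, since only PB204C_PB206_U238_MEAN is named in this issue.

```python
import csv
import io


def sample_membership_rows(spot_names, age_columns, active_column, checkboxes):
    """Build one row per spot with a status for every age-type column.

    checkboxes is a list of booleans (one per spot) for the active age-type.
    """
    if active_column not in age_columns:
        raise ValueError(f"unknown age-type column: {active_column}")
    rows = []
    for name, ticked in zip(spot_names, checkboxes):
        row = {"Spot": name}  # "Spot" is a placeholder identifier column
        for col in age_columns:
            if col == active_column:
                row[col] = "included" if ticked else "excluded"
            else:
                row[col] = "NA"
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text, keeping the column order of the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the full list of 7 age-type column names passed as `age_columns`, each row would come out with 6 "NA" fields and one "included" or "excluded" field.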

A final measure to consider would be to append a text-string of the summary stats, as currently written to the RefMat_Summary and/or WM stats text-files, as the rightmost column, for each of the rows with the label "included". If a user has calculated and appended multiple iterations of a mean to their stats file(s), this might help them track exactly which weighted-mean calculation is reflected by the CSV report.
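One possible shape for that rightmost column, sketched in Python under the assumption that each report row is a dictionary and that the summary text is already available as a single string. The function name and the `WM_SUMMARY` column name are hypothetical.

```python
def append_summary_column(rows, status_column, summary_text,
                          summary_column="WM_SUMMARY"):
    """Return copies of the rows with the summary text added as a last column.

    Only rows whose status is "included" receive the text; "excluded" and
    "NA" rows get an empty field so every row keeps the same columns.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's rows are left untouched
        # Adding a new key places it last, i.e. rightmost when written to CSV.
        new_row[summary_column] = (
            summary_text if row[status_column] == "included" else ""
        )
        out.append(new_row)
    return out
```

Repeating the same string on every "included" row is redundant, but it keeps the report a flat table and lets a filtered or sorted copy of the CSV still show which weighted-mean calculation it belongs to.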

bowring commented 1 year ago

Spoke with @sbodorkos and clarified that these new columns will be included in the built-in reports and available on an all-or-nothing basis for custom reports.