CIRDLES / Squid

Squid3 is being developed by the Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) at the College of Charleston, Charleston, SC and Geoscience Australia as a re-implementation in Java of Ken Ludwig's Squid 2.5. - please contribute your expertise!
http://cirdles.org/projects/squid/
Apache License 2.0

Expressions returning a nul result? #164

Closed NicoleRayner closed 5 years ago

NicoleRayner commented 5 years ago

In my custom expressions I have some expressions that DUPLICATE pre-existing expressions but just rename them. This was necessary given the limitation that equation=column in SQUID2. I know it won't be necessary any more with SQUID3, but there is some strange behaviour here that I would like to understand. If you examine the CUSTOM expression MF corr. 4-corr 206Pb/238U, it calls on the built-in expression 4-corr 206/238. [image: an expression that duplicates an existing expression] https://user-images.githubusercontent.com/40035242/44931425-11f49d00-ad51-11e8-851a-fd667b68bd98.JPG

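However, in the peek, it returns all 0's for the age and uncertainty. The built-in expression that it calls on returns the correct values. Why? [image: an expression that duplicates an existing expression part 2] https://user-images.githubusercontent.com/40035242/44931504-5ed87380-ad51-11e8-88e9-6ae9719a906f.JPG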

bowring commented 5 years ago

The built-in expression is giving values for the reference materials (check the tabs in peek); add an "S" to the end of the name in your expression: ... ["4-corr 206/238S"] to get the expression for unknowns.

NicoleRayner commented 5 years ago

Right. Of course.
What was adding confusion for me was that the corresponding expression related to MF-corr 206/238 AGE did not have a suffix indicating it was for unknown samples, and it worked just fine. This is because there are no comparable expressions for refmat and hence no need for a suffix. I now realize this in hindsight, but it does make it difficult to troubleshoot expressions. I think this highlights the importance of being able to filter expressions for refmat (if this expression had shown up in a filtered list I would have known there was a problem). I also think it would be much better to add the suffix (_RM) to the refmats, rather than a _S suffix to unknowns. The refmats are kind of the exceptional case in expressions, so it would be helpful for them to be flagged. Thanks for the tip.

bowring commented 5 years ago

@sbodorkos and I have planned to formalize the naming of built-in expressions and this is a good trigger to do so, and a good idea of yours as well. Stay tuned!

sbodorkos commented 5 years ago

In terms of the original issue raised by @NicoleRayner, I think the expression validator could potentially be enhanced in order to help with that. It is clear in the graphic that the custom expression is for unknowns ONLY, and it seems clear that Squid3 "knew" that the named expression used was applicable to the reference material ONLY - it would be nice to receive a general warning about a conflict like that during validation, as well as having the expression labelled "unhealthy" in cases where there is no overlap at all between the destination of the custom expression (e.g. unknowns) and the names of the terms used in the expression (e.g. reference materials).
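
A minimal sketch of the kind of validation check described above. It assumes, hypothetically, that every expression carries a set of target flags ("R" for reference materials, "U" for unknowns) and a list of the expression-names it refers to. The `Expression` class, `validate_targets` function and `registry` dictionary are illustrative names only and are not taken from the Squid3 code base.

```python
from dataclasses import dataclass, field


@dataclass
class Expression:
    """Hypothetical stand-in for a Squid3 expression; not the real class."""
    name: str
    targets: frozenset  # subset of {"R", "U"}: reference materials / unknowns
    references: list = field(default_factory=list)  # names used in the expression


def validate_targets(expr, registry):
    """Return warnings for referenced expressions whose targets do not
    overlap the targets of the expression that uses them."""
    warnings = []
    for ref_name in expr.references:
        ref = registry.get(ref_name)
        if ref is None:
            warnings.append(f'{expr.name}: unknown expression ["{ref_name}"]')
        elif not (expr.targets & ref.targets):
            warnings.append(
                f'{expr.name}: ["{ref_name}"] applies to {sorted(ref.targets)} '
                f"but this expression targets {sorted(expr.targets)}"
            )
    return warnings


# Expression-names are taken from this issue; the flags mirror the discussion.
registry = {
    "4-corr 206/238": Expression("4-corr 206/238", frozenset({"R"})),
    "4-corr 206/238S": Expression("4-corr 206/238S", frozenset({"U"})),
}
custom = Expression(
    "MF corr. 4-corr 206Pb/238U", frozenset({"U"}), ["4-corr 206/238"]
)
fixed = Expression(
    "MF corr. 4-corr 206Pb/238U", frozenset({"U"}), ["4-corr 206/238S"]
)

print(validate_targets(custom, registry))  # one warning: no target overlap
print(validate_targets(fixed, registry))   # no warnings
```

Under these assumptions, an expression with at least one warning of the second kind could be the one labelled "unhealthy".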

@bowring the list of built-in expression names you emailed me overnight looks incomplete, and reviewing Nicole's screenshots has reminded me that the sets of built-in expressions comprising any given Task are a function of which Permutation it is (my guess is you have sent me a list generated from a "Perm1" Task). So presumably I should generate a superset of names based on my collection of Perm1–Perm4 Tasks from Korea, and account for all the duplicates that way?

I also need a reminder of why some parameters occupy only one row in your list (e.g. RU-labelled "ppmU"), whereas similar-looking parameters occupy two rows (e.g. R-labelled "ppmTh" and U-labelled "ppmThS"). Is this because there is only a single expression used to evaluate ppmU (irrespective of R/U labelling), whereas ppmTh is evaluated using different expressions, depending on whether it is R- or U-labelled? Presumably the list has one row for every distinct permutation of (expression name and expression arithmetic)?

sbodorkos commented 5 years ago

@bowring as a preliminary step, I have attached (ExpressionNames AcrossPerms_v1.xlsx) the reconciliation of the expression-names across all the Perms. I have examined all the grey gaps carefully and ensured that they do make sense. But we do have some issues that need addressing from a philosophical standpoint:

  1. This list conflates expressions for Common Pb Models, RM Isotopic Models, RM Chemical Models, Special U-Th-Pb (which in isolation was the original definition of "built-in"), and Ludwig's "under the hood" arithmetic, which is a problem because not all of these expression-types are hierarchically equivalent. So not all of these expressions can be sensibly named (I think), although resolving the hierarchy issues might help with that. I understand that this range of expression-types are probably similar in terms of requiring consistent expression-names, but if we don't categorise them in an intelligible way, there will be widespread user confusion (to add to my own confusion, because I don't yet know what some of these expression-names represent!).

  2. In the case of unknowns, I foresee the need for (at least) two different types of expression-name, to serve the dual purpose of (a) defining every single possible label-expression permutation uniquely ("unique name"), and (b) not burdening the user with unnecessary complexity that is irrelevant at best, and incorrect at worst ("usage name"). The fundamental workflow (across every Squid, ever) is to reduce the data from the Isotopic Reference Material, in conjunction with one (or more) index isotopes for the common Pb correction (in fact, SQUID 2.50 permits data-reduction of RMs only, with all unknowns ignored). As we know, Ludwig's SQUIDs required the index isotope to be specified by the user prior to data-reduction; whereas Squid3 evaluates all the sensible candidate index-isotopes for a given Perm (i.e. 4corr and 7corr across all Perms, with the option of 8corr for Perm1 only). The thing is, it remains a fundamental requirement that the index isotope for common Pb correction of the RM be selected and finalised before much meaningful data-reduction of the unknowns can be done. This means that at the moment, we are calculating suites of contextually-equivalent but mutually-exclusive expressions for the unknowns, and their "unique names" contain detail that is unnecessary in any one given data-reduction, and which will also cause confusion for users.

An example of the latter (which uses the current expression-names) is the unknown-only expression currently named ["Total 206Pb/238US"]. This expression is always index-isotope-dependent, so we have a need to be able to calculate and store ["4-corr Total 206Pb/238US"] and ["7-corr Total 206Pb/238US"] for all Perms, as well as ["8-corr Total 206Pb/238US"] in the case of Perm1. Although all of these expression-names are uniquely and anatomically correct, they are not especially helpful for actual Squid3 users, both because they are counter-intuitive (the conventional theory is that ["Total 206Pb/238US"] is not dependent on the common Pb correction, and it's still not completely clear to me why Ludwig has formulated the arithmetic the way he has), and because they can never coexist: the index isotope selected for the RM always defines ONE value of ["Total 206Pb/238US"] for the unknowns.

Therefore, from a user perspective (i.e. in terms of the "usage" expression-names displayed in the left-hand lists in Squid3), it would be much more intuitive if there existed a single expression-name ["Total 206Pb/238US"], which was essentially an alias of the specific expression for the appropriate index isotope. A complication in this particular example is that all four Perms seem to already have a third (or fourth, in Perm1) expression named ["Total 206Pb/238US"], and it is not yet clear to me whether this is already the alias I have suggested, or whether it is a different expression for which the index-isotope was not specified at the time of calculation. I need to dig a bit deeper.

At the most fundamental level, this is why I would like to segregate built-in RM expressions from built-in Unknown expressions, at least in terms of "usage names", if not "unique names". For RMs, "unique names" will dominate, because we need to rigorously prefix and display everything, so the user can see all the index-isotope value-sets and make an informed choice about which one to use. Conversely, for the Unknowns, many of the rigorous prefixes required by the "unique names" can be dispensed with, in favour of more intuitive "usage names", because the relevant expression becomes uniquely defined through the preceding choice of RM index isotope.

bowring commented 5 years ago

One clarification relative to next-to-last-paragraph:

The four expressions in rows 98-99 and 151-154 ("Total ...") ARE in fact the aliases you describe - these aliases are updated to refer to the corresponding explicitly-corrected expression when the user changes the selected index isotope in the task manager. Note that both the explicitly-corrected expression and the alias are listed as built-in expressions since the latter is defined in terms of the former.

Other existing aliases that behave in the same way are: "ppmTh", "ppmThS", "232Th/238U", "232Th/238US", "204/206 (fr. 208)", "204 overcts/sec (fr. 208)", "8-corr Primary calib const. delta%". Each gets re-defined in terms of the underlying explicitly-corrected expression when the selected isotope is changed in the task manager.
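
The alias behaviour described above can be sketched as a lookup keyed on the selected index isotope. This is an illustration under stated assumptions: the `ALIASES` table and `resolve` function are hypothetical, and the candidate names simply reuse the "4-corr/7-corr/8-corr" pattern quoted earlier in this thread rather than the actual Squid3 definitions.

```python
# Hypothetical alias table: usage name -> {index isotope: unique name}.
ALIASES = {
    "Total 206Pb/238US": {
        "204": "4-corr Total 206Pb/238US",
        "207": "7-corr Total 206Pb/238US",
        "208": "8-corr Total 206Pb/238US",  # Perm1 only, per the discussion
    },
}


def resolve(name, index_isotope, aliases=ALIASES):
    """Return the explicitly-corrected expression an alias currently points to.

    Names that are not aliases are returned unchanged.
    """
    candidates = aliases.get(name)
    if candidates is None:
        return name
    try:
        return candidates[index_isotope]
    except KeyError:
        raise ValueError(
            f"no {index_isotope}-corrected candidate for alias {name!r}"
        ) from None


# Changing the selected index isotope re-points the alias.
print(resolve("Total 206Pb/238US", "204"))
print(resolve("Total 206Pb/238US", "207"))
```

Keeping the explicit expressions and the alias as separate entries matches the note above that both are listed as built-in expressions.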

sbodorkos commented 5 years ago

@bowring Excellent, this is what I was really hoping you were going to say! I will map all those aliases out, and that will help condense the long, complex list of "unique names" into a shorter, more intuitive list of "usage names".

NicoleRayner commented 5 years ago

@sbodorkos let me know if/how I can help with this process. One thought related to naming is that the flag indicating that the expression is applied to an unknown is "U", but in the list of custom expressions their names are suffixed by "S" for samples. Obviously "U" as a suffix has problems (conflict with uranium). I'm open to ideas on consistent naming conventions across the platform; I don't have a magic solution though. Perhaps the flag remains as "U" but the suffix is "_Unk"?

sbodorkos commented 5 years ago

@NicoleRayner the suggestion @bowring relayed to me (which I thought was a good one, and which I also think he attributed to you!) was to use disambiguation on the Standards (rather than the Unknowns), to exploit the fact that most people are writing their Tasks and expressions to work on Unknowns (rather than Standards). At present I am tinkering with using a prefix (rather than a suffix), and using the prefix "RM" where needed. Right now I am working on a first-pass for everything, which I will post here for comment.

sbodorkos commented 5 years ago

@NicoleRayner here's what I have done so far (attached ExpressionNames AcrossPerms_v2.xlsx): this includes revisions following Skype with Jim @bowring the morning of Wed 16 January (US Eastern Time). To explain:

The 4 possible permutations (P1-P4) for U-Th-Pb geochronology (according to SQUID 2.50) are defined in Columns B-I, rows 2-4. For each of these, columns C, E, G, I contain the current (v1.0.4) Squid3 expression-names for P1-P4 as provided by Jim, and columns B, D, F, H show the "flags" for what type(s) of data the expressions relate to (C = concentration RM, R = isotopic RM, U = unknown samples). I manually "gapped" all the lists (gaps are pale grey) to ensure that equivalent expression-names always occur on the same row across P1-P4 (and I have audited those gaps to make sure they are sensible). Jim ordered the expressions first by Flag, and second by alphabetisation: column A contains an index number that can be used to recover any disruption of the original ordering of his list.

NEW COLUMNS (J-N)
Candidates I will explain later.
Verbose Name is my best attempt at a UNIQUE name for each expression, for the internal use of Squid3 (noting that the questions of what should appear as "building blocks" in the Expression Manager, and what should appear as "column headers" in the report-tables are separate questions that will both need careful consideration).
Data Affinity is an attempt to untangle the disparate expressions I have been asked to name according to where/how those expressions should be grouped/stored; I have placed them into 6 categories, 4 of which are Models.
Type corresponds to the "single-cell" flag in SQUID 2.50; the aim is to separate expressions that work on a row-by-row basis (the majority) from those that generate "summary results" (on the RM) which generate vector output (corresponding to "arrays with labels" output in SQUID 2.50). How will "summary results" be reported back to users CSV-style?
Length corresponds to the fixed length of "summary" vector output (only).

CONVENTIONS
Syntax: I have deleted all hyphens and periods, all pre-existing incarnations of parentheses, and all asterisks except for those used to define Model Isotopic RM values (which theoretically originate from IDTIMS data anyway).
Prefix for aggregated "summary results": Several R-flagged expressions incorporate some sort of aggregation, and I have added a descriptor (Avg, BiWt, or WtdAv) to help those stand out. Hopefully self-explanatory!
Correction types: I have used Uncorr, 4corr, 7corr, 8corr universally to designate "SHRIMP-style" common Pb correction-types, and where multiple corr-types are present, the hierarchy descends from the left (unique names only; I hope we never display stacked corr-types to users!)
Nuclides: In isotopic ratios, I have specified numerator and denominator nuclides in full. I am open to the idea of dropping the element-symbols to make the names shorter (it probably would not decrease clarity); a question would be whether people are happy for element-labels to be missing in instances where the nuclide is NOT part of a ratio. For example, is "4corr Common 206 (%)" an acceptable alternative to "4corr Common 206Pb (%)"?
Units: Where the output of an expression is NOT dimensionless, I have suffixed the expression-name with its units in parentheses (these being the units that users will want to work with, not the units of the values Jim is actually storing!). I don't think these are superfluous: indicators like "(Ma)" or "(ppm)" might be self-explanatory, but I think a lot of the uses of "(%)" are informative.
Suffix: I did play with prefixes, but they were messy, so I have reverted to suffixes. But I have applied them to R-flagged expressions (in the form "RM"), rather than unknowns, in the hope that this will make expression-names easier to use in subsequent expressions.
Yellow-coloured expressions are multi-context: the curly braces are intended to indicate that the suffix RM is used when wishing to refer to the relevant R-flagged expression, whereas no suffix is used when referring to the U-flagged one.

CATEGORY RU The majority (20) of the 27 expressions labelled "RU" really pertain to data that should be held as part of Models (of various kinds: Concentration, RM, Isotopic RM, Default Common Pb, Physical Constants). Quite a few of them don't involve expressions as such: things like Concentration RM parameters and Physical Constants parameters are just numeric values with units.

The remaining 7 are all truly multi-context "built-in" expressions, so I have assigned names with an optional suffix as per above description of curly braces. These multi-context expressions have been retained, in preference to duplicating the expression as a single R-flagged expression plus a single U-flagged expression.

CATEGORY R In general, I have only applied the suffix RM where it is needed.

Expression-names with white text on black background denote ALIASES as used in Squid3 v1.0.4, the exact expression for which is dependent on the index isotope chosen for the common Pb correction. The "Candidate" column contains a comma-separated list of Index numbers (i.e. column A) of longer/verbose/unique expression-names that are potential candidates for that alias. Note that many of the aliases apply only to P2 and P4, but some apply across all of P1-P4. Candidate index-numbers in square brackets apply to P1 only; all others apply to every white-on-black expression-name on that row. I think and hope this alias-list (and alias-mapping) is complete for Squid3 v1.0.4, but I guess we will need to think more widely about aliases that might be desirable.

Note also that R-flagged expression-names essentially have two states: (1) "RM assessment", where pretty much every expression needs to be identified by its full unique name, prior to any choice being made about index isotope for common-Pb correction, and (2) "RM finalised", where that choice has been made, and useful aliases can be deployed for both the RM and the unknown on that basis. I think we need to think carefully about whether we have a need to report state (1) to users in any form (i.e. application interface, or CSV-type report). Conceivably the answer is no: at present, one of the two(/three) radio-buttons 204corr/207corr(/208corr) is always selected, so it's possible that state (2) is all we need to show or report.

CATEGORY U These are a bit simpler, and hopefully self-explanatory. Aliases as per category R.

TO DO Brief definitions of each expression. I already have a fair bit of applicable material in hand (having written definitions of Oracle database fields designed to capture data from SQUID 2.50 workbooks), but there is some extra work to do.

Anyway, let me know what you think.

bowring commented 5 years ago

Great progress! I have a few initial considerations:

1) In keeping with the use of the other parameter model types, we should refer to "Common ..." and not "Default Common .."

2) We should not use (Ma) for Lambdas and Ages in the builtin expressions - I have gone to great lengths to make sure the underlying math follows the principles of separation of 'view' and 'model = data' so that the data is agnostic of units other than ground truth (annum). The place for (Ma) is in configuring the reports (as in ET_Redux), where the user selects the units for display of the underlying values - this functionality has not yet arrived in Squid3. The parameter model architecture inherited from ET_Redux follows this convention as well. All uses of lambda in the existing built-in expressions refer to lambda in annum and all of the underlying math in Ludwig's procedures has likewise been modified to work with annum. The use of (Ma)-tagged Lambdas will result in the existence of two sets of built-in lambdas. Thus the definition of ["Lambda 230Th (PerMa)"] will be ["lambda230"] / 1000000 and should really be considered a "custom expression" as it will only be used in custom expressions. Likewise, all calculations involving Age are designed to handle annum. If the user wants to view (Ma), they can specify the units in the report generator (yet to be implemented). The general idea of the report-customization engine will include the ability to create and display 'custom expressions' of any kind and in any units.

3) I am not sure there is an elegant solution, but "WtdAv Xcorr 206Pb/238U Calib Const" will not fall alphabetically with the other Xcorr expressions.

4) The {Contextual Suffix} = {RM} is not necessary because of the expression architecture: An expression knows the type of its valid targets (RM and/or U) and when invoked on an invalid target, will not yield a result. These yellow-highlighted expressions yield a result regardless of the type of target provided (RM, U) and are thus inherently context-sensitive and the user does not need to worry about it.

5) I have left out of this list the hard-coded parameters that appear at the bottom of the task audit and probably need better names:
ExtPErr = 0.75
L1033 = 1.033
L859 = 0.859
r238_235s = 137.88
We should also decide if any of these parameters belong in our models.

6) Should we worry at this juncture about providing short explanations for each expression, or simply rely on the new verbose naming scheme for that info?

Cheers, Jim
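
As an illustration of the view/model separation described in point 2, the sketch below stores an age in annum and applies the display unit only at formatting time. The report generator is stated above to be not yet implemented, so `ANNUM_PER_UNIT` and `format_age` are hypothetical names and the age value is illustrative only.

```python
# Stored values stay in the ground-truth unit (annum); units are a view concern.
ANNUM_PER_UNIT = {"a": 1.0, "ka": 1.0e3, "Ma": 1.0e6, "Ga": 1.0e9}


def format_age(age_in_annum, unit="Ma", digits=1):
    """Convert a stored age (annum) to the requested display unit."""
    value = age_in_annum / ANNUM_PER_UNIT[unit]
    return f"{value:.{digits}f} {unit}"


stored_age = 416_800_000.0  # illustrative value only, in annum

print(format_age(stored_age))           # displayed in Ma
print(format_age(stored_age, "Ga", 4))  # same stored value, displayed in Ga
```

The stored value never changes; only the formatting call depends on the user's preferred unit.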

cwmagee commented 5 years ago

Re: 3: Can you customize your alphabetizer to ignore "WtdAv" in the same way that many modern alphabetizers ignore "the" and "a"?
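
One way this could look is a sort key that strips the aggregation descriptor before comparing. The sketch below assumes the descriptors Avg, BiWt and WtdAv from the naming proposal above; `IGNORED_PREFIXES`, `sort_key` and the example names are hypothetical and not the actual Squid3 list.

```python
# Aggregation descriptors to ignore when alphabetising (assumed from the proposal).
IGNORED_PREFIXES = ("WtdAv ", "Avg ", "BiWt ")


def sort_key(name):
    """Sort by the name with any leading descriptor removed.

    The full name is used as a tie-breaker so the order is deterministic.
    """
    stripped = name
    for prefix in IGNORED_PREFIXES:
        if name.startswith(prefix):
            stripped = name[len(prefix):]
            break
    return (stripped.lower(), name)


names = [
    "WtdAv 4corr 206Pb/238U Calib Const",
    "7corr 206Pb/238U Calib Const",
    "4corr 206Pb/238U Calib Const",
]
for name in sorted(names, key=sort_key):
    print(name)
```

With this key the "WtdAv 4corr ..." entry is listed directly after its row-by-row "4corr ..." counterpart instead of at the end of the list.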

sbodorkos commented 5 years ago

@bowring responses based on our Skype on Wed 23 Jan:

  1. Let's go with Assumed_Common, or even just Assumed, at least until I document the Grouping-specific common Pb calculations. Once you have those in hand, I suspect a naming solution will present itself.

  2. As we discussed, the scale of the units matters (for entities that have them), but dealing with them via context-specifier (i.e. a User Preference-style setting that states preferred time-unit is Ma, concentration unit = ppm, etc.) is the best solution, so I will drop the units of measure from the expression-names. But given that the expression-naming and implementation of the context specification will be separate pieces of work, @NicoleRayner should probably be aware that during the earliest stages of testing of Custom expressions, there will be a brief but exciting time in which expression-names which have an associated unit of measure will be available only in their "ground-truth" units (i.e. ages in annum, decay constants in per annum, etc.). Just a temporary thing until @bowring implements the context-specifier, which will enable users to state a preference for working in "Ma mode" for arguments in custom expressions, and data-reporting.

  3. This is not simple, but nor is it a critically important issue. The main imperative is to have these expression-names easily discoverable, given that there are only 7 of them in a list of 140-odd. SQUID 2.50 does it with formatting and data-placement (because it is not restricted to strictly rectangular CSV arrays) - these "summary" values are shown at the foot of the column they are derived from, in bold font of a different colour, and (mostly) with a caption. As discussed, in Squid3 the solution is probably to give "built-in Summary" expressions their own category in the Expression Manager.

  4. Your solution is not a complete one in terms of the arithmetical expression people might theoretically want to write, because ambiguity is possible... but I confess that I can't think of a real-world example of an expression where the default assumption would be wrong (e.g. in a Custom expression tagged U, any reference to the row-by-row RU-tagged expression ["ppmU"] would naturally be referring to that expression as applied to the unknowns, rather than the reference materials). It might well be that a U-tagged Custom expression does refer to an R-tagged Custom expression e.g.:
       DiffU = ["ppmU"] - AvgRMppmU
     where Custom expression DiffU is U-tagged, but Custom expression AvgRMppmU is R-tagged and defined as:
       AvgRMppmU = average(["ppmU"])
     However, in examples of this type, there is no ambiguity in the expression-names. So let's leave things as they are.

  5. We didn't discuss these in detail; please note that ExtPErr belongs in User-preferences! It's not an immutable magic number; just a value inherited from GA that has presumably been hard-coded (temporarily) for convenience. Different user-groups will want different values: RSES people will want 1.00, for example. And r238_235s corresponds exactly to the (model) physical constant ["Present-day 238U/235U"] and can be substituted on that basis. But for the remaining two, I'll need to revise the physical basis of those numbers and supply that to you, because we probably aren't very far away from substituting those as well. The main reason we have retained them was to maintain maximum arithmetical comparability with SQUID 2.50, and we have not yet seen any issues on that score...

  6. We'll definitely need definitions for all the unique-names Squid3 uses, purely because some of the names are illogical or nonsensical as written (although the names are justified by the underlying arithmetic). Proper definitions will also help me (and my successors) remember the nature of single/dual-calibration dependency (and index isotope dependency) for some of the expressions, and definitions will also maximise our chances of mapping "candidate" unique expression-names to "user-aliased" expression-names correctly and completely.
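
The DiffU example from point 4 can be worked through numerically. The sketch below uses made-up spot data and plain Python rather than the Squid3 expression engine; it only shows that the R-tagged summary is evaluated over reference-material spots and then used row-by-row on the unknowns.

```python
from statistics import mean

# Illustrative spot data only: (spot name, is reference material, ppmU).
spots = [
    ("RM-1", True, 240.0),
    ("RM-2", True, 260.0),
    ("UNK-1", False, 310.0),
    ("UNK-2", False, 180.0),
]

# R-tagged summary expression: AvgRMppmU = average(["ppmU"]) over RM spots only.
avg_rm_ppm_u = mean(ppm for _, is_rm, ppm in spots if is_rm)

# U-tagged row-by-row expression: DiffU = ["ppmU"] - AvgRMppmU, unknowns only.
diff_u = {name: ppm - avg_rm_ppm_u for name, is_rm, ppm in spots if not is_rm}

print(avg_rm_ppm_u)
print(diff_u)
```

In both expressions ["ppmU"] is written the same way; which spots it is evaluated on follows from the tag of the expression that contains it, which is why no suffix is needed here.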

bowring commented 5 years ago

@sbodorkos - Good summary. Note that in 5 we should consider whether the names need refining or not.

bowring commented 5 years ago

@sbodorkos - Another important consideration in naming - at least for parameters - is that if the names do NOT have spaces in them, but rather are similar to "Lambda_238Th", for example, they can be used directly in expressions without the need for '["xyz"]'. Another alternative is CamelCase.
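
A small sketch of the distinction being drawn here, assuming (this is an assumption, not the documented Squid3 grammar) that a name can be used bare when it looks like a conventional identifier: letters, digits and underscores, not starting with a digit. The `reference` function is hypothetical.

```python
import re

# Assumed rule for names usable without ["..."]: identifier-like tokens only.
_BARE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def reference(name):
    """Return how a name might be written inside an expression:
    bare if it is identifier-like, otherwise wrapped in ["..."]."""
    if _BARE_NAME.fullmatch(name):
        return name
    return f'["{name}"]'


print(reference("Lambda_238Th"))     # no spaces: usable directly
print(reference("4-corr 206/238S"))  # spaces and punctuation: needs ["..."]
```

Under this assumed rule a name beginning with a digit would still need the bracket form, which may be worth keeping in mind when choosing prefixes.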

sbodorkos commented 5 years ago

@bowring here ([ExpressionNames AcrossPerms_v3.xlsx](https://github.com/CIRDLES/Squid/files/2817268/ExpressionNames.AcrossPerms_v3.xlsx)) is the next iteration of the UNIQUE expression-names (column L of the spreadsheet; in "Darwin case"; who knew all this stuff had names?).

As you will know from the definition, I have combined Title Case with underscores, with no spaces, slashes or other punctuation marks of any kind. Note that these would be a superset of what's available for users to manipulate via Expression Manager: in that environment, we will want to take full advantage of the aliases we have defined, in order to present a shorter list of more intuitive expression-names, with their definitions always uniquely specified by the combination of Perm-type and the "index isotope" radio-button setting. A couple of specifics:

  1. I have used "_Concen" to denote expressions that give results that traditionally have units of ppm. Jim does not want units in expression-names; "_Concen" could be dropped everywhere it is used, but it would result in some very short expression-names e.g. the traditional "ppmU" would become simply "U". I am happy to go that way if others are.
  2. "Default" Common Pb: After consideration, I have retained this interim naming convention. Jim and I discussed that "Assumed" would probably be a better description than "Default" with respect to the common Pb isotopic composition applied to the isotopic RM in the StandardData sheet (and also in the first iteration of the unknowns, as they appear on the SampleData sheet prior to Grouping). After all, I couldn't think of a contraction of Assumed Common other than "AssCom", and I just had a feeling that wasn't gunna fly...

@NicoleRayner please feel free to comment on the conventions floated here, especially if any of them are jarring, because whatever we decide here for the purpose of unique definition of expression-names in the code is also going to form the vast majority of the basis for naming conventions in the Expression Manager. Some of it is clunky: "ParentElement" is necessary because SQUID 2.50 (and therefore Squid3) do support element concentration normalisation using Th in preference to U (if you are adventurous) - I have long thought it was just a "fake" part of the Reduce Data form that was not really supported by the code, but it turns out that it really works! And I have retained Ludwig's old "CalibConst Delta%" variable-names from the StandardData sheet: in fact, that calculation defies meaningful description, as there is no formal link to the calibration constants anywhere in it. Certainly the static reference to "206Pb/238U CalibConst" is not strictly correct, as the parameter is calculated even when you do Th-Pb geochronology with U only determined indirectly (i.e. "Perm 3", which is the exact opposite of "normal" U-Pb zircon geochronology). Have a look at the formulae in those columns in a normal SQUID 2.50 workbook, and if you can think of a sensible "column-header" for it, let me know!

Next thing to do is to define the subset of these names that will be available for use in the Expression Manager, and after that, descriptive row-by-row definitions that encompass: (1) the "isotopic" definition, (2) the Perm-dependence of the evaluation if applicable, and (3) whether the expression appears under its "real" name or an alias, and if the latter, what its alias would be. All of this stuff really matters, because it's hard enough to keep it straight in your head even when it's "fresh"; it would be bad enough returning to this stuff after a hiatus, and heaven help any fresh pair of eyes!

But none of that is going to happen immediately: right now I am packing up the kids for a long weekend at the beach, because they feel like they haven't done quite enough lazing around during their 7 weeks of summer holidays, which end on Tuesday... that's when I'll be back.

sbodorkos commented 5 years ago

...urgh, and I have forgotten to add in definitions (or replacements) for magic numbers L859 and L1033. But I do have them, somewhere safe...

NicoleRayner commented 5 years ago

@sbodorkos enjoy your time at the beach. I will be on pretty much the exact opposite side of the earth, just shy of the Arctic circle at latitude 66°N for the week. I'll bring this stuff with me and have a look and try and get back to you when I return (on the 8th). https://en.wikipedia.org/wiki/Pangnirtung

bowring commented 5 years ago

I have completed the transformation and am attaching four text files (one per perm) as before, listing the new expression names alphabetically by target flag (C,R,U) for spell-checking, confirmation, and as input to the process of deciding which expression names will be featured for the user - such as the aliases. I envision two "Builtin" tabs in the expression manager - one for these featured expressions and the other for the supporting expressions. Attachments: perm3.txt perm4.txt perm2.txt perm1.txt

bowring commented 5 years ago

We should come up with a proper name for "0.9678" found in Th_Concen_RM, so that it is exposed to the user. We also need to decide how to handle "137.88" - is it a preference or part of the reference material model or ... ?

sbodorkos commented 5 years ago

@bowring I really hope 0.9678 is exactly the same as 1/1.033, because it doesn't ring a bell! But I will sort it out.

137.88 is a candidate value for a Physical Constant that I have been calling ["Present 238U/235U"] (or ["Present-day 238U/235U"], can't remember exactly which) in the wiki. It belongs in the same category as all the Lambdas. In theory, all of the Physical Constants transcend geochronology, and each has a single value overwhelmingly agreed by the entire scientific community. In practice, some Physical Constants have two (or more) candidate values, usually arising from some unsatisfactory aspect of the original measurement (or documentation thereof).

For example, Lambda238 was well measured by Jaffey et al. (1971), and their value (1.55125e-10 per annum) is universally used and uncontested. However, the measurements of Lambda235 they did at the same time were dispersed beyond their analytical uncertainties, and based on "closed-system" geochronology, Schoene et al. (2006) proposed an alternative Lambda235 value more consistent with true concordance of (purportedly) undisturbed zircons.

However, the Schoene et al. (2006) value for Lambda235 depended on a "reference" value for ["Present 238U/235U"] that ultimately proved even more contentious. The value of 137.88 widely used by Ludwig (often as a "magic number" rather than a reference to a variable) was proposed/ratified by Steiger and Jaeger (1976), but that value was based on an undocumented dataset that appears to have been lost since! Hiess et al. (2012) presented a compelling case for an alternate value of 137.818, based on a wide range of careful experiments, but a lot of geochronologists continue to use 137.88, because consistency with calculations performed in previous years is more important to them than the absolute truth (I do this myself!).

We will be adding more Physical Constants to this set, because they are what underpins at least some of the remaining "magic numbers". L1033 is one example; that value is built on a Physical Constant which can be expressed as ["Present-day 238U"] / ["Present-day Total U"] = 0.9927 (i.e. 238U makes up 99.27% of all natural U on the present-day Earth). I believe this particular value (0.9927) is the cornerstone of all Ludwig's magic numbers, and is the specific reason that all the magic numbers have exactly 4 significant figures. Note also that 0.9927 is very close to (1 - [1/137.88]).
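
The closeness noted above is easy to check numerically. The following is only an illustrative sketch in Python: the variable names are made up for this example and are not Squid3 expression names, and the second quantity assumes that only 238U and 235U are counted (234U neglected).

```python
# Illustrative arithmetic check; names are hypothetical, not Squid3 identifiers.
present_238u_235u = 137.88   # Steiger and Jaeger (1976) value quoted above
quoted_fraction = 0.9927     # 238U / total U, as quoted above

# 1 - 1/R, the approximation mentioned in the comment
approx_fraction = 1.0 - 1.0 / present_238u_235u

# R / (1 + R): fraction of 238U if only 238U and 235U are counted
# (assumption for this sketch: 234U is neglected)
two_isotope_fraction = present_238u_235u / (1.0 + present_238u_235u)

print(f"{approx_fraction:.5f}")       # 0.99275
print(f"{two_isotope_fraction:.5f}")  # 0.99280
```

Both values agree with 0.9927 to within about 1e-4, which is the level at which a 4-significant-figure constant is sensitive.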

sbodorkos commented 5 years ago

@bowring there is a minor flaw in our audit of expression-names: Perms 1-4 span only the permutations relevant to the Isotopic RM (i.e. primary parent-daughter for calibration, and whether the secondary parent-daughter is calibrated directly or indirectly), and do not cover the extra permutations resulting from varying the "primary parent" of the Concentration RM. The artefact of this that we see in our audit-list is the apparent universality of "U_Concen" (flagged RU), in contrast to Th_Concen_RM (flagged R) and Th_Concen (flagged U).

If Perms 1-4 had been run using a Concentration RM defined by a reference Th (rather than U) value, this pattern would presumably be reversed (i.e. Th_Concen would be apparently universal, and flagged RU, whereas U_Concen_RM [flagged R] and U_Concen [flagged U] would be distinct).

I am happy to run an example to confirm this, but I suspect it means that in order to rigorously represent all the possibilities in "Unique Name" space (for Squid3 internal use and disambiguation), it will be necessary to have 4 separate expressions: Th_Concen_RM and U_Concen_RM (both flagged R), and Th_Concen and U_Concen (both flagged U). One of the element-pairs of expressions will essentially function as a single RU-flagged expression, but we won't know which until the Concentration RM (and its "primary" element-value) is selected by the user.

bowring commented 5 years ago

from the wiki: The "magic number" 0.9678 (4 decimal places) is hard-coded into the SQUID 2.50 code, which is a bit naughty because it is actually the product of several physical "constants" related to the masses and isotopic abundances of Th and U. Essentially it represents the product: ("Atomic mass of 232Th"/"Atomic mass of 238U") * ( ("238U"/"AllNaturalU") / ("232Th"/"AllNaturalTh") )

The atomic masses of 232Th and 238U are immutable physical properties: 238U comprises 92 protons and 146 neutrons by definition, so its atomic mass is 238 by definition. Similarly, 232Th is defined as the isotope containing 90 protons and 142 neutrons, so its atomic mass of 232 is an intrinsic property.

However, the ratios ("238U"/"AllNaturalU") and ("232Th"/"AllNaturalTh") have more scope for variation, and should certainly be modelled separately. Having scoured the internet and the literature, I believe Ludwig explicitly assumed exact values (4 decimal places) of 0.9928 for the former and 1.0000 for the latter. Both are perfectly reasonable values, but ought to be identified as explicit physical "constants" constraining the arithmetic. So the magic number 0.9678 represents the product: (232/238) * (0.9928/1.0000) rounded to 4 decimal places. For the present, we ought to retain the hard-coded "magic number" (0.9678).
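
The product described above can be reproduced in a few lines. This is a minimal sketch in Python, using the integer masses and the abundance values quoted in the wiki text; the variable names are hypothetical and chosen only for this example.

```python
# Sketch of the arithmetic behind the magic number 0.9678.
# Names are illustrative only; they are not Squid3 expression names.
mass_232th = 232            # integer mass, as stated in the wiki text
mass_238u = 238             # integer mass, as stated in the wiki text
frac_238u_of_u = 0.9928     # "238U"/"AllNaturalU", value attributed to Ludwig above
frac_232th_of_th = 1.0000   # "232Th"/"AllNaturalTh", value attributed to Ludwig above

unrounded = (mass_232th / mass_238u) * (frac_238u_of_u / frac_232th_of_th)
magic_number = round(unrounded, 4)   # rounding to 4 decimal places, as in SQUID 2.50

print(unrounded, magic_number)
```

Keeping `unrounded` alongside the rounded value makes it easy to see how much is lost by hard-coding the 4-decimal constant.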

sbodorkos commented 5 years ago

@bowring well, nothing is ever simple. I ran a "Perm1-ThConc" iteration in SQUID 2.50, and saw what I expected to see; Th (ppm) calculated universally, and U (ppm) calculated via row-by-row expression, involving ["ppmTh"], ["232Th/238U"] which remains explicitly evaluated by EqNum -3 as is usual in Perm1, and magic number L1033. So far, so good.

So I derived a "Perm2-ThConc" Task from my "Perm1-ThConc" Task, and set it going. Bloody thing didn't work! Remember that in Perm2, ["232Th/238U"] is evaluated by ratioing the TWO calibration constants you have (so it depends on the common Pb index isotope); and in this scenario, the SQUID 2.50 code seems to believe that EqNum -4 controls URANIUM concentration specifically (i.e. never Thorium). Which is a problem when you are using a Th-centric expression in EqNum -4. So basically SQUID 2.50 returns a bunch of blanks for ["ppmU"], as a consequence of my EqNum -4 being completely Th-centric, and SQUID 2.50 then tries to derive ["ppmTh"] from ["ppmU"] in the fashion that we are familiar with, based on our "normal" Perm2 Task (i.e. using ["ppmU"], ["232Th/238U"] and L09678), but it fails: ppmTh is always zero because ppmU is blank.

I regard this as a bug in SQUID 2.50; the code is clearly not properly configured to deal with "primary" ppmTh in combination with dual-calibration Tasks (I did perform the fill-in tests: Perm3-ThConc functions correctly analogous to Perm1-ThConc, and Perm4-ThConc fails in exactly the same way as Perm2-ThConc does).

I suggest we park that for the moment, at least until Squid3 permits the Concentration RM value to be keyed to Th content by users (rather than being hard-wired to U as at present). When that happens, we ought to be able to run the Perm1-ThConc to Perm4-ThConc tests, and we should see the same failures as SQUID 2.50 gives. At that point, we should be able to debug Squid3 (and I'll probably debug SQUID 2.50 in parallel, just so I am sure what is going on).

And I still believe my original supposition to be correct, notwithstanding the bloody bug: we will need to replace the single RU-flagged expression I have provisionally Unique-Named "U_Concen", with two expressions: one R-flagged, and which ought to have the provisional Unique Name "U_Concen_RM", and the other U-flagged, which should ultimately retain the original Unique Name "U_Concen" (although it might be wise to employ an interim/staging name like "U_Concen_UNK" during the Find-and-Replace stage, and then convert all those interim names to the final name afterwards, once the U_Concen_RM occurrences have been successfully separated).

bowring commented 5 years ago

@sbodorkos - I am on it! Also, note that the official name for our 238U235U friend is "Present_238U235U" unless you want something different.

sbodorkos commented 5 years ago

@bowring after our Skype of 13 February, I redid my audit of the text-files you supplied in this thread on 3 February, and can confirm that all the "white-text" issues were of my own making – apologies for the unnecessary alarm! The attached file ExpressionNames AcrossPerms_v4.xlsx now has two worksheets:

Perm1-4 Expr Names

Column K can be ignored as it is out of date; column L contains the unique names, unchanged from the previous iteration of the spreadsheet (v3).

I have inserted a new column M that maps the aliases that should be used in the Expression Manager, both to reduce the number of expression-names that need to be handled in those lists, and to ensure that users do not get access to expression-names that are inapplicable to their choice of index isotope. Note that there are no new "User Names" in this list; it's simply a mapping of which set of "Unique Names" should be available for Expression Manager use, and how the larger set of underlying Unique Names should be mapped in a sensible way for humans to use, for their chosen combination of Perm-number and index isotope for common Pb correction.

I have also inserted a new column N, which contains a "Definition Note" for each Unique Name. There is not a lot of "isotopic" detail in these Notes, but I have endeavoured to describe the Squid3-specific relationships in plain English (both as a reminder to us, and to give interested Squid3 users some insight). I have also tried to structure the Notes in a consistent way; I think they will serve, for the moment. Perhaps @NicoleRayner could browse them and advise any glaring errors or omissions.

Yellow highlight details the required changes for our original RU-flagged expression "ppmU" (i.e. "U_Concen"), remembering that that original expression is predicated on the Concentration RM always being defined in terms of U content. In order to accommodate the possibility that users might want to "derive" U contents based on measurements of a Concentration RM defined in terms of its Th content, it is necessary for us to duplicate our current treatment of "Th_Concen[_RM]" for the proposed "U_Concen[_RM]". In practice, this means replacing the RU-flagged expression (index 11; see column A) with separate R-flagged and U-flagged expressions.

For RMs in "Perm1-Th" and "Perm3-Th" Tasks (i.e. Tasks that are Perm1 or Perm3 in all "isotopic" respects but which use a Th-centric Concentration RM), the R-flagged expression "U_Concen_RM" remains uniquely defined and independent of index isotope for common Pb correction (see index 11A). In contrast, the corresponding R-flagged expression in "Perm2-Th" and "Perm4-Th" Tasks does depend on the choice of index isotope, which gives rise to new expressions with Unique Names "4cor_U_Concen_RM" (index 11B) and "7cor_U_Concen_RM" (index 11C). Thus for Perm2-Th and Perm4-Th, the choice of index isotope will dictate which expression (out of 11B and 11C) is aliased to expression 11A.

[The SQUID 2.50 bug I have referred to before essentially relates to the implementation of "4cor_U_Concen_RM" (index 11B) and "7cor_U_Concen_RM" (index 11C); those don't work in SQUID 2.50, even though their Th-based equivalents "4cor_Th_Concen_RM" (index 51) and "7cor_Th_Concen_RM" (index 70) do work. I see no good "isotopic" reason for this; it is much more likely to be a sloppy piece of code that explicitly and incorrectly assumes all Concentration RMs are U-centric. Repairs should be simple enough when the culprit is identified, but there is no rush to do that.]

Thankfully the situation is a bit simpler for Samples. Once the R-flagged expression "U_Concen_RM" (index 11A) is resolved, the U-flagged expression "U_Concen" (index 11D) is uniquely defined and independent of index isotope, for all Perms.
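
The aliasing rule for the Th-centric case (indexes 11A-11C) could be sketched roughly as follows. This is Python for illustration only: the function `resolve_u_concen_rm` and its arguments are hypothetical and do not correspond to anything in the Squid3 code base.

```python
def resolve_u_concen_rm(perm: int, index_isotope: str) -> str:
    """Hypothetical helper: return the Unique Name that the alias
    "U_Concen_RM" maps to when the Concentration RM is Th-centric."""
    if perm in (1, 3):
        # Single-calibration Tasks: independent of index isotope (index 11A).
        return "U_Concen_RM"
    if perm in (2, 4):
        # Dual-calibration Tasks: the index isotope selects 11B or 11C.
        by_isotope = {
            "204Pb": "4cor_U_Concen_RM",  # index 11B
            "207Pb": "7cor_U_Concen_RM",  # index 11C
        }
        if index_isotope not in by_isotope:
            raise ValueError(f"unsupported index isotope: {index_isotope}")
        return by_isotope[index_isotope]
    raise ValueError(f"unknown Perm: {perm}")


print(resolve_u_concen_rm(1, "204Pb"))
print(resolve_u_concen_rm(2, "207Pb"))
```

For Samples no such lookup would be needed, since "U_Concen" (index 11D) is described above as independent of index isotope for all Perms.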

Bowring164_2019-02-03

This sheet contains the row-levelled results of the audit against your text-files of 3 February. Columns A and B are copied from columns A and L on the other worksheet. Columns C-J contain your text-files, re-alphabetised as needed, and with part-rows inserted to check your expression-names directly against the comprehensive Unique Name list.

The two orange rows near the top denote the editing mistake we discussed via Skype: those two expressions both have the correct names, but are both incorrectly flagged RU in Perm3. The one suffixed "_RM" should be R-flagged, and assigned to Perm3 index 33; the other should be U-flagged and assigned to Perm3 index 104. These assignments have been coloured orange at their destinations as well.

The yellow fill denotes the relevant portions (for each Perm) of the old expression index 11 and its four replacements 11A-11D, as described above.

The pale green fill denotes simple, symmetrical vacancies in the Perm1 and Perm3 expression-name lists, relative to the Perm2 and Perm4 ones. The darker green denotes more complex (or non-symmetric) gaps, and the reordering of the list to alphabetise the new Unique Names has revealed a (long-standing) inconsistency in the Perm3 expressions. I have coloured the relevant pair (index 37 and 59) red.

Based on your lists, it looks as though Perm3 Tasks do calculate the R-flagged expression "7cor_206Pb238U_Age_RM" when the index isotope is 207Pb, but NOT the corresponding "4cor_206Pb238U_Age_RM" when the index isotope is 204Pb. Could you please verify that this inconsistency reflects the truth of what Squid3 actually does? (I assume it is, and I can probably do the test myself after the next release anyway.)

Assuming it's true, I need to decide what the proper resolution is, because there are two ways it can be played:

  1. Add code to calculate 4cor_206Pb238U_Age_RM for Perm3 (because all other Perms have it, and because Perm3 does have the underlying 4cor_206Pb238U_RM ratio data needed to perform the calculation).
  2. Delete the code that calculates 7cor_206Pb238U_Age_RM for Perm3 (because Perm3 is single-calibration like Perm1, and the "indirect" daughter-parent pair in Perm1 (208Pb232Th) is ignored for ..._Age_RM purposes, so the indirect daughter-parent pair in Perm3 (206Pb238U) should also be ignored).

On one hand, option 2 is probably the "scientifically" correct one (analysts using Perm1 Tasks are doing so because 208Pb/232Th is irrelevant or unsuitable for them; if they have a scientific interest in 208Pb/232Th as well as 206Pb/238U, they would be using Perm2 or Perm4 Tasks... and the analogous logic can sensibly be applied to Perm3 with respect to 206Pb/238U).

In the real world, however, as revealed by a re-examination of the wiki, the true reason that we calculate 7cor_206Pb238U_Age_RM as universally as possible, is because we use that value to derive the "underlying" ratio 7cor_206Pb238U_RM! We can't get that ratio unless we have the age, so I guess for some measure of consistency, we should revisit the Squid3 code with a view to implementing option 1 for Perm3.
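
To illustrate why the ratio cannot be obtained without the age, here is a minimal Python sketch that assumes the standard U-Pb age equation, 206Pb*/238U = exp(Lambda238 * t) - 1. The function names are hypothetical, and this is not a transcription of the Squid3 or SQUID 2.50 code.

```python
import math

# Decay constant of 238U per annum (Jaffey et al., 1971), as quoted earlier in this thread.
LAMBDA_238 = 1.55125e-10


def ratio_206pb_238u_from_age(age_years: float) -> float:
    """Radiogenic 206Pb/238U implied by an age, assuming the standard age equation."""
    return math.expm1(LAMBDA_238 * age_years)


def age_from_ratio_206pb_238u(ratio: float) -> float:
    """Inverse of the function above."""
    return math.log1p(ratio) / LAMBDA_238


ratio = ratio_206pb_238u_from_age(1.0e9)   # a 1000 Ma example
print(ratio, age_from_ratio_206pb_238u(ratio))
```

`math.expm1` and `math.log1p` are used instead of `exp(x) - 1` and `log(1 + x)` only to keep precision for young ages, where the exponent is very small.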

Next up: first-principles descriptions of the magic numbers L1.033, L0.9678, and L0.859...

bowring commented 5 years ago

some comments for @sbodorkos -

Perm1-4 Expr Names: U_Concen_RM - these changes have been made with the assumption that, for now, the definitions of 4cor, 7cor, and not-cor U_Concen_RM are identical until you supply the definitions. However, we need to address how the report columns will be modified to accommodate these changes to U_Concen - much like the parallel Th_Concen_RM columns. I note in passing that the Th ppm columns in the report are labelled as 'Ncorr', but appear in the 'Correction-Independent Built-in' Category of the reference materials Squid2 report in Squid3, which now seems a little confusing to this non-geologist. Please advise.

Bowring164_2019-02-03: Your comment about "4cor_206Pb238U_Age_RM" is confirmed in the Squid3 implementation of your VBA code in 'SQUID 2.50 Sub: StdRadiogenicCols' in the wiki, starting with your comment 'Before resuming the SQUID 2.50 code, I have inserted some code to calculate ...'.

You can find this code transformed to Java in Squid3 in the method 'public static SortedSet stdRadiogenicCols(String parentNuclide, boolean isDirectAltPD)' in BuiltInExpressionsFactory.java, here, line 1157 for the case of perm3 with only "7cor_206Pb238U_Age_RM" calculated at line 1178.

Note that the method "public static SortedSet generate204207MeansAndAgesForRefMaterialsU()" at line 778 does calculate both ages BUT only for perm1,2,4.

Please advise.

I am addressing the remaining items. It seems we are almost done with this issue!

NicoleRayner commented 5 years ago

for @sbodorkos I had a look through the notes of the v4 expression names spreadsheets and have a small number of questions/suggestions shown as notes in the attached. Copy of ExpressionNames.AcrossPerms_v4_NR comments.xlsx

Looking forward to the next release! Thanks for all your efforts fellas!

sbodorkos commented 5 years ago

@bowring I am working my way through the various loose ends.

Magic Numbers: I have finally worked my way through the arithmetic, eliminated references to "derived" values, and based the three magic numbers from SQUID 2.50 exclusively on:

  1. Three ("NukeMass...") Physical Constants that genuinely transcend geochronology i.e. rigorously analogous to decay constant ("Lambda...") values, and
  2. One Isotopic RM-specific parameter, still provisionally named Present_238U235U, and for which SQUID 2.50 explicitly assumes a universally constant value of 137.88.

This exercise has reinforced the data-processing "truth" of Present_238U235U as an Isotopic RM-specific parameter that is propagated to the Unknowns (as per Noah's definition). It really should be named "Ref_238U235U" to emphasise its behavioural similarity to all the other ratios defined for an Isotopic RM (Definition Note = "Reference 238U/235U of RM", Data Affinity = "Model: Isotopic RM", Type = "Vector", Length = "1"), even though it is non-radiogenic.

Anyway, the magic numbers have all been derived using the arithmetic in the attached spreadsheet:

Squid3_MagicNumbers_Explained.xlsx

It's probably a good idea to keep the expression-names for the magic numbers (so I have added a Definition Note for each), and simply implement the arithmetic for each value, with no rounding of the result (unlike SQUID 2.50). It will help with transparency for the next time we forget where the magic numbers come from!

Expression Names: Next step is to look at possible changes to expression-names as per @NicoleRayner comments. I will follow that with a Perm3-specific solution to the 4cor... vs 7cor... 206Pb238U_Age_RM inconsistency, so we can put that one to bed.

But then I will need to debug the "SecondaryParent..." code from SQUID 2.50 in order to supply the specifics of U_Concen_RM vs 4cor_U_Concen_RM vs 7cor_U_Concen_RM. No big deal; it has to be done sometime, and it might as well be now.

Reports: It is clear that we need to revisit the categorisation of the various expressions, now that we have a rigorous understanding of the full reach of the index isotope. It would appear that many of our "correction-independent" parameters actually instead reflect exclusivity (i.e. their values are uniquely defined once the index isotope has been selected, therefore they can be portrayed as correction-independent, but that is not the reality embodied in our set of Unique expression-names).

We will need to make a philosophical decision about what we want the Reports to actually show, with particular respect to the RM. For the RM, we have two options (and we could take both...):

  1. A comprehensive version, displaying all Unique Names and their calculated values, with no assumption regarding selection of index isotope, and no use of aliases, and/or:
  2. A tailored version, showing only the data relevant to the selected index isotope, and with aliases used

There is merit to both; the first might help you make a decision regarding WHICH index isotope to use, and the second removes a lot of extraneous clutter after the index isotope has been chosen, as well as providing the baseline for the Unknowns.

For Unknowns, you really can't look at the data until you have made a decision on the index isotope for the RM, and the distinction between "comprehensive" and "tailored" is almost non-existent, so it's a much easier decision to make for those.

Sorry this has taken all bloody day, see you on Skype in 30 minutes...

sbodorkos commented 5 years ago

@bowring response to @NicoleRayner feedback in Copy.of.ExpressionNamesAcrossPerms_v4_NRcomments:

Index 2-4: These expressions are combined RM and Sample spots because SQUID 2.50 (and therefore Squid3) never permits divergence of StandardData vs SampleData arithmetic, no matter which Perm you use, or which element is the primary concentration element. In detail, it turns out there are very, very few expressions for which this holds true universally, and these three are the main ones. Another two are the NU-switched outputs of the "equations for Pb/U [Pb/Th] normalization" as specified in the Special U-Th-Pb Equations window of SQUID 2.50's Task Manager, but neither of these expressions is mandatory across all Perms. A sixth is the NU-switched output of the Primary Parent Element expression ("uranium concentration") as specified in the Special U-Th-Pb Equations window of SQUID 2.50's Task Manager, but this is not mandatory, and this particular parameter has a very low profile in SQUID 2.50: it does appear (mislabelled "ppmU") on the Within-Spot Ratios worksheet, but it does not appear anywhere on either the StandardData or SampleData sheets.

Index 31, 32, 35, 57: Yes... I had intended to keep the Unique Names as short as I could, and the relevant SQUID 2.50 column-headers do not make specific mention of overcounts... but I think the change is worth making. @bowring I have changed the four Unique Names (and in the case of 31 and 32, the User Names too), as follows (the relevant cells are coloured grey in columns L and M of attached spreadsheet ExpressionNames.AcrossPerms_v5_NR+SB.xlsx ):

Index31. 204Pb206Pb_From207Pb is now 204Pb206Pb_OvCtCorFrom207Pb (for both Names)
Index32. 204Pb206Pb_From208Pb is now 204Pb206Pb_OvCtCorFrom208Pb (for both Names)
Index35. 4cor_204Pb206Pb_From208Pb is now 204Pb206Pb_OvCtCorFrom208Pb (Unique Name only)
Index57. 7cor_204Pb206Pb_From208Pb is now 204Pb206Pb_OvCtCorFrom208Pb (Unique Name only)

Definition Notes: In addition to the one pointed out by @NicoleRayner where some of the Notes for the Unknowns needed to make specific reference to the choice of index isotope for the RM, I have updated the full range of Definition Notes that contained typos, inconsistent formats, or non-contextual references to "Perm X". There are a total of 25 Notes that have been updated (coloured grey in column N in attached).

4cor_206Pb238U_Age_RM vs 7cor_206Pb238U_Age_RM: I have reviewed the code in Java, and also in the wiki, and I really have no understanding of it! It doesn't look like I documented the critical code-segment directly: I did it for the Unknowns and then wrote a note along the lines of "this should be implemented for the RM as well".

It has occurred to me that perhaps the entire edifice of 206Pb238U-related data in Perm3 is nonsense, based on my comments in the early part of Sub StdRadiogenicCols. I will have another look at the output after the next release of Squid3, and try to decide what (if anything) to do about it.

bowring commented 5 years ago

@sbodorkos - quick question: what is the scientific difference between the integer masses you define and the atomic molar masses we show in the physical constants models?

[image: amm]

bowring commented 5 years ago

@sbodorkos - Also, are we going to rename "ExtPErr"?

sbodorkos commented 5 years ago

@bowring the difference is essentially nuclear binding energy, which can be expressed as mass via E = mc^2. In this form, they are referred to as "nuclide mass defects". In your list, the Th and U nuclides with masses 230 and above have positive mass defects, which means that their binding energies are negative, and they will eventually fall apart by themselves (i.e. via radioactive decay). In contrast, the Pb nuclides (masses 204-208) all have negative mass defects, which means their binding energies are positive (i.e. the configuration of protons and neutrons in the nucleus is energetically favourable), and those nuclides are stable. There's obviously a lot more physics to it than that, and I am not across the detail, but you can do more reading at https://cinty-lee.squarespace.com/s/Chapter-2-Nuclear-Structure.pdf or on Wikipedia under "Nuclear binding energy".

I suspect what you really want to know is whether my impromptu "NukeMass" physical constants can be replaced by your list of Atomic Molar Masses. I suspect the answer is "yes", and I have attached an extended version of the MagicNumber explainer (Squid3_MagicNumbers_Explained_v2.xlsx) that investigates the effect of using the numbers in your screenshot, rather than the integers. L859 and L1033 are unchanged at Ludwig's level of rounding; it looks like L9678 becomes L9677 if the atomic molar masses are used and Ludwig's rounding were performed, but I am not too worried about that.
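
The L9678 to L9677 shift mentioned above can be reproduced with a short Python sketch. The atomic masses used here are standard tabulated values rounded to four decimals; they are an assumption for this example and may differ in the last digits from the numbers in the screenshot. Variable names are hypothetical.

```python
# Compare the magic number computed from integer masses with the same
# product computed from atomic masses. Names are illustrative only.
FRAC_238U_OF_U = 0.9928      # "238U"/"AllNaturalU", as quoted from the wiki earlier

# Integer "NukeMass" values
integer_version = (232 / 238) * FRAC_238U_OF_U

# Assumed atomic masses in unified atomic mass units (rounded tabulated values)
MASS_232TH = 232.0381
MASS_238U = 238.0508
molar_version = (MASS_232TH / MASS_238U) * FRAC_238U_OF_U

print(round(integer_version, 4))  # 0.9678
print(round(molar_version, 4))    # 0.9677
print(integer_version - molar_version)
```

The unrounded difference is only a few parts in 100,000, so the change in the fourth decimal place is a rounding-boundary effect rather than a large shift.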

Basically, it is not clear to me whether Ludwig used integer-masses because (1) he considered the difference between the atomic molar mass and the integer to be negligible, or (2) integers are isotopically correct usage. My suspicion is (1) is true anyway, but even if it were (2), I think any "inaccuracy" we introduced would be worthwhile in data-modelling terms (i.e. we wouldn't need to define and explain a second set of numbers that look like atomic molar masses but are actually integer "true masses").

So, there you go, I have convinced myself. Feel free to substitute your Atomic Molar Masses for my integer "NukeMass" values.

sbodorkos commented 5 years ago

@bowring nothing is ever simple...

Re ExtPerr: There are two different ways we can go with this, and I suggest Jim makes the call, in terms of the implementation that will be easiest to modify later if required. But I would also appreciate advice from @NicoleRayner :

Single "global" user-preference value (as per SQUID 2.50): This would be independent of daughter-parent system, and could have a Unique Name like "Minimum_Ext1SigmaErr_Pct". The broad generalisation of this Definition makes the associated Note a bit unwieldy; it would be something like:

"User-defined minimum value for external (spot-to-spot) 1sigma uncertainty (expressed as a percentage), intended to supersede any smaller value of WtdAv_Xcor_DaughterParent_CalibConst[2] (i.e. the third element of the WtdAv vector output) calculated from any RM dataset of WtdAv_Xcor_DaughterParent_CalibConst values. In this context, Xcor denotes the index isotope used for the common Pb correction in the RM (i.e. 204Pb, 207Pb, or in the case of U-Pb Tasks that calculate only a single 206Pb/238U calibration (i.e. "Perm1"), possibly 208Pb), and DaughterParent denotes the relevant isotopic system (i.e. 206Pb/238U or 208Pb/232Th)".

If I have the relationships correct, this would mean that all Expression-Names of the form "Xcor_DaughterParent_Ext1SigmaErr_Pct" would be governed by an expression of the form:

= MAX( WtdAv_Xcor_DaughterParent_CalibConst[2], Minimum_Ext1SigmaErr_Pct )

A drawback of this first option is that it couldn't cater for different values depending on the relevant daughter-parent system (although SQUID 2.50 does not offer this flexibility either). I am not sure how important this would ever be, but it might be prudent to accommodate it now, while it would be "easy"...

Daughter-Parent specific user-preference values: There would be two of these, with Unique Names like "Minimum_206Pb238U_Ext1SigmaErr_Pct" and "Minimum_208Pb232Th_Ext1SigmaErr_Pct". In Squid3, I envisage them defaulting to the same value, unless the user manually defined different values. The associated Notes are a bit easier to handle:

Minimum_206Pb238U_Ext1SigmaErr_Pct = "User-defined minimum value for external (spot-to-spot) 1sigma uncertainty (expressed as a percentage), intended to supersede any smaller value of WtdAv_Xcor_206Pb238U_CalibConst[2] (i.e. the third element of the WtdAv vector output) calculated from any RM dataset of WtdAv_Xcor_206Pb238U_CalibConst values. In this context, Xcor denotes the index isotope used for the common Pb correction in the RM (i.e. 204Pb, 207Pb, or in the case of U-Pb Tasks that calculate only a single 206Pb/238U calibration (i.e. "Perm1"), possibly 208Pb)".

Minimum_208Pb232Th_Ext1SigmaErr_Pct = "User-defined minimum value for external (spot-to-spot) 1sigma uncertainty (expressed as a percentage), intended to supersede any smaller value of WtdAv_Xcor_208Pb232Th_CalibConst[2] (i.e. the third element of the WtdAv vector output) calculated from any RM dataset of WtdAv_Xcor_208Pb232Th_CalibConst values. In this context, Xcor denotes the index isotope used for the common Pb correction in the RM (i.e. 204Pb or 207Pb)".

If I have the relationships correct, this would mean that all Expression-Names of the form "Xcor_206Pb238U_Ext1SigmaErr_Pct" would be governed by an expression of the form:

= MAX( WtdAv_Xcor_206Pb238U_CalibConst[2], Minimum_206Pb238U_Ext1SigmaErr_Pct )

and all Expression-Names of the form "Xcor_208Pb232Th_Ext1SigmaErr_Pct" would be governed by an expression of the form:

= MAX( WtdAv_Xcor_208Pb232Th_CalibConst[2], Minimum_208Pb232Th_Ext1SigmaErr_Pct )

A drawback of this second option is that it is an enhancement relative to SQUID 2.50, and it is possible I have not considered all of the complexity that might arise. Only an examination of the Squid3 Java code could inform that. In addition, I am not sure how important the availability of Daughter-Parent specific values would ever be, and it might well be better to address this later, if it becomes an issue.
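
Both options reduce to the same MAX operation and differ only in where the minimum comes from. The following Python sketch shows one way that could look; the function names, the `overrides` dictionary, and all numeric values are hypothetical placeholders, not Squid3 code or recommended settings.

```python
from typing import Dict, Optional, Sequence


def ext_1sigma_err_pct(wtd_av_calib_const: Sequence[float], minimum_pct: float) -> float:
    """MAX( WtdAv_Xcor_DaughterParent_CalibConst[2], minimum ), where element [2]
    is the third element of the WtdAv vector output."""
    return max(wtd_av_calib_const[2], minimum_pct)


def minimum_for(system: str, global_min_pct: float,
                overrides: Optional[Dict[str, float]] = None) -> float:
    """Option 2 sketch: a per-system minimum that defaults to the global value."""
    return (overrides or {}).get(system, global_min_pct)


# Placeholder numbers, chosen only to exercise both branches of MAX.
wtd_av = [0.0105, 0.00004, 0.40]          # third element: external 1sigma error (%)
global_min = 0.75

print(ext_1sigma_err_pct(wtd_av, minimum_for("206Pb238U", global_min)))
print(ext_1sigma_err_pct(wtd_av, minimum_for("208Pb232Th", global_min, {"208Pb232Th": 0.25})))
```

With no overrides supplied, the second option behaves exactly like the first, which is the defaulting behaviour envisaged above.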

Opinions welcome!

cwmagee commented 5 years ago

A slight correction to Simon's post: while the mass defect of an isotope is the binding energy, isotopes with positive mass defects do NOT have negative binding energy. One atomic mass unit (amu) is defined as one twelfth the mass of carbon 12. So isotopes with positive mass defects are bound less tightly than carbon, while isotopes with negative mass defects are bound more tightly. To calculate the binding energy, subtract the mass of the isotope from the summed masses of the appropriate numbers of protons (mass 1.007) and neutrons (mass 1.008). All naturally occurring isotopes have positive binding energy; otherwise the neutrons and protons won't stick together.
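
A rough numerical illustration of this point, as a Python sketch: the masses below are standard tabulated values quoted from memory to a few decimals and should be treated as assumptions, and the 1H atomic mass is used in place of the bare proton mass so that electron masses cancel against the atomic mass of the nuclide. The helper name is hypothetical.

```python
# Rough binding-energy estimate from atomic masses (all in unified atomic mass units).
MASS_1H = 1.00782503        # assumed atomic mass of 1H (proton plus electron)
MASS_NEUTRON = 1.00866492   # assumed neutron mass
U_TO_MEV = 931.494          # assumed energy equivalent of 1 u, in MeV


def binding_energy_mev(protons: int, neutrons: int, atomic_mass: float) -> float:
    """Mass of the separated constituents minus the mass of the atom, in MeV."""
    constituents = protons * MASS_1H + neutrons * MASS_NEUTRON
    return (constituents - atomic_mass) * U_TO_MEV


# 208Pb: atomic mass below 208; 238U: atomic mass above 238.
be_208pb = binding_energy_mev(82, 126, 207.9766521)
be_238u = binding_energy_mev(92, 146, 238.0507882)

print(be_208pb, be_238u)
```

Both results come out positive, even though the atomic mass of 238U lies above its mass number and that of 208Pb lies below it.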

bowring commented 5 years ago

@sbodorkos - Both solutions are straightforward to build (we have the first one anyway). As I look at the second solution I have to ask why only two choices? - is there a case where each weighted mean would get a different value based, say, on the correction isotope?

sbodorkos commented 5 years ago

OK, I found the SQUID 2.50 bug that breaks dual-calibration Tasks (Perm2/Perm4) when using a Th-centric concentration material. I'm documenting the issue here because we have replicated this bug faithfully in Squid3, so we'll need to repair it. I refer to the PDF version of the Squid3 wiki (as created by Jim @bowring in July 2018: [SHRIMP_Wiki_Sq2.50_Pt2-6.pdf] https://github.com/CIRDLES/Squid/files/2905458/SHRIMP_Wiki_Sq2.50_Pt2-6.pdf). Here's a screenshot of the top of page 76:

[image: screenshot of the top of page 76 of the wiki PDF]

The code inside the yellow box gives rise to four of our Unique Name expressions (for dual-calibration Tasks Perm2 and Perm4 only), which when expressed with full reference to the index isotope and the RM/Unknown flags, are:

4cor_Th_Concen_RM = 4cor_232Th238U_RM * U_Concen_RM * L0.9678
7cor_Th_Concen_RM = 7cor_232Th238U_RM * U_Concen_RM * L0.9678

4cor_Th_Concen = 4cor_232Th238U * U_Concen * L0.9678
7cor_Th_Concen = 7cor_232Th238U * U_Concen * L0.9678

The issue with the yellow block is that it explicitly assumes you already have U_Concen, and that you want Th_Concen; it does not consider the possibility that you might want to rearrange the above equations because you have already determined Th_Concen, and U_Concen is the remaining unknown.

If you're wondering why this is only a Perm2/Perm4 issue, and does not affect single-calibration Tasks (Perm1/Perm3), it's because Ludwig dealt properly with the possibility of Th-centric concentration materials in the single-calibration case. The relevant code is documented on page 26 (and the few preceding) in the wiki PDF.

So our aim is to rearrange (algebraically) the four equations above, to define the four Xcor_U_Concen... expressions Jim is lacking. When expressed with full reference to the index isotope and the RM/Unknown flags, these are:

4cor_U_Concen_RM = Th_Concen_RM / 4cor_232Th238U_RM / L0.9678
7cor_U_Concen_RM = Th_Concen_RM / 7cor_232Th238U_RM / L0.9678

4cor_U_Concen = Th_Concen / 4cor_232Th238U / L0.9678
7cor_U_Concen = Th_Concen / 7cor_232Th238U / L0.9678
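To make the rearrangement concrete, here is a minimal Java sketch of the forward (U-centric) and inverted (Th-centric) relations. The class, method, and constant names are hypothetical and are not Squid3's actual API; the only values taken from the expressions above are the factor 0.9678 and the form of the equations. The ratio and concentration in `main` are made-up numbers used purely to show that the inverse recovers the input.

```java
/** Illustrative sketch only: names are hypothetical, not Squid3 classes. */
final class ConcentrationInversionSketch {

    // The constant written as L0.9678 in the expressions above.
    static final double TH_U_FACTOR = 0.9678;

    /** U-centric form: Th_Concen = (232Th/238U) * U_Concen * 0.9678 */
    static double thConcen(double th232U238, double uConcen) {
        return th232U238 * uConcen * TH_U_FACTOR;
    }

    /** Th-centric form: U_Concen = Th_Concen / (232Th/238U) / 0.9678 */
    static double uConcen(double th232U238, double thConcen) {
        return thConcen / th232U238 / TH_U_FACTOR;
    }

    public static void main(String[] args) {
        double ratio = 0.5;   // made-up 232Th/238U, for illustration only
        double u = 1000.0;    // made-up U concentration in ppm

        double th = thConcen(ratio, u);          // forward calculation
        double uRecovered = uConcen(ratio, th);  // inverse calculation

        System.out.println("Th (ppm) from U: " + th);
        System.out.println("U (ppm) recovered from Th: " + uRecovered);
    }
}
```

The same pair of functions applies to the 4cor and 7cor variants, and to RM and Unknown alike; only the 232Th/238U input differs.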

To do this, we just need to introduce an If statement analogous to that used on page 26 for the Perm1/Perm3 case. It would look like:

If pbUconcStd = TRUE --i.e. concentration material is U-centric, as per usual

  {Yellow block of code exactly as Ludwig wrote it}

ElseIf pbThconcStd = TRUE --i.e. concentration material is Th-centric
--(I have used ElseIf to guard against "No U or Th concentration std" cases,
--but it shouldn't be necessary, as this If is nested within an If that demands
--proper definition of "ppmU" and "ppmTh" columns in advance.)

  Term1 = ["ppmTh"] / ["232Th/238U"] / 0.9678
  --"magic number" 0.9678 documented elsewhere

  --Now use Term1 to populate column ["ppmU"] for Perm2/Perm4:
  PlaceFormulae Term1, Frw, {ppmU-column}, Lrw

End If --pbUconcStd = TRUE
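For the Squid3 side, the same branch might be sketched in Java roughly as follows. This is an assumption-laden illustration, not the actual Squid3 code: the enum, class, and method names are invented here, and the U-centric branch simply restates the Th_Concen expressions listed earlier rather than reproducing Ludwig's yellow block.

```java
/** Illustrative sketch only: names are hypothetical, not Squid3 classes. */
final class DualCalibrationConcSketch {

    /** Which element the concentration reference material is defined for. */
    enum ConcRef { U_CENTRIC, TH_CENTRIC }

    static final double TH_U_FACTOR = 0.9678;

    /**
     * Returns {ppmU, ppmTh} for one spot in a Perm2/Perm4 task.
     * knownPpm is ppmU when ref is U_CENTRIC and ppmTh when ref is TH_CENTRIC.
     */
    static double[] fillConcentrations(ConcRef ref, double knownPpm, double th232U238) {
        switch (ref) {
            case U_CENTRIC:
                // Usual case: ppmU is known, derive ppmTh.
                return new double[] {knownPpm, knownPpm * th232U238 * TH_U_FACTOR};
            case TH_CENTRIC:
                // New branch: ppmTh is known, derive ppmU (Term1 in the pseudocode).
                return new double[] {knownPpm / th232U238 / TH_U_FACTOR, knownPpm};
            default:
                throw new IllegalArgumentException("No U or Th concentration std");
        }
    }

    public static void main(String[] args) {
        // Made-up inputs: 483.9 ppm Th and a 232Th/238U ratio of 0.5.
        double[] conc = fillConcentrations(ConcRef.TH_CENTRIC, 483.9, 0.5);
        System.out.println("ppmU = " + conc[0] + ", ppmTh = " + conc[1]);
    }
}
```

Keeping both branches in one function makes it hard to populate the Th column without also deciding where the U column comes from, which is the omission that caused the original bug.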

So, I have implemented a bug-fix in SQUID 2.50 to this effect (it was bloody difficult; Ludwig uses white spaces as column-name delimiters, so the syntax is hard to read and write, because of course you can't use white spaces for any other purpose).

The good news is that it works! I was able to "invert" an analysis of M127 to derive a U value of 923 ppm, by using a "reference" Th value of 418 ppm (having originally obtained that Th value by running a "normal" U-centric data reduction on the same XML file beforehand).

And it works for both RMs and unknowns (both were failing before). I hadn't picked up the problem in the Unknowns because there were still numbers in the columns; they were just wrong numbers as a consequence of circular references in Excel, essentially stemming from the above bug.

So in terms of expressions, that's probably everything we need for the next release...

sbodorkos commented 5 years ago

@bowring with regard to ExtPerr, it is essentially a fudge factor anyway; it just needs to live in the Preferences because different users/labs have different recipes. It shouldn't be index-isotope dependent; in fact, it's not even clear that there is a physical basis for genuine differences in ExtPerr between the 206Pb/238U and 208Pb/232Th systems.

I am inclined to suggest we build the capability to be able to specify different ExtPerr values for 206Pb/238U and 208Pb/232Th (as per my option 2 in previous post), but at the same time set it up so the Pb/U and Pb/Th values are always the same (as per SQUID 2.50) until the user manually decouples them. I think that approach would cover all the bases.
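One way to picture that "coupled until decoupled" behaviour is the small Java sketch below. It is only an illustration of the idea: the class and method names are invented, it is not how Squid3's Preferences are actually structured, and the percentages in `main` are arbitrary placeholders.

```java
/** Illustrative sketch only: names are hypothetical, not Squid3 classes. */
final class ExtPErrPreferenceSketch {

    private double extPErrU;        // ExtPerr applied to 206Pb/238U, in percent
    private double extPErrTh;       // ExtPerr applied to 208Pb/232Th, in percent
    private boolean coupled = true; // true mimics SQUID 2.50: one shared value

    ExtPErrPreferenceSketch(double initialPercent) {
        extPErrU = initialPercent;
        extPErrTh = initialPercent;
    }

    /** After this call the two values can be edited independently. */
    void decouple() {
        coupled = false;
    }

    void setExtPErrU(double percent) {
        extPErrU = percent;
        if (coupled) {
            extPErrTh = percent; // keep Pb/Th in step with Pb/U
        }
    }

    void setExtPErrTh(double percent) {
        extPErrTh = percent;
        if (coupled) {
            extPErrU = percent; // keep Pb/U in step with Pb/Th
        }
    }

    double getExtPErrU() { return extPErrU; }

    double getExtPErrTh() { return extPErrTh; }

    public static void main(String[] args) {
        ExtPErrPreferenceSketch prefs = new ExtPErrPreferenceSketch(0.75); // placeholder value
        prefs.setExtPErrU(1.0);   // coupled: both values become 1.0
        prefs.decouple();
        prefs.setExtPErrTh(0.5);  // decoupled: only the Pb/Th value changes
        System.out.println(prefs.getExtPErrU() + " / " + prefs.getExtPErrTh());
    }
}
```

Defaulting to the coupled state means existing SQUID 2.50 recipes behave as before, and a lab only sees two distinct values after an explicit choice.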

bowring commented 5 years ago

@sbodorkos - As I implement this solution, I have questions / comments:

  1. We currently have neither Xcor_U_Concen nor Xcor_Th_Concen.

  2. Should I add both sets or just Xcor_U_Concen?

  3. In any case, we will need "notes" for the new expressions.

  4. When I introduced the new definitions for 4cor_U_Concen_RM and 7cor_U_Concen_RM, only the reports for 4-corr perm2RM and 4-corr perm4RM changed - does this seem correct?

Cheers

sbodorkos commented 5 years ago

@bowring Yes, sorry, I tried to do too much of it in my head, forgetting the relationships between (a) the "comprehensive" set of RM expressions (i.e. before the Perm-number, index isotope, and parent element are chosen), (b) the "aliased" set of RM expressions (i.e. after those three selections have been made), and (c) the sole set of Unknown expressions (which are predicated on the aliased RM set, not the comprehensive RM set). Just another reason to get on with sorting out the corresponding three Reports, after the next release! As numbered:

  1. Yes, my bad, I assumed that the unknowns mirrored the behaviour of the RMs without checking the documentation. The difference is that Unknown calculations are not done until the aliased RM set is fully resolved, and that resolution process means that U_Concen and Th_Concen always become uniquely defined for the Unknown at that point. This is the essence of why the R- and RU-flagged expressions are giving us so much more grief than the expressions solely flagged U.

  2. No, add neither set for the unknowns. The guiding principle is that all the "Th_Concen" expressions (across RM and Unknowns, and with index-isotope dependence for the RM in Perm2 and Perm4) were already correctly configured. All we need to do is mirror the overall expression-name configuration for the various "U_Concen" expressions. In practice, as we now know, that means simply defining expressions for 4cor_U_Concen_RM and 7cor_U_Concen_RM. As you have noted, there is no need for extra definitions of U_Concen for the unknowns, because the range of potential variation in those values is completely captured and controlled by the range of potential variation in the "feeder" {Xcor}_U_Concen_RM expressions. See the (correctly specified) relationships between the single U-flagged expression Th_Concen and the gamut of R-flagged {Xcor}_Th_Concen_RM expressions for an example of what I mean by this.

  3. No longer applicable, as per the above. I rechecked "ExpressionNames_AcrossPerms_v5_NR+SB.xlsx", and I can confirm that the expression-names and Definition Notes there are correct and complete; nothing further is needed. Had I looked at this spreadsheet before trying to document the bug, I would have done a much better job of it.

  4. Correct. None of this latest palaver has any effect at all on Perm1 or Perm3. It caters purely to the (very unusual) combination of Perm-number = 2 or 4 AND Concentration Material = Th-centric. And that would be why the bug persisted undiscovered in SQUID 2.50 for a decade: it's very likely that no-one has ever tried this combination of parameters in a real scientific application.

bowring commented 5 years ago

@sbodorkos - All changes completed and tested. Release imminent! Per request here are the expression names listed by perm.

perm4_BuiltinExpressions.txt perm3_BuiltinExpressions.txt perm2_BuiltinExpressions.txt perm1_BuiltinExpressions.txt

bowring commented 5 years ago

Finally put to bed with release of Squid3 v1.1.0 !