Layering of WtdAv–WeightedAv–WeightedAverage

CIRDLES / Squid

Squid3 is being developed by the Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) at the College of Charleston, Charleston, SC and Geoscience Australia as a re-implementation in Java of Ken Ludwig's Squid 2.5. - please contribute your expertise!

http://cirdles.org/projects/squid/

Apache License 2.0

12 stars 27 forks source link

Layering of WtdAv–WeightedAv–WeightedAverage #582

Open sbodorkos opened 3 years ago

sbodorkos commented 3 years ago

Hi Jim @bowring , I have finished looking through the WeightedAverage (and overlaid) code, but it is complicated because there are lots of ways into it, both from Squid (e.g. using WtdMeanAcalc for calibration constants, or sqWtdAv in user-defined expressions) and in Excel, from Isoplot (using the array function in a spreadsheet, or the toolbar-based function).

In addition to these various entry-points, we know that the array output of the calculation is conditional (with respect to both the numerical values reported and the labels assigned to them). Some of the conditions are dictatable by the user at the outset of the calculation, but the function-layering renders these relationships opaque. I am going to try to tease it out here (in instalments, and skipping over as much of the Excel-specific guff as I can). Note that in Excel, pretty much all of the below is part of the Isoplot codebase, not SQUID 2.50.

As a preface, function sqWtdAv is defined as:

sqWtdAv ( Values, Errs, PercentErrsIn, PercentErrsOut, CanReject )

where:

Values is a mandatory 1-column range of numeric values.
Errs is a mandatory 1-column range of numeric uncertainties associated with the Values (and of the same length as Values).
PercentErrsIn is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.
PercentErrsOut is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.
CanReject is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. (Isoplot has its own rules for this process: no more than 15% of the total population can ever be rejected, and although rejection tolerance commences at 2sigma, the tolerance loosens with each successive pass. More on this later.) If not specified, it defaults to FALSE.

sqWtdAv is unique in that it requires Values and Errs to be specified as separate vectors, and the sqWtdAv code then explicitly defines ValuesErrs as the unified 2-column array. This is to bring the input into line with all other invocations of the "starter" Isoplot function WtdAv, which I will describe next.

sbodorkos commented 3 years ago

The starting point for Isoplot (and therefore SQUID 2.50) in terms of the generalised calculation of a weighted mean is via the (public) function WtdAv (which means you can invoke it in your Excel spreadsheet as an array function, if you know how... Ludwig wrote a little "toolbar-only function" to help users do this).

Note that Squid3 does not start at this point when calculating a weighted mean, which is why I am documenting this. Plus, the function has 6 optional arguments, but Ludwig has only publicly documented the first five... and (unsurprisingly) it turns out that this is where some of our confusion arises!

Function WtdAv is defined as

WtdAv ( ValuesAndErrors, PercentOut, PercentIn, SigmaLevelIn, CanReject, ConstantExternalErr, SigmaLevelOut )

where:

ValuesAndErrors is a mandatory 2-column range, which has Values in the left-hand column, and Errors on the right. The columns are prescribed, but the rows are not. Ranges need not even be contiguous blocks of rows in Excel, so there can be skipped rows, non-numeric rows, null rows, etc.)
PercentOut is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.
PercentIn is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.
SigmaLevelIn is an optional numeric value, permitting the user to specify whether the input Errors values are 1sigma (1) or 2sigma (2). If not specified, it defaults to 2.
CanReject is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. If not specified, it defaults to FALSE.
ConstantExternalErr is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try calculating the magnitude of the "external" error (i.e. an additional component of uncertainty which, when added in quadrature to each of the input Errors, yields a population with MSWD ~ 1), if the initial weighted mean calculation yields a probability-of-fit < 0.05. If not specified, it defaults to FALSE.
SigmaLevelOut is an optional numeric value, permitting the user to specify whether uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be 1sigma (1) or 2sigma (2). If not specified, it defaults to 2.

This function actually doesn't do very much, except ensure that the range ValuesAndErrors has exactly 2 columns (and at least 2 rows), plus formalise the value of each of the optional arguments, before passing them to subroutine WeightedAv, which does a bit more of the preparatory work.

sbodorkos commented 3 years ago

Subroutine WeightedAv is invoked solely within WtdAv, so it is not really necessary to define the arguments beyond that call of the subroutine, and I will include the assigned values in the description.

In Excel, WeightedAv has one option Boolean argument (AltRej) which deals with legacy functionality in the Excel environment (i.e. it allows an alternative method of user-defined rejections in environments where the usual method [application of the Strikethrough font] is unavailable). I have omitted AltRej from the subroutine description because it is not relevant.

Subroutine WeightedAv is defined as

WeightedAv ( W, ValuesErrs, PercentOut, PercentIn, SigmaLevelIn, CanRej, ConstExtErr, SigmaLevelOut )

where

W is the array output of the weighted mean calculation, comprising 2 columns (left = values, right = captions) and up to 7 rows, depending on CanRej and ConstExtErr. Note that both the values and the captions are CONDITIONAL, based on the subroutine arguments. In an attempt to reduce some of the needless complexity in the output, I will (eventually) document it as always having 7 rows, with NULLs in rows that prove inapplicable based on the user-defined CanRej and ConstExtErr values.
ValuesErrs is a mandatory 2-column range, which has Values in the left-hand column, and Errors on the right. The parent function WtdAv sets WeightedAv's ValuesErrs = WtdAv's ValuesAndErrors.
PercentOut is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). The parent function WtdAv sets WeightedAv's PercentOut = WtdAv's PercentOut. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
PercentIn is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). The parent function WtdAv sets WeightedAv's PercentIn = WtdAv's PercentIn. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
SigmaLevelIn is an optional numeric value, permitting the user to specify whether the input Errors values are 1sigma (1) or 2sigma (2). The parent function WtdAv sets WeightedAv's SigmaLevelIn = WtdAv's SigmaLevelIn. If not specified (obsolete because WtdAv always specifies), it defaults to 2.
CanRej is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. The parent function WtdAv sets WeightedAv's CanRej = WtdAv's CanReject. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
ConstExtErr is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try calculating the magnitude of the "external" error (i.e. an additional component of uncertainty which, when added in quadrature to each of the input Errors, yields a population with MSWD ~ 1), if the initial weighted mean calculation yields a probability-of-fit < 0.05. The parent function WtdAv sets WeightedAv's CanRej = WtdAv's ConstantExternalErr. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
SigmaLevelOut is an optional numeric value, permitting the user to specify whether uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be 1sigma (1) or 2sigma (2). The parent function WtdAv sets WeightedAv's SigmaLevelOut = WtdAv's SigmaLevelOut. If not specified (obsolete because WtdAv always specifies), it defaults to 2.

This subroutine does do a bit of work, especially on ValuesErrs. For the specified 2-column range, each row is assessed. If the Value is numeric and the Err is numeric, then the next check ensures that the Value is non-zero and the Err is non-zero and the row has not been independently identified by the user as a manual rejection (e.g. Strikethrough font in Excel; check-box in Squid3). Note also that if input errors were specified as percentage, they are converted to absolute values at this stage.

If the row passes all these tests, it is added to the 2-column array InpDat, which holds only the "clean" ValuesErrs data (with absolute errors), which will be used for the real weighted-mean arithmetic. That is handled by subroutine WeightedAverage... which I will describe next.

sbodorkos commented 3 years ago

Subroutine WeightedAverage is invoked within WeightedAv, but also within WtdAvPlot-related functions, so I will define the arguments beyond that call of the subroutine, and I will include the assigned values in the description. But this is where the arithmetic actually happens.

Subroutine WeightedAverage is defined as

WeightedAverage ( Npts, ww, Nrej, Wrejected, CanTukeys, OneSigOut, IntErr68, Optional ExtErr68 = 0 )

where

Npts is the number of analyses in the clean InpDat array (defined by subroutine WeightedAv).
ww is the array output of the weighted mean calculation.
Nrej is the number of analyses in the clean InpDat array which Isoplot identified as rejections, if CanReject (defined by subroutine WeightedAv) is TRUE. Note that Nrej does not include previously identified rejections (e.g. user-defined rejections via Strikethrough or check-box, or 'large error' rejections identified via WtdMeanAcalc. These 'previous' rejections should already have been excluded from the InpDat array, prior to invocation of subroutine WeightedAverage.
Wrejected is a 2-column double-precision array of identical size to InpDat, which holds the (indexed) Values and Errors of analyses from the clean InpDat array which Isoplot identified as rejections, if CanReject (defined by subroutine WeightedAv) is TRUE.
CanTukeys is a mandatory Boolean, controlling whether WeightedAverage should calculate a Tukey's biweight mean and its (95% confidence interval) and a median value with its asymmetric ~95% confidence intervals (following Rock et al., 1987), in addition to the weighted mean which the primary focus of this subroutine. Under program control, CanTukeys is usually FALSE, but in Excel, it is set to TRUE when users invoked Weighted Average via Isoplot's dialog box.
OneSigOut is an optional Boolean, permitting the user to specify whether uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be 1sigma (TRUE) or not (FALSE). In most applications, it ends up being set to FALSE, but the invocation from WtdMeanAcalc sets it to TRUE, and this is important. If not specified, it defaults to FALSE.
IntErr68 is an optional double-precision value as output from WeightedAverage, corresponding to the 68% confidence interval on the "internal" error of the mean. It is only calculated if OneSigOut = TRUE. If not calculated, it defaults to 0.
ExtErr68 is an optional double-precision value as output from WeightedAverage, corresponding to the 68% confidence interval on the "external" error of the mean. It is only calculated if the MSWD of the weighted mean exceeds 1, and the probability of equivalence of the weighted mean is below a user-specified minimum (usually 0.05), and OneSigOut = TRUE. If not calculated, it defaults to 0.

In subroutine WeightedAv, the subroutine WeightedAverage is invoked as

WeightedAverage ( N, ww, Nrej, Wrejected(), False, (SigmaLevelOut = 1), IntErr68, ExtErr68 )

which means OneSigOut = TRUE when SigmaLevelOut = 1, otherwise OneSigOut = FALSE.

The next post will document the WeightedAverage code.

sbodorkos commented 3 years ago

Boolean value: NoEvar

Integer values: Count, i, j, N0, Nn, nU

Double-precision values: ExtSigma, ExtSigmaMean, ExtVar, ExtXbarSigma, IntMean, IntSigmaMean, IntVar, MSWD, PointError, Probability, q, Resid, Sums, SumWtdRatios, temp, t68, t95, Tolerance, Tot1sig, TotalError, TotVar, Trial, Weight, WtdAvg, WtdAvgErr, WtdR2, WtdResid

Double-precision arrays: InverseVar, IvarY, tbx, yy

String vector (4 elements): r

Nn = Npts
Count = 0
Nrej = 0

If MinProb = 0, Isoplot sets it using the value stored in Isoplot's Numeric Preferences (statistical). In practice, MinProb should be 0.05 unless otherwise indicated.

If MinProb = 0

    MinProb = Val( Menus["MinProb"] ) --MinProb should be 0.05

End If

Some vector/array-sizing, and calculation of the inverse variance values, to weight the weighted mean:

ReDim InverseVar[Nn], yy[Nn], IvarY[Nn], tbx[Nn], yf.WtdResid[Nn]

If CanReject = TRUE

    ReDim Wrejected[Nn, 2]

End If

For i = 1 To Nn

    InverseVar[i] = 1 / ( InpDat[i, 2] )^2

Next i

The following is the "re-entry point" for recalculation of a weighted mean, when CanReject = TRUE and Isoplot has identified an eligible rejection:

Recalc:

ExtSigma = 0
ww.Ext2Sigma = 0 --Named element of output array ww
Weight = 0
SumWtdRatios = 0
q = 0
Count = Count + 1

For i = 1 To Npts

    If InpDat[i, 1] <> 0 AND InpDat[i, 2] <> 0

        Weight = Weight + InverseVar[i]
        SumWtdRatios = SumWtdRatios + InverseVar[i] * InpDat[i, 1]
        q = q + InverseVar[i] * ( ( InpDat[i, 1] )^2 )

    End If

Next i

Now some fundamental calculations (re-sequenced from Ludwig for clarity, and reader sanity), plus calculation of Student's t at the 68.26% confidence level and 95% confidence level (these values are a function of population size). The Excel function TINV returns the two-tailed inverse of the Student's t-distribution, and is documented at https://support.microsoft.com/en-us/office/tinv-function-a7c85b9d-90f5-41fe-9ca5-1cd2f3e1ed7c :

IntMean = SumWtdRatios / Weight --"Internal" wtd average
IntSigmaMean = sqrt( 1 / Weight ) --"Internal" 1sigma absolute error of wtd average
nU = Nn - 1 --Degrees of freedom
t68 = TINV( 0.3174, nU ) --Because 1 - (68.26%) = 0.3174; this is a magic number
t95 = TINV( 0.05, nU ) --Because 1 - (95%) = 0.05

Sums = 0

For i = 1 To Npts

    If InpDat[i, 1] <> 0 AND InpDat[i, 2] <> 0

        Resid = InpDat[i, 1] - IntMean --Simple residual
        WtdResid = Resid / InpDat[i, 2] --Wtd residual
        WtdR2 = WtdResid^2 --Square of wtd residual

        If Nn = Npts 

            yf.WtdResid[i] = WtdResid --WtdR2

        End If --Nn = Npts

        Sums = Sums + WtdR2

    End If --InpDat[i, 1] <> 0 AND InpDat[i, 2] <> 0

Next i

Sums can be used to calculate MSWD and Probability, in conjunction with Excel's FDIST function, which returns the (right-tailed) F probability distribution (degree of diversity) for two data sets. This is documented at https://support.microsoft.com/en-us/office/fdist-function-ecf76fba-b3f1-4e7d-a57e-6a5b7460b786 :

Sums = MAX( Sums, 0 )
MSWD = Sums / nU --Mean Square of Weighted Deviates
Probability = FDIST( MSWD, nU, 1e+9 ) --Probability of equivalence

Provisionally populate some of the elements of output array ww:

With ww
    .IntMean = IntMean
    .IntMeanErr2sigma = 2 * IntSigmaMean
    .MSWD = MSWD
    .Probability = Probability

    If Probability >= 0.3

        .IntMeanErr95 = IntSigmaMean * 1.96

        If OneSigOut = TRUE

            IntErr68 = IntSigmaMean * 0.9998

        End If --OneSigOut = TRUE

    Else

        .IntMeanErr95 = IntSigmaMean * t95 * sqrt( MSWD )

        If OneSigOut = TRUE

            IntErr68 = IntSigmaMean * t68 * sqrt( MSWD )

        End If --OneSigOut = TRUE

    End If --Probability >= 0.3

End With

At this point, WeightedAverage has completed population of the ww.Int... elements, as per this view of the VBA6 code editor/debugger in Excel 2003:

WW_AfterIntMean

The next step is to determine whether the data are dispersed (relative to a specific threshold), and if so, to calculate the constant "external" variance (calculated by Ludwig via maximum-likelihood estimation), and the associated "external" weighted mean and its error. Note that this process is independent of the value of the user-specified Boolean ConstExtErr (from subroutine WeightedAv). That code (which follows immediately on) commences:

If ( Probability < MinProb ) AND ( MSWD > 1 )

    Nn = 0

    For i = 1 To Npts

        If InpDat[i, 1] <> 0

            Nn = 1 + Nn
            yy[Nn] = InpDat[i, 1]
            IvarY[Nn] = ( InpDat[i, 2] )^2

        End If

    Next i

    ReDim Preserve yy[Nn], IvarY[Nn]

Vectors yy and IvarY are the inputs to the subroutine that calculates the "external" variance, via the secant method, described by Press et al. (1987), p. 250–251 (more detail below). I am going to step out of this unfinished If statement to describe the generalised function WtdExtFunc, which is an important part of that process, and then return.

sbodorkos commented 3 years ago

Stepping out of the unfinished If statement above, I am going to describe the generalised function WtdExtFunc, which is used repeatedly in the iterative calculation of the "external" error.

Function WtdExtFunc is defined as

WtdExtFunc( TrialExtVar, Xbar, XbarSigma, yy, IvarY, Nn )

where

TrialExtVar is a (mandatory) existing double-precision estimate of "external variance", as input to the calculation performed by this function.

Xbar is a double-precision estimate of the "external" mean, based on the input TrialExtVar, yy and IvarY, as output from the calculation performed by this function.

XbarSigma is a double-precision estimate of the 1sigma error of the "external" mean, based on the input TrialExtVar and IvarY, as output from the calculation performed by this function.

yy is a double-precision vector of length Nn containing the Values, as input to the calculation performed by this function. Usage as per referring function WeightedAverage.

IvarYis a double-precision vector of length Nn, containing the variances, as input to the calculation performed by this function. Usage as per referring function WeightedAverage.

Nn is the (mandatory) integer number of values, as input to the calculation performed by this function. Usage as per referring function WeightedAverage.

Ludwig described this function as "WtdExtFunc will be zero when the external variance, TrialExtVar, is chosen correctly (Nn >= 3)". This is a bit cryptic, because TrialExtVar is definitely an input... I think what he means is that this function will generate an improved estimate for TrialExtVar, and the function can be used iteratively. It proceeds as follows:

Integer value: i

Double-precision values: ff, Resid, SumW, SumW2resid2, SumXW

Double-precision array: W

ReDim W[Nn] --W needs to be the same length as yy

For i = 1 To Nn

    W[i] = 1 / ( IvarY[i] + TrialExtVar )
    SumW = SumW + W[i]
    SumXW = SumXW + yy[i] * W[i]

Next i

Xbar = SumXW / SumW
XbarSigma = SQRT( ABS( 1 / SumW ) ) --1sigma error in Xbar

For i = 1 To Nn

    Resid = yy[i] - Xbar
    SumW2resid2 = SumW2resid2 + ( W[i] * Resid )^2

Next i

ff = SumW2resid2 - SumW
WtdExtFunc = ff

End Function

sbodorkos commented 3 years ago

Returning to the unfinished If statement above, the code continues:

    ReDim Preserve yy[Nn], IvarY[Nn]

    WtdExtRTSEC 0, 10 * (IntSigmaMean^2), ExtVar, ExtMean, ExtXbarSigma, yy, IvarY, Nn, NoEvar

Ludwig described subroutine WtdExtRTSEC as finding the root (thought to lie between 0 and MaxExtVar) of a function WtdExtFunc, using the secant method. The root, returned as WtdExtRTSEC, is refined until its accuracy is +/-xacc. He said the method is described in Numerical Recipes in C: The Art of Scientific Computing (Press et al., 1987, p. 250-251). Subroutine WtdExtRTSEC is invoked exclusively within WeightedAverage, so I have defined almost all of the arguments in terms of WeightedAverage usage.

Subroutine WtdExtRTSEC is defined as

WtdExtRTSEC( 0, MaxExtVar, ExtVar, ExtMean, ExtXbarSigma, yy, IvarY, Nn, NoEvar )

where

0 is a (mandatory) pre-existing minimum estimate of ExtVar, as input to this subroutine.

MaxExtVar is a (mandatory) pre-existing maximum estimate (double-precision) of ExtVar, as input to this subroutine. Referring subroutine WeightedAverage sets MaxExtVar = 10 * ( IntSigmaMean^2 )

ExtVar is the best estimate (double-precision) of "external variance", as output from this subroutine if NoEvar = FALSE.

ExtMean is the calculated (double-precision) "external" mean (from iterated WtdExtFunc, as Xbar), as output from this subroutine if NoEvar = FALSE.

ExtXbarSigma is the calculated (double-precision) 1sigma error of the "external" mean (from interated WtdExtFunc, as XbarSigma), as output from this subroutine if NoEvar = FALSE.

yy is a double-precision vector of length Nn containing the Values, as input to this subroutine. Usage as per referring subroutine WeightedAverage.

IvarYis a double-precision vector of length Nn containing the variances, as input to this subroutine. Usage as per referring subroutine WeightedAverage.

Nn is the integer number of values, as input to this subroutine. Usage as per referring subroutine WeightedAverage.

NoEvar is a Boolean indicating whether the secant-method calculation in this subroutine has succeeded (FALSE) or failed (TRUE). It is explicitly initialised as FALSE.

The subroutine proceeds as follows:

Boolean value: NoEvar

Integer values: j, rct

Double-precision values: dx, f, facc, FL, Lastf1, Lastf2, MaxExtVar, rts, SwapTemp, tmp, xacc, xl

xacc = 1e-9
facc = 1e-7
NoEvar = FALSE
MaxExtVar = 10 * ( IntSigmaMean^2 ) --from referring subroutine WeightedAverage

FL = WtdExtFunc( 0, Xbar, XbarSigma, yy, IvarY, Nn )
f = WtdExtFunc( MaxExtVar, Xbar, XbarSigma, yy, IvarY, Nn )

Ludwig starts by checking that FL and f are not too close in value. Curiously, he does this by converting each double-precision value to single-precision (via Excel VBA function CSng) for comparison, and aborting the subroutine if the single-precision values are equivalent:

If CSng( f ) = CSng( FL ) 

    GoTo FailedExit

End If

Then some ring-a-rosy aimed at ensuring that the bound with the smaller WtdExtFunc value is chosen as the initial guess for subsequent iterative calculations in the secant loop:

If ABS( FL ) < ABS( f )

    rts = 0
    xl = MaxExtVar
    SwapTemp = FL
    FL = f
    f = SwapTemp

Else

    xl = 0
    rts = MaxExtVar

End If

Now the secant loop, with a maximum of 99 iterations:

For j = 1 To 99

    If f = FL --Success!

        ExtVar = rts
        Exit Sub

    End If

    dx = ( xl - rts ) * f / ( f - FL ) --Increment with respect to latest value
    xl = rts
    FL = f
    rct = 0

    Do
        tmp = rts + dx

        If tmp >= 0

            Exit Do

        ElseIf rct > 100

            GoTo FailedExit

        End If

        dx = dx / 2
        rct = 1 + rct

        If rct > 99

            GoTo FailedExit

        End If
    Loop

    rts = tmp
    f = WtdExtFunc( rts, Xbar, XbarSigma, yy, IvarY, Nn )

    If ABS( f ) < facc OR ABS( f ) = ABS( Lastf2 ) --Success, if f < facc, OR f is oscillating

        ExtVar = rts
        Exit Sub

    End If

    Lastf2 = Lastf1
    Lastf1 = f

Next j

FailedExit: 
NoEvar = TRUE
End Sub

The next post will return (again) to the unfinished If statement above, and assign "external error" values to the elements of array ww in subroutine WeightedAverage.

sbodorkos commented 3 years ago

Returning to the unfinished If statement above, the code in subroutine WeightedAverage continues by evaluating the success or otherwise of subroutine WtdExtRTSEC, and assigns "external error" values to the elements of array ww on that basis. Three scenarios are considered by the code:

Arithmetical success of WtdExtRTSEC. This is the dominant mode of operation; I do not recall ever being aware of a failure in it. In this case, WtdExtRTSEC outputs ExtMean, ExtXbarSigma and ExtVar are used to calculate the relevant elements of ww.
Failure of WtdExtRTSEC accompanied by a "very high" value of MSWD (> 4). I suspect this scenario is obsolete (I am sure WtdExtRTSEC works even when MSWD > 100), but I don't know for certain, so I have retained it for completeness. In this case, the WtdExtRTSEC output is ignored, and the relevant elements of ww are populated using AVERAGE and STDEV, applied directly to the WtdExtRTSEC inputs.
Failure of WtdExtRTSEC not accompanied by a "very high" value of MSWD (> 4). This is the catch-all scenario, and in this case, all the relevant elements of ww are zeroed.

    WtdExtRTSEC 0, 10 * (IntSigmaMean^2), ExtVar, ExtMean, ExtXbarSigma, yy, IvarY, Nn, NoEvar

    With ww

        If NoEvar = FALSE --then subroutine WtdExtRTSEC succeeded

            .ExtMean = ExtMean
            ExtSigma = SQRT( ExtVar )
            .ExtMeanErr95 = TINV( 0.05, (2 * nU) ) * ExtXbarSigma

            If OneSigOut = TRUE

                ExtErr68 = TINV( 0.3174, (2 * nU) ) * ExtXbarSigma

            End If

        ElseIf MSWD > 4 --then subroutine WtdExtRTSEC might have failed due to very high MSWD

            .ExtMean = AVERAGE( yy ) --simple Excel function
            ExtSigma = STDEV( yy ) --simple Excel function
            .ExtMeanErr95 = t95 * ExtSigma / SQRT( Nn )

            If OneSigOut = TRUE

                ExtErr68 = t68 * ExtSigma

            End If

        Else --total failure

            .ExtMean = 0
            ExtSigma = 0
            .ExtMeanErr95 = 0

            If OneSigOut = TRUE

                ExtErr68 = 0

            End If

        End If --NoEvar = FALSE

        .Ext2Sigma = 2 * ExtSigma
        .ExtMeanErr68 = t68 / t95 * .ExtMeanErr95

    End With

End If -- ( Probability < MinProb ) AND ( MSWD > 1 )

remembering that this If statement opened in the final code-block from this post: https://github.com/CIRDLES/Squid/issues/582#issuecomment-784738500. At this point, WeightedAverage has finished populating its ww.Ext... elements, like this:

WW_AfterExtMean_RTSEC

sbodorkos commented 3 years ago

Subroutine WeightedAverage continues by considering its own automated point-rejection routine, which is invoked if user-specified ( CanReject = TRUE ) AND ( Probability < MinProb ):

If ( CanReject = TRUE ) AND ( Probability < MinProb ) 

    GoSub Reject --see below

End If

This is followed (for user-specified CanTukeys = TRUE) by evaluation of the three ww.BiWt... and three ww.Median... elements. Those functions are quite trivial, but I am going to document them in a separate post, because the code looks very old (and untidy), and I think I can see a bug, unfortunately.

If WeightedAverage's auto-rejection routine is bypassed (either by the user or by the statistical coherence of the population), that concludes the Weighted Average subroutine.

If CanTukeys = TRUE 

    --Calculate the 3 ww.BiWt... values and the 3 ww.Median... values:
    --Documented in a separate post, yet to come

End If

Exit Sub

All that is left is the description of the auto-rejection routine. But the start of this code is difficult, and I include a screenshot of Ludwig's VBA verbatim:

My first problem with this screenshot is the evaluation of "If ww.Ext2sigma", given that ww.Ext2sigma is a double-precision number. I have not yet found any documentation specific to VBA6, but for C++, "If Double" = FALSE when Double is zero or null, else TRUE. For the moment, I am assuming the convention is the same in VBA6.

In this context, it is relevant that ww.Ext2sigma is explicitly set to 0 at the beginning of every iteration of the subroutine WeightedAverage, irrespective of whether it is the initial run-through, or a subsequent iteration demanded by Recalc:, and also irrespective of whether or not MSWD and Probability demand an evaluation of the "external error". Non-zero values of ww.Ext2sigma occur in only two circumstances:

WtdExtRTSEC output NoEvar = FALSE
( WtdExtRTSEC output NoEvar = TRUE ) AND ( MSWD > 4 )

With all of this in mind, I have re-expressed "If ww.Ext2sigma" as "If ww.Ext2sigma <> 0" because I think that is the explicit meaning.

Reject: --Destination of GoSub above

If ww.Ext2Sigma <> 0

    WtdAvg = ww.ExtMean
    WtdAvgErr = ww.ExtMeanErr95

Else

    WtdAvg = ww.IntMean
    WtdAvgErr = ww.IntMeanErr95

End If

My second problem with the screenshot is the evaluation of WtdAvgErr in the scenario where OneSigOut = TRUE. Ludwig's code is unambiguous: WtdAvgErr = ExtErr68, irrespective of the evaluation of "If ww.Ext2sigma". But is that correct? If ww.Ext2sigma = 0, then the various evaluation-paths indicate that ExtErr68 is either 0 or not evaluated. Is WtdAvgErr = ExtErr68 really appropriate in those cases? The (temporary) saving grace is that the assigned value of WtdAvgErr is not used anywhere in subroutine WeightedAverage. But I have a feeling it is called in the parent subroutine WeightedAv, so we will just have to wait and see whether this is correct, or an inconsequential error, or (perhaps) an important error.

My rendition of this code continues:

If OneSigOut = TRUE

    WtdAvgErr = ExtErr68

End If

Now the outlier rejection testing routine commences. Overall, no more than 15% of analyses can be rejected from the original population, and the rejection tolerance increases with each pass through the data, starting at 2sigma when attempting the first rejection, increasing to 2.5sigma when attempting a second rejection, 3sigma when attempting a third, and so on.

N0 = Nn --Start outlier rejection routine

For i = 1 To Npts

    If ( InpDat[i, 1] <> 0 ) AND ( Nn > 0.85 * Npts )
        --Reject no more than 15% of points. Start rejection tolerance at 2sigma, and increase slightly on each pass.

        PointError = 2 * SQRT( ( InpDat[i, 2] )^2 + ExtSigma^2 ) --2sigma error of point being tested
        TotalError = SQRT( PointError^2 + ( 2 * ww.ExtMeanErr68 )^2 )
        Tolerance = ( 1 + ( Count - 1 ) / 4 ) * TotalError
        --First-pass tolerance is 2sigma; second-pass is 2.5sigma; third-pass is 3sigma...

        q = InpDat[i, 1] - WtdAvg

        If ( ABS( q ) > Tolerance ) AND ( Nn > 2 )

            Nrej = 1 + Nrej --increment number of rejections
            Nn = Nn - 1 --decrease population size

            --Write value and error of rejected point to Wrejected, then remove it from InpDat
            For j = 1 To 2

                Wrejected[i, j] = InpDat[i, j]
                InpDat[i, j] = 0

            Next j

        End If --( ABS( q ) > Tolerance ) AND ( Nn > 2 )

    End If --( InpDat[i, 1] <> 0 ) AND ( Nn > 0.85 * Npts )

Next i

If Nn < N0

    GoTo Recalc --this destination is right back near the beginning of subroutine WeightedAverage

End If

Return
End Sub

This is the end of subroutine WeightedAverage. The next stage is to step back up to the parent subroutine WeightedAv, to understand how the elements of WeightedAverage output array ww are translated into the WeightedAv output array W. I believe this is the piece of the puzzle currently missing, in terms of rigorously mapping the Squid3-based output of WeightedAverage to the Isoplot3 output of WtdAv.

cwmagee commented 3 years ago

So taking a wtd avg for unknown samples with rejection enabled can give us different answers depending on how the spots are sorted?

sbodorkos commented 3 years ago

@cwmagee based on my reading of the Ludwig code, yes, I believe so. Needs to be tested, should be straightforward, but I don't have a dataset to hand.

I suggest 9 x OG1 207/206 spot-dates (containing exactly 1 outlier at the 2-3sigma level), and a 10th spot being TEMORA. Using the y-bar spreadsheet function with CanReject = TRUE ought to give discernibly different results depending on whether TEMORA spot is at the top of the list or the bottom (because only 1 rejection is ever allowed when n = 10).

cwmagee commented 3 years ago

I think the MtPainter data requires 1 rejection, but it can be either high or low. So I'll see if the mean changes based on order. I'll poke around tomorrow.

sbodorkos commented 3 years ago

Stepping out of WeightedAverage back up to WeightedAv, recall that the latter invokes the former as:

WeightedAverage ( N, ww, Nrej, Wrejected(), False, (SigmaLevelOut = 1), IntErr68, ExtErr68 )

which means OneSigOut = TRUE when SigmaLevelOut = 1, otherwise OneSigOut = FALSE.

The parent subroutine WeightedAv takes the form:

WeightedAv ( W, ValuesErrs, PercentOut, PercentIn, SigmaLevelIn, CanRej, ConstExtErr, SigmaLevelOut )

where W is the array output of the weighted mean calculation, comprising 2 columns (left = values, right = captions) and up to 7 rows, depending on CanRej and ConstExtErr. Note that both the values and the captions are CONDITIONAL, based on the subroutine arguments.

The following sticks quite closely to how Ludwig maps the elements of WeightedAverage entity ww to the WeightedAv output array W, because it's quite complicated, and I don't want to alter the meaning of the source VBA code. But I have removed as many of the text-manipulations as I can, and replaced as many of the interim variable-assignments as possible, replacing them with explicit If statements. I think it is complete and correct, but if there are things here that don't make sense, let me know!

With ww

    If ConstExtErr = FALSE

        W[1, 1] = ww.IntMean
        W[1, 2] = "Wtd Mean (from internal errs)"

    Else

        W[3, 1] = 0 --interim assignment; persists only if ww.Probability > 0.05

        If SigmaLevelOut = 1

            W[3, 2] = "Required external 1-sigma"

        Else

            W[3, 2] = "Required external 2-sigma"

        End If --SigmaLevelOut = 1

    End If --ConstExtErr = FALSE

    If ww.Probability > 0.05 --Note that Ludwig hard-codes this numeric value; there is no reference to MinProb!

        W[2, 1] = ww.IntMeanErr2sigma / 2.0 * SigmaLevelOut

        If SigmaLevelOut = 1

            W[2, 2] = "1-sigma err. of mean"

        Else

            W[2, 2] = "2-sigma err. of mean"

        End If --SigmaLevelOut = 1

    ElseIf ConstExtErr = TRUE

        W[1, 1] = ww.ExtMean
        W[1, 2] = "Wtd Mean (using external err)"

        If SigmaLevelOut = 1

            W[2, 1] = ww.ExtMeanErr68
            W[2, 2] = "68%-conf. err. of mean"

        Else

            W[2, 1] = ww.ExtMeanErr95
            W[2, 2] = "95%-conf. err. of mean"

        End If --SigmaLevelOut = 1

        W[3, 1] = ww.Ext2Sigma / 2.0 * SigmaLevelOut

    Else --i.e. If (ww.Probability <= 0.05) AND (ConstExtErr = FALSE)

        If SigmaLevelOut = 1

            W[2, 1] = ww.IntErr68 --Note: NOT IntMeanErr68!
            W[2, 2] = "68%-conf. err. of mean"

        Else

            W[2, 1] = ww.IntMeanErr95
            W[2, 2] = "95%-conf. err. of mean"

        End If --SigmaLevelOut = 1

    End If --ww.Probability > 0.05

End With

If PercentOut = TRUE

    W[2, 1] = ABS( W[2, 1] / W[1, 1] * 100.0 )
    W[2, 2] = W[2, 2] & " (%)" --appends " (%)" to pre-existing caption

    If ConstExtErr = TRUE

        W[3, 1] = ABS( W[3, 1] / W[1, 1] * 100.0 )
        W[3, 2] = W[3, 2] & " (%)" --appends " (%)" to pre-existing caption

    End If --ConstExtErr = TRUE

End If --PercentOut = TRUE

If ConstExtErr = TRUE

    W[4, 1] = ww.MSWD
    W[4, 2] = "MSWD"
    W[5, 1] = Nrej
    W[5, 2] = "rejected"
    W[6, 1] = ww.Probability
    W[6, 2] = "Probability of fit"

Else

    W[3, 1] = ww.MSWD
    W[3, 2] = "MSWD"
    W[4, 1] = Nrej
    W[4, 2] = "rejected"
    W[5, 1] = ww.Probability
    W[5, 2] = "Probability of fit"

End If --ConstExtErr = TRUE

If CanReject = TRUE

    rj$ = "" --Defines empty string

    If ConstExtErr = TRUE --Define the row of W in which the list of rejections will appear

        j = 7

    Else

        j = 6

    End If --ConstExtErr = TRUE

    If Nrej > 0 --Create comma-separated string containing index #s of the rejected points

        For i = 1 To N

            If ISEMPTY( Wrejected[i, 1] ) = FALSE --subroutine WeightewdAverage created Wrejected

                rj$ = rj$ & ", " & STR( i ) --STR converts i to string

            End If --ISEMPTY( Wrejected[i, 1] ) = FALSE

        Next i

        W[j, 1] = rj$
        W[j, 2] = "rej. item #(s)"

    Else

        W[j, 1] = ""
        W[j, 2] = ""

    End If --Nrej > 0

End If --CanReject = TRUE

End Sub

sbodorkos commented 3 years ago

Recall that the top-level invocation of the Isoplot3 WtdAv function is:

WtdAv ( ValuesAndErrors, PercentOut, PercentIn, SigmaLevelIn, CanReject, ConstantExternalErr, SigmaLevelOut )

so there are up to six arguments to be specified, beyond ValuesAndErrors.

1. In the context of initial (automated) WtdMeanAcalc, the VBA invocation verbatim is:

wW = Isoplot3.wtdav(Arange, True, True, 1, Not NoReject, True, 1)

so PercentOut = PercentIn = TRUE, SigmaLevelIn = SigmaLevelOut = 1, ConstantExternalErr = TRUE.

I think NoReject (and its opposite, CanReject) are specified by the user.

2. In the context of REDO of a WtdMeanAcalc, the VBA invocation verbatim is:

wW = Isoplot3.wtdav(InpR, True, True, 1, False, True, 1)

The difference here (relative to automated initial execution of WtdMeanAcalc) is that CanReject = FALSE, because the user has had the opportunity to manually reject and un-reject points from the population, so it is important that the Redo of the calculation does not independently perform the automatic (albeit deficient) rejection routine enabled by CanReject = TRUE.

3. In the context of manual/redo weighted mean AGE calculations for Samples (not Reference Materials), as executed by SQUID 2.50 is subroutine ExtractGroup, the VBA invocation verbatim is:

wW = Isoplot3.wtdav(InpR, False, False, 1, False, False, 2)

so PercentOut = PercentIn = FALSE (because the age calculations), SigmaLevelIn = 1 (because that's the levelk at which the uncertainties of the constituent analyses are held), SigmaLevelOut = 2 (because mean ages should be quoted at 2sigma/95% confidence as far as possible), and ConstantExternalErr = FALSE (because statistical dispersion in an unknown population is usually assumed to be geologically significant, rather than a consequence of underestimated errors).

4. In the context of the SQUID-specific function sqWtdAv, which itself is invoked as:

sqWtdAv ( Values, Errs, PercentErrsIn, PercentErrsOut, CanReject )

Isoplot's WtdAv function is invoked via:

Set ValuesErrs = Union(Values, Errs)
w = Isoplot3.wtdav(ValuesErrs, PercentErrsIn, PercentErrsOut, 1, CanReject, True, 1)

This is a bit unusual because the SigmaLevelIn and SigmaLevelOut are both hard-coded to 1, which is different to everything else except WtdMeanAcalc (the latter being a completely different scenario where that treatment is justified).

Those are all the invocations I could easily find in SQUID 2.50 and Isoplot... if something obvious is missing, please let me know.

sbodorkos commented 3 years ago

Jim @bowring , the long code-block in the post from 18 March is probably the key one, in terms of "translating" values calculated by the WtdAv arithmetic into values portrayed by Ludwig as output, especially in the context of subroutine WtdMeanAcalc, and you might not need to look further. The next post after that has the settings for all the Booleans and SigmaLevel variables accompanying the invocation of WtdAv by WtdMeanAcalc.

sbodorkos commented 3 years ago

Postscript regarding CanReject: there is NO dependency on sort order (that I could find) when determining which spot(s) should be rejected:

Clearly I did not eyeball the code correctly in my previous posts, and I have gone back to edit that commentary, to avoid re-stumbling across it later.