Open sbodorkos opened 3 years ago
The starting point for Isoplot (and therefore SQUID 2.50) in terms of the generalised calculation of a weighted mean is via the (public) function WtdAv (which means you can invoke it in your Excel spreadsheet as an array function, if you know how... Ludwig wrote a little "toolbar-only function" to help users do this).
Note that Squid3 does not start at this point when calculating a weighted mean, which is why I am documenting this. Plus, the function has 6 optional arguments, but Ludwig has only publicly documented the first five... and (unsurprisingly) it turns out that this is where some of our confusion arises!
Function WtdAv is defined as
WtdAv ( ValuesAndErrors, PercentOut, PercentIn, SigmaLevelIn, CanReject, ConstantExternalErr, SigmaLevelOut )
where:
ValuesAndErrors
is a mandatory 2-column range, which has Values in the left-hand column, and Errors on the right. The columns are prescribed, but the rows are not. Ranges need not even be contiguous blocks of rows in Excel, so there can be skipped rows, non-numeric rows, null rows, etc.)
PercentOut
is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.
PercentIn
is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.
SigmaLevelIn
is an optional numeric value, permitting the user to specify whether the input Errors values are 1sigma (1) or 2sigma (2). If not specified, it defaults to 2.
CanReject
is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. If not specified, it defaults to FALSE.
ConstantExternalErr
is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try calculating the magnitude of the "external" error (i.e. an additional component of uncertainty which, when added in quadrature to each of the input Errors, yields a population with MSWD ~ 1), if the initial weighted mean calculation yields a probability-of-fit < 0.05. If not specified, it defaults to FALSE.
SigmaLevelOut
is an optional numeric value, permitting the user to specify whether uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be 1sigma (1) or 2sigma (2). If not specified, it defaults to 2.
This function actually doesn't do very much, except ensure that the range ValuesAndErrors has exactly 2 columns (and at least 2 rows), plus formalise the value of each of the optional arguments, before passing them to subroutine WeightedAv, which does a bit more of the preparatory work.
Subroutine WeightedAv is invoked solely within WtdAv, so it is not really necessary to define the arguments beyond that call of the subroutine, and I will include the assigned values in the description.
In Excel, WeightedAv has one option Boolean argument (AltRej
) which deals with legacy functionality in the Excel environment (i.e. it allows an alternative method of user-defined rejections in environments where the usual method [application of the Strikethrough font] is unavailable). I have omitted AltRej from the subroutine description because it is not relevant.
Subroutine WeightedAv is defined as
WeightedAv ( W, ValuesErrs, PercentOut, PercentIn, SigmaLevelIn, CanRej, ConstExtErr, SigmaLevelOut )
where
W
is the array output of the weighted mean calculation, comprising 2 columns (left = values, right = captions) and up to 7 rows, depending on CanRej and ConstExtErr. Note that both the values and the captions are CONDITIONAL, based on the subroutine arguments. In an attempt to reduce some of the needless complexity in the output, I will (eventually) document it as always having 7 rows, with NULLs in rows that prove inapplicable based on the user-defined CanRej and ConstExtErr values.
ValuesErrs
is a mandatory 2-column range, which has Values in the left-hand column, and Errors on the right. The parent function WtdAv sets WeightedAv's ValuesErrs = WtdAv's ValuesAndErrors.
PercentOut
is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). The parent function WtdAv sets WeightedAv's PercentOut = WtdAv's PercentOut. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
PercentIn
is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). The parent function WtdAv sets WeightedAv's PercentIn = WtdAv's PercentIn. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
SigmaLevelIn
is an optional numeric value, permitting the user to specify whether the input Errors values are 1sigma (1) or 2sigma (2). The parent function WtdAv sets WeightedAv's SigmaLevelIn = WtdAv's SigmaLevelIn. If not specified (obsolete because WtdAv always specifies), it defaults to 2.
CanRej
is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. The parent function WtdAv sets WeightedAv's CanRej = WtdAv's CanReject. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
ConstExtErr
is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try calculating the magnitude of the "external" error (i.e. an additional component of uncertainty which, when added in quadrature to each of the input Errors, yields a population with MSWD ~ 1), if the initial weighted mean calculation yields a probability-of-fit < 0.05. The parent function WtdAv sets WeightedAv's CanRej = WtdAv's ConstantExternalErr. If not specified (obsolete because WtdAv always specifies), it defaults to FALSE.
SigmaLevelOut
is an optional numeric value, permitting the user to specify whether uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be 1sigma (1) or 2sigma (2). The parent function WtdAv sets WeightedAv's SigmaLevelOut = WtdAv's SigmaLevelOut. If not specified (obsolete because WtdAv always specifies), it defaults to 2.
This subroutine does do a bit of work, especially on ValuesErrs. For the specified 2-column range, each row is assessed. If the Value is numeric and the Err is numeric, then the next check ensures that the Value is non-zero and the Err is non-zero and the row has not been independently identified by the user as a manual rejection (e.g. Strikethrough font in Excel; check-box in Squid3). Note also that if input errors were specified as percentage, they are converted to absolute values at this stage.
If the row passes all these tests, it is added to the 2-column array InpDat, which holds only the "clean" ValuesErrs data (with absolute errors), which will be used for the real weighted-mean arithmetic. That is handled by subroutine WeightedAverage... which I will describe next.
Subroutine WeightedAverage is invoked within WeightedAv, but also within WtdAvPlot-related functions, so I will define the arguments beyond that call of the subroutine, and I will include the assigned values in the description. But this is where the arithmetic actually happens.
Subroutine WeightedAverage is defined as
WeightedAverage ( Npts, ww, Nrej, Wrejected, CanTukeys, OneSigOut, IntErr68, Optional ExtErr68 = 0 )
where
Npts
is the number of analyses in the clean InpDat array (defined by subroutine WeightedAv).
ww
is the array output of the weighted mean calculation.
Nrej
is the number of analyses in the clean InpDat array which Isoplot identified as rejections, if CanReject (defined by subroutine WeightedAv) is TRUE. Note that Nrej does not include previously identified rejections (e.g. user-defined rejections via Strikethrough or check-box, or 'large error' rejections identified via WtdMeanAcalc. These 'previous' rejections should already have been excluded from the InpDat array, prior to invocation of subroutine WeightedAverage.
Wrejected
is a 2-column double-precision array of identical size to InpDat, which holds the (indexed) Values and Errors of analyses from the clean InpDat array which Isoplot identified as rejections, if CanReject (defined by subroutine WeightedAv) is TRUE.
CanTukeys
is a mandatory Boolean, controlling whether WeightedAverage should calculate a Tukey's biweight mean and its (95% confidence interval) and a median value with its asymmetric ~95% confidence intervals (following Rock et al., 1987), in addition to the weighted mean which the primary focus of this subroutine. Under program control, CanTukeys is usually FALSE, but in Excel, it is set to TRUE when users invoked Weighted Average via Isoplot's dialog box.
OneSigOut
is an optional Boolean, permitting the user to specify whether uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be 1sigma (TRUE) or not (FALSE). In most applications, it ends up being set to FALSE, but the invocation from WtdMeanAcalc sets it to TRUE, and this is important. If not specified, it defaults to FALSE.
IntErr68
is an optional double-precision value as output from WeightedAverage, corresponding to the 68% confidence interval on the "internal" error of the mean. It is only calculated if OneSigOut = TRUE. If not calculated, it defaults to 0.
ExtErr68
is an optional double-precision value as output from WeightedAverage, corresponding to the 68% confidence interval on the "external" error of the mean. It is only calculated if the MSWD of the weighted mean exceeds 1, and the probability of equivalence of the weighted mean is below a user-specified minimum (usually 0.05), and OneSigOut = TRUE. If not calculated, it defaults to 0.
In subroutine WeightedAv, the subroutine WeightedAverage is invoked as
WeightedAverage ( N, ww, Nrej, Wrejected(), False, (SigmaLevelOut = 1), IntErr68, ExtErr68 )
which means OneSigOut = TRUE when SigmaLevelOut = 1, otherwise OneSigOut = FALSE.
The next post will document the WeightedAverage code.
Boolean value: NoEvar
Integer values: Count, i, j, N0, Nn, nU
Double-precision values: ExtSigma, ExtSigmaMean, ExtVar, ExtXbarSigma, IntMean, IntSigmaMean, IntVar, MSWD, PointError, Probability, q, Resid, Sums, SumWtdRatios, temp, t68, t95, Tolerance, Tot1sig, TotalError, TotVar, Trial, Weight, WtdAvg, WtdAvgErr, WtdR2, WtdResid
Double-precision arrays: InverseVar, IvarY, tbx, yy
String vector (4 elements): r
Nn = Npts
Count = 0
Nrej = 0
If MinProb = 0, Isoplot sets it using the value stored in Isoplot's Numeric Preferences (statistical). In practice, MinProb should be 0.05 unless otherwise indicated.
If MinProb = 0
MinProb = Val( Menus["MinProb"] ) --MinProb should be 0.05
End If
Some vector/array-sizing, and calculation of the inverse variance values, to weight the weighted mean:
ReDim InverseVar[Nn], yy[Nn], IvarY[Nn], tbx[Nn], yf.WtdResid[Nn]
If CanReject = TRUE
ReDim Wrejected[Nn, 2]
End If
For i = 1 To Nn
InverseVar[i] = 1 / ( InpDat[i, 2] )^2
Next i
The following is the "re-entry point" for recalculation of a weighted mean, when CanReject = TRUE and Isoplot has identified an eligible rejection:
Recalc:
ExtSigma = 0
ww.Ext2Sigma = 0 --Named element of output array ww
Weight = 0
SumWtdRatios = 0
q = 0
Count = Count + 1
For i = 1 To Npts
If InpDat[i, 1] <> 0 AND InpDat[i, 2] <> 0
Weight = Weight + InverseVar[i]
SumWtdRatios = SumWtdRatios + InverseVar[i] * InpDat[i, 1]
q = q + InverseVar[i] * ( ( InpDat[i, 1] )^2 )
End If
Next i
Now some fundamental calculations (re-sequenced from Ludwig for clarity, and reader sanity), plus calculation of Student's t at the 68.26% confidence level and 95% confidence level (these values are a function of population size). The Excel function TINV returns the two-tailed inverse of the Student's t-distribution, and is documented at https://support.microsoft.com/en-us/office/tinv-function-a7c85b9d-90f5-41fe-9ca5-1cd2f3e1ed7c :
IntMean = SumWtdRatios / Weight --"Internal" wtd average
IntSigmaMean = sqrt( 1 / Weight ) --"Internal" 1sigma absolute error of wtd average
nU = Nn - 1 --Degrees of freedom
t68 = TINV( 0.3174, nU ) --Because 1 - (68.26%) = 0.3174; this is a magic number
t95 = TINV( 0.05, nU ) --Because 1 - (95%) = 0.05
Sums = 0
For i = 1 To Npts
If InpDat[i, 1] <> 0 AND InpDat[i, 2] <> 0
Resid = InpDat[i, 1] - IntMean --Simple residual
WtdResid = Resid / InpDat[i, 2] --Wtd residual
WtdR2 = WtdResid^2 --Square of wtd residual
If Nn = Npts
yf.WtdResid[i] = WtdResid --WtdR2
End If --Nn = Npts
Sums = Sums + WtdR2
End If --InpDat[i, 1] <> 0 AND InpDat[i, 2] <> 0
Next i
Sums can be used to calculate MSWD and Probability, in conjunction with Excel's FDIST function, which returns the (right-tailed) F probability distribution (degree of diversity) for two data sets. This is documented at https://support.microsoft.com/en-us/office/fdist-function-ecf76fba-b3f1-4e7d-a57e-6a5b7460b786 :
Sums = MAX( Sums, 0 )
MSWD = Sums / nU --Mean Square of Weighted Deviates
Probability = FDIST( MSWD, nU, 1e+9 ) --Probability of equivalence
Provisionally populate some of the elements of output array ww:
With ww
.IntMean = IntMean
.IntMeanErr2sigma = 2 * IntSigmaMean
.MSWD = MSWD
.Probability = Probability
If Probability >= 0.3
.IntMeanErr95 = IntSigmaMean * 1.96
If OneSigOut = TRUE
IntErr68 = IntSigmaMean * 0.9998
End If --OneSigOut = TRUE
Else
.IntMeanErr95 = IntSigmaMean * t95 * sqrt( MSWD )
If OneSigOut = TRUE
IntErr68 = IntSigmaMean * t68 * sqrt( MSWD )
End If --OneSigOut = TRUE
End If --Probability >= 0.3
End With
At this point, WeightedAverage has completed population of the ww.Int... elements, as per this view of the VBA6 code editor/debugger in Excel 2003:
The next step is to determine whether the data are dispersed (relative to a specific threshold), and if so, to calculate the constant "external" variance (calculated by Ludwig via maximum-likelihood estimation), and the associated "external" weighted mean and its error. Note that this process is independent of the value of the user-specified Boolean ConstExtErr (from subroutine WeightedAv). That code (which follows immediately on) commences:
If ( Probability < MinProb ) AND ( MSWD > 1 )
Nn = 0
For i = 1 To Npts
If InpDat[i, 1] <> 0
Nn = 1 + Nn
yy[Nn] = InpDat[i, 1]
IvarY[Nn] = ( InpDat[i, 2] )^2
End If
Next i
ReDim Preserve yy[Nn], IvarY[Nn]
Vectors yy and IvarY are the inputs to the subroutine that calculates the "external" variance, via the secant method, described by Press et al. (1987), p. 250–251 (more detail below). I am going to step out of this unfinished If statement to describe the generalised function WtdExtFunc, which is an important part of that process, and then return.
Stepping out of the unfinished If statement above, I am going to describe the generalised function WtdExtFunc, which is used repeatedly in the iterative calculation of the "external" error.
Function WtdExtFunc is defined as
WtdExtFunc( TrialExtVar, Xbar, XbarSigma, yy, IvarY, Nn )
where
TrialExtVar
is a (mandatory) existing double-precision estimate of "external variance", as input to the calculation performed by this function.
Xbar
is a double-precision estimate of the "external" mean, based on the input TrialExtVar, yy and IvarY, as output from the calculation performed by this function.
XbarSigma
is a double-precision estimate of the 1sigma error of the "external" mean, based on the input TrialExtVar and IvarY, as output from the calculation performed by this function.
yy
is a double-precision vector of length Nn containing the Values, as input to the calculation performed by this function. Usage as per referring function WeightedAverage.
IvarY
is a double-precision vector of length Nn, containing the variances, as input to the calculation performed by this function. Usage as per referring function WeightedAverage.
Nn
is the (mandatory) integer number of values, as input to the calculation performed by this function. Usage as per referring function WeightedAverage.
Ludwig described this function as "WtdExtFunc will be zero when the external variance, TrialExtVar, is chosen correctly (Nn >= 3)". This is a bit cryptic, because TrialExtVar is definitely an input... I think what he means is that this function will generate an improved estimate for TrialExtVar, and the function can be used iteratively. It proceeds as follows:
Integer value: i
Double-precision values: ff, Resid, SumW, SumW2resid2, SumXW
Double-precision array: W
ReDim W[Nn] --W needs to be the same length as yy
For i = 1 To Nn
W[i] = 1 / ( IvarY[i] + TrialExtVar )
SumW = SumW + W[i]
SumXW = SumXW + yy[i] * W[i]
Next i
Xbar = SumXW / SumW
XbarSigma = SQRT( ABS( 1 / SumW ) ) --1sigma error in Xbar
For i = 1 To Nn
Resid = yy[i] - Xbar
SumW2resid2 = SumW2resid2 + ( W[i] * Resid )^2
Next i
ff = SumW2resid2 - SumW
WtdExtFunc = ff
End Function
Returning to the unfinished If statement above, the code continues:
ReDim Preserve yy[Nn], IvarY[Nn]
WtdExtRTSEC 0, 10 * (IntSigmaMean^2), ExtVar, ExtMean, ExtXbarSigma, yy, IvarY, Nn, NoEvar
Ludwig described subroutine WtdExtRTSEC as finding the root (thought to lie between 0 and MaxExtVar) of a function WtdExtFunc, using the secant method. The root, returned as WtdExtRTSEC, is refined until its accuracy is +/-xacc. He said the method is described in Numerical Recipes in C: The Art of Scientific Computing (Press et al., 1987, p. 250-251). Subroutine WtdExtRTSEC is invoked exclusively within WeightedAverage, so I have defined almost all of the arguments in terms of WeightedAverage usage.
Subroutine WtdExtRTSEC is defined as
WtdExtRTSEC( 0, MaxExtVar, ExtVar, ExtMean, ExtXbarSigma, yy, IvarY, Nn, NoEvar )
where
0
is a (mandatory) pre-existing minimum estimate of ExtVar, as input to this subroutine.
MaxExtVar
is a (mandatory) pre-existing maximum estimate (double-precision) of ExtVar, as input to this subroutine. Referring subroutine WeightedAverage sets MaxExtVar = 10 * ( IntSigmaMean^2 )
ExtVar
is the best estimate (double-precision) of "external variance", as output from this subroutine if NoEvar = FALSE.
ExtMean
is the calculated (double-precision) "external" mean (from iterated WtdExtFunc, as Xbar), as output from this subroutine if NoEvar = FALSE.
ExtXbarSigma
is the calculated (double-precision) 1sigma error of the "external" mean (from interated WtdExtFunc, as XbarSigma), as output from this subroutine if NoEvar = FALSE.
yy
is a double-precision vector of length Nn containing the Values, as input to this subroutine. Usage as per referring subroutine WeightedAverage.
IvarY
is a double-precision vector of length Nn containing the variances, as input to this subroutine. Usage as per referring subroutine WeightedAverage.
Nn
is the integer number of values, as input to this subroutine. Usage as per referring subroutine WeightedAverage.
NoEvar
is a Boolean indicating whether the secant-method calculation in this subroutine has succeeded (FALSE) or failed (TRUE). It is explicitly initialised as FALSE.
The subroutine proceeds as follows:
Boolean value: NoEvar
Integer values: j, rct
Double-precision values: dx, f, facc, FL, Lastf1, Lastf2, MaxExtVar, rts, SwapTemp, tmp, xacc, xl
xacc = 1e-9
facc = 1e-7
NoEvar = FALSE
MaxExtVar = 10 * ( IntSigmaMean^2 ) --from referring subroutine WeightedAverage
FL = WtdExtFunc( 0, Xbar, XbarSigma, yy, IvarY, Nn )
f = WtdExtFunc( MaxExtVar, Xbar, XbarSigma, yy, IvarY, Nn )
Ludwig starts by checking that FL and f are not too close in value. Curiously, he does this by converting each double-precision value to single-precision (via Excel VBA function CSng) for comparison, and aborting the subroutine if the single-precision values are equivalent:
If CSng( f ) = CSng( FL )
GoTo FailedExit
End If
Then some ring-a-rosy aimed at ensuring that the bound with the smaller WtdExtFunc value is chosen as the initial guess for subsequent iterative calculations in the secant loop:
If ABS( FL ) < ABS( f )
rts = 0
xl = MaxExtVar
SwapTemp = FL
FL = f
f = SwapTemp
Else
xl = 0
rts = MaxExtVar
End If
Now the secant loop, with a maximum of 99 iterations:
For j = 1 To 99
If f = FL --Success!
ExtVar = rts
Exit Sub
End If
dx = ( xl - rts ) * f / ( f - FL ) --Increment with respect to latest value
xl = rts
FL = f
rct = 0
Do
tmp = rts + dx
If tmp >= 0
Exit Do
ElseIf rct > 100
GoTo FailedExit
End If
dx = dx / 2
rct = 1 + rct
If rct > 99
GoTo FailedExit
End If
Loop
rts = tmp
f = WtdExtFunc( rts, Xbar, XbarSigma, yy, IvarY, Nn )
If ABS( f ) < facc OR ABS( f ) = ABS( Lastf2 ) --Success, if f < facc, OR f is oscillating
ExtVar = rts
Exit Sub
End If
Lastf2 = Lastf1
Lastf1 = f
Next j
FailedExit:
NoEvar = TRUE
End Sub
The next post will return (again) to the unfinished If statement above, and assign "external error" values to the elements of array ww in subroutine WeightedAverage.
Returning to the unfinished If statement above, the code in subroutine WeightedAverage continues by evaluating the success or otherwise of subroutine WtdExtRTSEC, and assigns "external error" values to the elements of array ww on that basis. Three scenarios are considered by the code:
Arithmetical success of WtdExtRTSEC. This is the dominant mode of operation; I do not recall ever being aware of a failure in it. In this case, WtdExtRTSEC outputs ExtMean, ExtXbarSigma and ExtVar are used to calculate the relevant elements of ww.
Failure of WtdExtRTSEC accompanied by a "very high" value of MSWD (> 4). I suspect this scenario is obsolete (I am sure WtdExtRTSEC works even when MSWD > 100), but I don't know for certain, so I have retained it for completeness. In this case, the WtdExtRTSEC output is ignored, and the relevant elements of ww are populated using AVERAGE and STDEV, applied directly to the WtdExtRTSEC inputs.
Failure of WtdExtRTSEC not accompanied by a "very high" value of MSWD (> 4). This is the catch-all scenario, and in this case, all the relevant elements of ww are zeroed.
WtdExtRTSEC 0, 10 * (IntSigmaMean^2), ExtVar, ExtMean, ExtXbarSigma, yy, IvarY, Nn, NoEvar
With ww
If NoEvar = FALSE --then subroutine WtdExtRTSEC succeeded
.ExtMean = ExtMean
ExtSigma = SQRT( ExtVar )
.ExtMeanErr95 = TINV( 0.05, (2 * nU) ) * ExtXbarSigma
If OneSigOut = TRUE
ExtErr68 = TINV( 0.3174, (2 * nU) ) * ExtXbarSigma
End If
ElseIf MSWD > 4 --then subroutine WtdExtRTSEC might have failed due to very high MSWD
.ExtMean = AVERAGE( yy ) --simple Excel function
ExtSigma = STDEV( yy ) --simple Excel function
.ExtMeanErr95 = t95 * ExtSigma / SQRT( Nn )
If OneSigOut = TRUE
ExtErr68 = t68 * ExtSigma
End If
Else --total failure
.ExtMean = 0
ExtSigma = 0
.ExtMeanErr95 = 0
If OneSigOut = TRUE
ExtErr68 = 0
End If
End If --NoEvar = FALSE
.Ext2Sigma = 2 * ExtSigma
.ExtMeanErr68 = t68 / t95 * .ExtMeanErr95
End With
End If -- ( Probability < MinProb ) AND ( MSWD > 1 )
remembering that this If statement opened in the final code-block from this post: https://github.com/CIRDLES/Squid/issues/582#issuecomment-784738500. At this point, WeightedAverage has finished populating its ww.Ext... elements, like this:
Subroutine WeightedAverage continues by considering its own automated point-rejection routine, which is invoked if user-specified ( CanReject = TRUE ) AND ( Probability < MinProb ):
If ( CanReject = TRUE ) AND ( Probability < MinProb )
GoSub Reject --see below
End If
This is followed (for user-specified CanTukeys = TRUE) by evaluation of the three ww.BiWt... and three ww.Median... elements. Those functions are quite trivial, but I am going to document them in a separate post, because the code looks very old (and untidy), and I think I can see a bug, unfortunately.
If WeightedAverage's auto-rejection routine is bypassed (either by the user or by the statistical coherence of the population), that concludes the Weighted Average subroutine.
If CanTukeys = TRUE
--Calculate the 3 ww.BiWt... values and the 3 ww.Median... values:
--Documented in a separate post, yet to come
End If
Exit Sub
All that is left is the description of the auto-rejection routine. But the start of this code is difficult, and I include a screenshot of Ludwig's VBA verbatim:
My first problem with this screenshot is the evaluation of "If ww.Ext2sigma", given that ww.Ext2sigma is a double-precision number. I have not yet found any documentation specific to VBA6, but for C++, "If Double" = FALSE when Double is zero or null, else TRUE. For the moment, I am assuming the convention is the same in VBA6.
In this context, it is relevant that ww.Ext2sigma is explicitly set to 0 at the beginning of every iteration of the subroutine WeightedAverage, irrespective of whether it is the initial run-through, or a subsequent iteration demanded by Recalc:, and also irrespective of whether or not MSWD and Probability demand an evaluation of the "external error". Non-zero values of ww.Ext2sigma occur in only two circumstances:
With all of this in mind, I have re-expressed "If ww.Ext2sigma" as "If ww.Ext2sigma <> 0" because I think that is the explicit meaning.
Reject: --Destination of GoSub above
If ww.Ext2Sigma <> 0
WtdAvg = ww.ExtMean
WtdAvgErr = ww.ExtMeanErr95
Else
WtdAvg = ww.IntMean
WtdAvgErr = ww.IntMeanErr95
End If
My second problem with the screenshot is the evaluation of WtdAvgErr in the scenario where OneSigOut = TRUE. Ludwig's code is unambiguous: WtdAvgErr = ExtErr68, irrespective of the evaluation of "If ww.Ext2sigma". But is that correct? If ww.Ext2sigma = 0, then the various evaluation-paths indicate that ExtErr68 is either 0 or not evaluated. Is WtdAvgErr = ExtErr68 really appropriate in those cases? The (temporary) saving grace is that the assigned value of WtdAvgErr is not used anywhere in subroutine WeightedAverage. But I have a feeling it is called in the parent subroutine WeightedAv, so we will just have to wait and see whether this is correct, or an inconsequential error, or (perhaps) an important error.
My rendition of this code continues:
If OneSigOut = TRUE
WtdAvgErr = ExtErr68
End If
Now the outlier rejection testing routine commences. Overall, no more than 15% of analyses can be rejected from the original population, and the rejection tolerance increases with each pass through the data, starting at 2sigma when attempting the first rejection, increasing to 2.5sigma when attempting a second rejection, 3sigma when attempting a third, and so on.
N0 = Nn --Start outlier rejection routine
For i = 1 To Npts
If ( InpDat[i, 1] <> 0 ) AND ( Nn > 0.85 * Npts )
--Reject no more than 15% of points. Start rejection tolerance at 2sigma, and increase slightly on each pass.
PointError = 2 * SQRT( ( InpDat[i, 2] )^2 + ExtSigma^2 ) --2sigma error of point being tested
TotalError = SQRT( PointError^2 + ( 2 * ww.ExtMeanErr68 )^2 )
Tolerance = ( 1 + ( Count - 1 ) / 4 ) * TotalError
--First-pass tolerance is 2sigma; second-pass is 2.5sigma; third-pass is 3sigma...
q = InpDat[i, 1] - WtdAvg
If ( ABS( q ) > Tolerance ) AND ( Nn > 2 )
Nrej = 1 + Nrej --increment number of rejections
Nn = Nn - 1 --decrease population size
--Write value and error of rejected point to Wrejected, then remove it from InpDat
For j = 1 To 2
Wrejected[i, j] = InpDat[i, j]
InpDat[i, j] = 0
Next j
End If --( ABS( q ) > Tolerance ) AND ( Nn > 2 )
End If --( InpDat[i, 1] <> 0 ) AND ( Nn > 0.85 * Npts )
Next i
If Nn < N0
GoTo Recalc --this destination is right back near the beginning of subroutine WeightedAverage
End If
Return
End Sub
This is the end of subroutine WeightedAverage. The next stage is to step back up to the parent subroutine WeightedAv, to understand how the elements of WeightedAverage output array ww are translated into the WeightedAv output array W. I believe this is the piece of the puzzle currently missing, in terms of rigorously mapping the Squid3-based output of WeightedAverage to the Isoplot3 output of WtdAv.
So taking a wtd avg for unknown samples with rejection enabled can give us different answers depending on how the spots are sorted?
@cwmagee based on my reading of the Ludwig code, yes, I believe so. Needs to be tested, should be straightforward, but I don't have a dataset to hand.
I suggest 9 x OG1 207/206 spot-dates (containing exactly 1 outlier at the 2-3sigma level), and a 10th spot being TEMORA. Using the y-bar spreadsheet function with CanReject = TRUE ought to give discernibly different results depending on whether TEMORA spot is at the top of the list or the bottom (because only 1 rejection is ever allowed when n = 10).
I think the MtPainter data requires 1 rejection, but it can be either high or low. So I'll see if the mean changes based on order. I'll poke around tomorrow.
Stepping out of WeightedAverage back up to WeightedAv, recall that the latter invokes the former as:
WeightedAverage ( N, ww, Nrej, Wrejected(), False, (SigmaLevelOut = 1), IntErr68, ExtErr68 )
which means OneSigOut = TRUE when SigmaLevelOut = 1, otherwise OneSigOut = FALSE.
The parent subroutine WeightedAv takes the form:
WeightedAv ( W, ValuesErrs, PercentOut, PercentIn, SigmaLevelIn, CanRej, ConstExtErr, SigmaLevelOut )
where W
is the array output of the weighted mean calculation, comprising 2 columns (left = values, right = captions) and up to 7 rows, depending on CanRej and ConstExtErr. Note that both the values and the captions are CONDITIONAL, based on the subroutine arguments.
The following sticks quite closely to how Ludwig maps the elements of WeightedAverage entity ww
to the WeightedAv output array W
, because it's quite complicated, and I don't want to alter the meaning of the source VBA code. But I have removed as many of the text-manipulations as I can, and replaced as many of the interim variable-assignments as possible, replacing them with explicit If statements. I think it is complete and correct, but if there are things here that don't make sense, let me know!
With ww
If ConstExtErr = FALSE
W[1, 1] = ww.IntMean
W[1, 2] = "Wtd Mean (from internal errs)"
Else
W[3, 1] = 0 --interim assignment; persists only if ww.Probability > 0.05
If SigmaLevelOut = 1
W[3, 2] = "Required external 1-sigma"
Else
W[3, 2] = "Required external 2-sigma"
End If --SigmaLevelOut = 1
End If --ConstExtErr = FALSE
If ww.Probability > 0.05 --Note that Ludwig hard-codes this numeric value; there is no reference to MinProb!
W[2, 1] = ww.IntMeanErr2sigma / 2.0 * SigmaLevelOut
If SigmaLevelOut = 1
W[2, 2] = "1-sigma err. of mean"
Else
W[2, 2] = "2-sigma err. of mean"
End If --SigmaLevelOut = 1
ElseIf ConstExtErr = TRUE
W[1, 1] = ww.ExtMean
W[1, 2] = "Wtd Mean (using external err)"
If SigmaLevelOut = 1
W[2, 1] = ww.ExtMeanErr68
W[2, 2] = "68%-conf. err. of mean"
Else
W[2, 1] = ww.ExtMeanErr95
W[2, 2] = "95%-conf. err. of mean"
End If --SigmaLevelOut = 1
W[3, 1] = ww.Ext2Sigma / 2.0 * SigmaLevelOut
Else --i.e. If (ww.Probability <= 0.05) AND (ConstExtErr = FALSE)
If SigmaLevelOut = 1
W[2, 1] = ww.IntErr68 --Note: NOT IntMeanErr68!
W[2, 2] = "68%-conf. err. of mean"
Else
W[2, 1] = ww.IntMeanErr95
W[2, 2] = "95%-conf. err. of mean"
End If --SigmaLevelOut = 1
End If --ww.Probability > 0.05
End With
If PercentOut = TRUE
W[2, 1] = ABS( W[2, 1] / W[1, 1] * 100.0 )
W[2, 2] = W[2, 2] & " (%)" --appends " (%)" to pre-existing caption
If ConstExtErr = TRUE
W[3, 1] = ABS( W[3, 1] / W[1, 1] * 100.0 )
W[3, 2] = W[3, 2] & " (%)" --appends " (%)" to pre-existing caption
End If --ConstExtErr = TRUE
End If --PercentOut = TRUE
If ConstExtErr = TRUE
W[4, 1] = ww.MSWD
W[4, 2] = "MSWD"
W[5, 1] = Nrej
W[5, 2] = "rejected"
W[6, 1] = ww.Probability
W[6, 2] = "Probability of fit"
Else
W[3, 1] = ww.MSWD
W[3, 2] = "MSWD"
W[4, 1] = Nrej
W[4, 2] = "rejected"
W[5, 1] = ww.Probability
W[5, 2] = "Probability of fit"
End If --ConstExtErr = TRUE
If CanReject = TRUE
rj$ = "" --Defines empty string
If ConstExtErr = TRUE --Define the row of W in which the list of rejections will appear
j = 7
Else
j = 6
End If --ConstExtErr = TRUE
If Nrej > 0 --Create comma-separated string containing index #s of the rejected points
For i = 1 To N
If ISEMPTY( Wrejected[i, 1] ) = FALSE --subroutine WeightewdAverage created Wrejected
rj$ = rj$ & ", " & STR( i ) --STR converts i to string
End If --ISEMPTY( Wrejected[i, 1] ) = FALSE
Next i
W[j, 1] = rj$
W[j, 2] = "rej. item #(s)"
Else
W[j, 1] = ""
W[j, 2] = ""
End If --Nrej > 0
End If --CanReject = TRUE
End Sub
Recall that the top-level invocation of the Isoplot3 WtdAv function is:
WtdAv ( ValuesAndErrors, PercentOut, PercentIn, SigmaLevelIn, CanReject, ConstantExternalErr, SigmaLevelOut )
so there are up to six arguments to be specified, beyond ValuesAndErrors.
1. In the context of initial (automated) WtdMeanAcalc, the VBA invocation verbatim is:
wW = Isoplot3.wtdav(Arange, True, True, 1, Not NoReject, True, 1)
so PercentOut = PercentIn = TRUE, SigmaLevelIn = SigmaLevelOut = 1, ConstantExternalErr = TRUE.
I think NoReject (and its opposite, CanReject) are specified by the user.
2. In the context of REDO of a WtdMeanAcalc, the VBA invocation verbatim is:
wW = Isoplot3.wtdav(InpR, True, True, 1, False, True, 1)
The difference here (relative to automated initial execution of WtdMeanAcalc) is that CanReject = FALSE, because the user has had the opportunity to manually reject and un-reject points from the population, so it is important that the Redo of the calculation does not independently perform the automatic (albeit deficient) rejection routine enabled by CanReject = TRUE.
3. In the context of manual/redo weighted mean AGE calculations for Samples (not Reference Materials), as executed by SQUID 2.50 is subroutine ExtractGroup, the VBA invocation verbatim is:
wW = Isoplot3.wtdav(InpR, False, False, 1, False, False, 2)
so PercentOut = PercentIn = FALSE (because the age calculations), SigmaLevelIn = 1 (because that's the levelk at which the uncertainties of the constituent analyses are held), SigmaLevelOut = 2 (because mean ages should be quoted at 2sigma/95% confidence as far as possible), and ConstantExternalErr = FALSE (because statistical dispersion in an unknown population is usually assumed to be geologically significant, rather than a consequence of underestimated errors).
4. In the context of the SQUID-specific function sqWtdAv, which itself is invoked as:
sqWtdAv ( Values, Errs, PercentErrsIn, PercentErrsOut, CanReject )
Isoplot's WtdAv function is invoked via:
Set ValuesErrs = Union(Values, Errs)
w = Isoplot3.wtdav(ValuesErrs, PercentErrsIn, PercentErrsOut, 1, CanReject, True, 1)
This is a bit unusual because the SigmaLevelIn and SigmaLevelOut are both hard-coded to 1, which is different to everything else except WtdMeanAcalc (the latter being a completely different scenario where that treatment is justified).
Those are all the invocations I could easily find in SQUID 2.50 and Isoplot... if something obvious is missing, please let me know.
Jim @bowring , the long code-block in the post from 18 March is probably the key one, in terms of "translating" values calculated by the WtdAv arithmetic into values portrayed by Ludwig as output, especially in the context of subroutine WtdMeanAcalc, and you might not need to look further. The next post after that has the settings for all the Booleans and SigmaLevel variables accompanying the invocation of WtdAv by WtdMeanAcalc.
Postscript regarding CanReject: there is NO dependency on sort order (that I could find) when determining which spot(s) should be rejected:
Clearly I did not eyeball the code correctly in my previous posts, and I have gone back to edit that commentary, to avoid re-stumbling across it later.
Hi Jim @bowring , I have finished looking through the WeightedAverage (and overlaid) code, but it is complicated because there are lots of ways into it, both from Squid (e.g. using WtdMeanAcalc for calibration constants, or sqWtdAv in user-defined expressions) and in Excel, from Isoplot (using the array function in a spreadsheet, or the toolbar-based function).
In addition to these various entry-points, we know that the array output of the calculation is conditional (with respect to both the numerical values reported and the labels assigned to them). Some of the conditions are dictatable by the user at the outset of the calculation, but the function-layering renders these relationships opaque. I am going to try to tease it out here (in instalments, and skipping over as much of the Excel-specific guff as I can). Note that in Excel, pretty much all of the below is part of the Isoplot codebase, not SQUID 2.50.
As a preface, function sqWtdAv is defined as:
sqWtdAv ( Values, Errs, PercentErrsIn, PercentErrsOut, CanReject )
where:
Values
is a mandatory 1-column range of numeric values.Errs
is a mandatory 1-column range of numeric uncertainties associated with the Values (and of the same length as Values).PercentErrsIn
is an optional Boolean, permitting the user to specify whether the Errs values are percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.PercentErrsOut
is an optional Boolean, permitting the user to specify whether the uncertainties output by the weighted mean calculation (error of the mean, with or without "external" error) should be specified as percentage (TRUE) or absolute (FALSE). If not specified, it defaults to FALSE.CanReject
is an optional Boolean, permitting the user to specify whether Isoplot should be permitted to try excluding one or more outliers from the submitted set of Values and Errs, if the initial weighted mean calculation yields a probability-of-fit < 0.05. (Isoplot has its own rules for this process: no more than 15% of the total population can ever be rejected, and although rejection tolerance commences at 2sigma, the tolerance loosens with each successive pass. More on this later.) If not specified, it defaults to FALSE.sqWtdAv is unique in that it requires Values and Errs to be specified as separate vectors, and the sqWtdAv code then explicitly defines
ValuesErrs
as the unified 2-column array. This is to bring the input into line with all other invocations of the "starter" Isoplot function WtdAv, which I will describe next.