gfmoore / esci-precision

esci precision two and precision paired

Calculated values of N #6

Closed gdcumming closed 3 years ago

gdcumming commented 4 years ago

0.0.6

Comparing values of N given by ESCI intro and your 'esci-precision'.

Two Independent Groups tab

Exact agreement (yay!) for 'on average' and 'with assurance' for all the cases that ESCI calculates, including non-displayed values out to target MoE = 2.0.

Paired Design tab

I compared for four values of rho.

rho = .00

'on average': Agreement for small target MoE, then ESCI is 1 or 2 higher for many target MoE values of 0.85 or greater.

'with assurance': Agreement until target MoE = 1.7+ when ESCI is sometimes 1 higher.

rho = .50

'on average': Agreement for small target MoE, then ESCI is 1 or 2 higher for many target MoE values of 0.6 or greater.

'with assurance': Agreement until target MoE = 0.95+ when ESCI is often 1-3 higher.

rho = .90

'on average': Agreement for small target MoE, then ESCI is 1 or 2 higher for many target MoE values of 0.3 or greater. For target MoE = 0.6 or more, ESCI = 5 or 4 and esci-precision = 3.

'with assurance': Agreement until target MoE = 0.45+ when ESCI is often 1 or 2 higher. For target MoE = 0.7 to 1.3, ESCI = 7, 6 or 5 and esci-precision = 4.

rho = .99

'on average': Even for target MoE = 0.1 to 0.35, ESCI is 1 or 2 higher. For target MoE > 0.35, ESCI and esci-precision both give 3.

'with assurance': Agreement for target MoE = 0.1 and 0.15, then ESCI is 1 or 2 higher for 0.2 - 0.4. Agreement (both give 4) for 0.45 to 0.75. Then for 0.8 to 2.0, ESCI gives one less (i.e. 3 rather than esci-precision's 4).

Conclusions?

My memory is that I was able to verify a small number of my 'on average' and 'with assurance' values against values published as examples in the journal articles that derived the formulas. The details are lost in the mists of time. So I'm moderately confident (no less, but no more) that ESCI is accurate. I'm hoping Bob might be able to suggest some way to validate values.

It strikes me as strange that esci-precision gives, for the 'with assurance' case, 4 as the minimum for many larger values of target MoE. E.g. when rho = .5, for target MoE = 1.6+ and when rho = .90, for target MoE = 0.7+. Even more extreme for rho = .99, i.e. for 0.25+. Never dropping down to 3.

So those small N values might be the first to investigate. Do you and I do something different for rounding, at some stage?

gfmoore commented 4 years ago

Do you and I do something different for rounding, at some stage?

Oh yes!! I honestly could not follow the logic for what is going on in your Excel code.

As I've said previously I am a bit concerned about the validity of some of these calculations? For instance there seems to be a switch to using studentt.inv and not chisquare.inv at certain values or if one is smaller than the other etc. It seems (to me) that some of the logic is contrived to give particular values and from your comment above about matching published values I think I may have a point. As I said I will shortly publish the details of my code for you and Bob to ponder and to offer suggestions.

One point (already mentioned, I think) is that more modern statistical libraries seem to interpolate in studentt.inv and chisquare.inv for non-integer values of N/df, whereas tinv and chiinv do not (I think), but I don't know what the implications of this are. I'm sorry, but my mathematics is limited at this level.

I think we should be able to derive an algorithm for calculating the correct values, one that gives the theoretical values, and then look at how we can implement it in code.

As soon as I finish the MoE distribution curve calcs (which also need validating) I shall email you both with my algorithms. Then we can tweak!

It strikes me as strange that esci-precision gives, for the 'with assurance' case, 4 as the minimum for many larger values of target MoE

Looking at my code, I have put a limit here of 4, as I think the algorithm suddenly diverges if less -- I even wrote in the code, 'is this a fiddle!'

N = 2 * (1 - correlationrho) * (cv / fmoe) * (cv / fmoe) * Math.abs(jStat.chisquare.inv(gamma, df)) / df;

I wrote this last week, but I'm thinking that when df is 3 it blows up. In fact, I've just checked what happens without the check, and the code crashes. Hence the need for a thorough review of the algorithm, please.
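A quick numerical check of that suspicion (a sketch only -- the values are what jStat's chisquare.inv returns):

    // chisq.inv(.99, df) / df grows sharply as df shrinks, which is consistent
    // with the update above inflating N (or crashing) once df gets small.
    for (const df of [2, 3, 5, 10, 30]) {
      console.log(df, jStat.chisquare.inv(0.99, df) / df);
    }
    // df = 2 -> ~4.61, df = 3 -> ~3.78, df = 10 -> ~2.32, df = 30 -> ~1.70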

gfmoore commented 4 years ago

I've emailed you both my algorithms. Also put here for completeness. Precision for Planning algorithms.docx

gdcumming commented 4 years ago

0.1.0

I've been thinking about this these last few days, partly trying to decipher ESCI code to get greater understanding, partly comparing your and my values, and partly going back to the formulas in UTNS. Thanks for the doc with all your code and reasoning. I have 3 lines of thinking to describe.

Given that our N values all seem to agree for the two indep groups case, maybe we can consider that done, at least for now. So here I'll discuss just the Paired Design, and 'on average'. Relevant formulas are 13.10 (for a start, with z) and 13.11 for the main work (UTNS p. 363).

The nine steps in ESCI. In words, the rationale was:

  1. Use 13.10 to get an N0, and round up to integer. That's the value we'd want if sigma known. It's a lower bound on the N we actually want.
  2. Using df=(N0-1), apply 13.11 to get (after rounding up to integer) N1. Typically, N1>N0. This N1 is an upper bound of the N we want.
  3. If N1>N0, let's try N=N0+1, so use df=N0+1-1, apply 13.11 etc.
  4. Keep going, ... the sequence either stabilises, or alternates between two values 1 apart.
  5. The N we want is that stable value, or the greater of the two values that alternate.

I believe that's an informal description of what is calculated in the table that has Z66 in its top left corner (ESCI intro, Precision paired sheet). Even more informally, find N for z, see what N the formula 13.11 gives. Nz usually too small, so try 1 more, etc until we get N just big enough to give us satisfactorily small av MoE.

Comparing with your code, as best I understand it: we need to round up N to the next integer, which means df is always an integer. If we get alternation, choose the larger value, because the smaller value gives a 13.11 result that's too large.

I conjecture that if you tweaked your code in those two ways, your values would match those in ESCI. Just a conjecture!
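In code, the stepping scheme might look something like this (a sketch only -- jStat assumed for the t quantile, the function name is illustrative, and 95% CIs are hard-wired):

    // ESCI's stepping scheme for Paired Design, 'on average'; f is target MoE.
    function pairedNOnAverage(f, rho, z = 1.959964) {
      // Step 1: 13.10 with z, rounded up -- a lower bound on the N we want.
      let N = Math.ceil(2 * (1 - rho) * (z / f) ** 2);
      for (;;) {
        // Step 2: 13.11 with df = N - 1 gives the N this trial N says we need.
        const t = Math.abs(jStat.studentt.inv(0.025, N - 1));
        const N1 = Math.ceil(2 * (1 - rho) * (t / f) ** 2);
        if (N1 <= N) return N; // stabilised, or N is the greater of the alternating pair
        N = N + 1;             // Steps 3-4: try 1 more, not N1 itself
      }
    }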

Here's a totally different approach. Sledge hammer, but maybe how I should have done it from the start? Fiddle with formula 13.11 to get:

f = t(.95,N-1) / sqrt(N/(2*(1-rho)))

Simply (!) make a giant table of f values for all N values and rho values we are interested in. I've done that, for rho = .00 (.01) .99, and N up to 500 (easy to extend). I'll try to insert below, also send separately. To use the table, look down the column for the rho of interest, and find the row (N value) that gives f just below target MoE. That looking up of values could be automated.
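Automated, the lookup could be a simple scan (a sketch only, computing the column on the fly with jStat rather than storing the giant table; the function name is illustrative):

    // First N (from 3 up) whose f = t(.95, N-1) / sqrt(N / (2(1 - rho)))
    // is at or below target MoE -- i.e. the row we'd find by eye in the table.
    function pairedNByTable(targetMoE, rho, maxN = 500) {
      for (let N = 3; N <= maxN; N++) {
        const t = Math.abs(jStat.studentt.inv(0.025, N - 1));
        const f = t / Math.sqrt(N / (2 * (1 - rho)));
        if (f <= targetMoE) return N;
      }
      return NaN; // target MoE not reachable within the tabulated range
    }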

I've tried that for rho=.50 and found the values given by ESCI, accurate for all target MoE values that ESCI calculates.

The file should open with the .50 column adjacent to N and df columns. Scroll horizontally for different rho values.

I suspect a similar approach would work for 'with assurance', using formula 13.12 (i) changed for the paired case, then (ii) fiddled to give f = ...

Trial precision for planning calcs 27 Oct 20a.xlsx

gfmoore commented 3 years ago

When you say "round up to integer" do you mean strict +1/2 rounding e.g. 8.4 => 8 amd 8.6 => 9?

gfmoore commented 3 years ago

Lightbulb!

I see what you are doing now. You are not using the calculated value of N in the next iteration; you are adding 1 to N and seeing what happens.

Okay I'll rewrite the code and see how that works out.

gfmoore commented 3 years ago

If N1>N0, let's try N=N0+1, so use df=N0+1-1, apply 13.11 etc. Keep going, ... either stabilises, or alternates between two values 1 apart.

When you say "Keep going", what do you mean? What are the new values to try?

[2 hours later] So I've spent a few hours reviewing your Excel code and I think I've discovered the bit that makes me very uncomfortable (sorry). It's the calculation of the degrees of freedom.

// This is for target MoE, i.e. f = 0.9 (I chose this as it is a point where I get to be different from Excel)
df2 = MAX(MIN(ROUNDUP(AD83, 0) - 1, AC83 + 1), AC83 - 1)

In symbols:

    df2 = MAX(MIN(ROUNDUP(Nt1, 0) - 1, df1 + 1), df1 - 1)

Plugging in ROUNDUP(Nt1, 0) = 14 and df1 = 2:

    df2 = MAX(MIN(14 - 1, 2 + 1), 2 - 1)
        = MAX(MIN(13, 3), 1)
        = MAX(3, 1)
        = 3

So this is weird, but you get:

df2 = 3   nt2 = 7.502195914221385
df3 = 4   nt3 = 5.7101091732890525
df4 = 5   nt4 = 4.894734051273621
df5 = 4   nt5 = 5.7101091732890525   !!!!!!!!!!!!!!!!

and this is where it goes weird because of your max, min calculation.

Now we start the oscillation, but really this is because of the way we are picking the df, which seems (to me) to be a little arbitrary?

If I were following what I understand the algorithm above to be doing ("Keep going"), then df should keep on increasing, so df5 should equal 6?

In my (naive) view, for the algorithm I produced, I imagine that we are zooming in on N, so I keep feeding whatever my latest N is back into the calculation for N -- an iterative process that I hope converges. (Sometimes it doesn't, but I haven't fully analysed why -- I think N is too small.)
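Roughly, in simplified form (a sketch, not my exact code; cv, fmoe and correlationrho as in my earlier snippet):

    // Feed the newest N straight back into the 13.11-style update and hope it settles.
    let N = 2 * (1 - correlationrho) * (cv / fmoe) * (cv / fmoe); // z-based start
    for (let i = 0; i < 100; i++) {
      const t = Math.abs(jStat.studentt.inv(0.025, Math.ceil(N) - 1));
      const next = 2 * (1 - correlationrho) * (t / fmoe) ** 2;
      if (Math.ceil(next) === Math.ceil(N)) break; // hoped-for convergence
      N = next; // can oscillate or diverge when N is small
    }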

I guess you don't have the original papers that you mentioned? It would have to be the papers as I don't have access to Athens etc now.

Sorry about this. I guess I could simply implement your calculation of df, but I won't be satisfied till I understand it.

I will have a look at the sledgehammer soon, but personally I'd like an algorithmic approach rather than a look up table.

Knowing me, I'll probably implement both and we can switch between the two for a test :)

gfmoore commented 3 years ago

I've changed the calcs using your df evaluation from Excel - they now seem to match, but not checked everything. This is just for the Paired calcs.

I will now start to change the Unpaired code to follow the same pattern.

However, I once again state my unease at the calcs 😃 Ooh, I found how to do emoticons. You type in :sm... and pick.

gfmoore commented 3 years ago

Ok, so now I've implemented the Excel calcs for Independent groups - I think. I can't guarantee that I've got it right, but at least it seems to match. (Not checked the different rho values for Paired design.)

However, I still feel very uneasy about the rounding up or down of Nt, df. Seems very arbitrary, though I guess there is good reason for it.

Updated calcs for Precision for calculating N.docx

gdcumming commented 3 years ago

0.1.3

I'm not sure I quite follow all that, sorry. Sorry also that I haven't been able to explain my iteration approach well--not sufficiently well for either you or me.

In your latest version I checked just rho = .50 values and see that now your values on average and with assurance match mine. 😃

After another day of rumination, here's my latest take on the basic approach:

We can easily calc Nz, which is what we want when sigma is assumed known. It's the minimum N that gives, on average, MoE not greater than target MoE. The 'not greater than' (which we use neurotically always in the text discussions in the book) arises because Nz has to be an integer, so there's some rounding up, almost always. So, almost always, that Nz actually gives average MoE that's a little less than target MoE.

We know that, when sigma is unknown and we need to use t not z, the Nt we seek will usually be larger than Nz, because critical t is more than critical z. It's striking, however, that Nt is never greatly more than Nz. Play with ESCI for UTNS (old ESCI), which gives a black curve for Nz and red for Nt. The two values are never different by more than 3. I think that's the case for all cases presented in either the ind groups or paired sheets.

Given N, we can use the formulas in the book (maybe turned around) to calculate what f (i.e. average MoE) that N will give.

Therefore, to find Nt:

  1. Use target MoE to calculate Nz. (For paired, use 13.10.)
  2. Take Nz = N0 as our first attempt at finding Nt. Calculate f0, the average MoE given by N0. (For paired, use the turned-around version of 13.11.) If f0 <= target MoE we're done: use N0 as Nt.
  3. If, as usual, f0 > target MoE, make N1 = N0 + 1. (Try a slightly larger N.)
  4. Calculate f1, which will be smaller than f0. Compare f1 with target MoE...
  5. Etc. Should not need more than 3 steps, because we expect Nt will be no more than 3 greater than Nz.

That seems to me simpler than what I see in ESCI. It's what I have always had in mind, so I can't figure out why I didn't use this apparently simpler form of the iteration. I don't see how the above approach could give oscillation between two adjacent values of N. It also makes clearer, I think, why we are always working with integral N, and therefore integral df.

Note that for Step 1 we use f = target MoE, a known constant, in 13.10. But when in Step 2 we use 13.11 turned around, the f is a variable that we calculate a value for--maybe up to three times.
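In code the whole thing might be (a sketch only, jStat assumed, names illustrative -- note the contrast with the earlier stepping scheme: here we turn 13.11 around and compare f with target MoE, rather than comparing N values):

    // Steps 1-5 for Paired Design, 'on average'.
    function pairedNIterate(targetMoE, rho, z = 1.959964) {
      let N = Math.ceil(2 * (1 - rho) * (z / targetMoE) ** 2); // Step 1: Nz from 13.10
      for (;;) {
        const t = Math.abs(jStat.studentt.inv(0.025, N - 1));
        const f = t * Math.sqrt(2 * (1 - rho) / N); // 13.11 turned around
        if (f <= targetMoE) return N; // average MoE not greater than target: done
        N += 1; // try a slightly larger N (we expect no more than ~3 steps)
      }
    }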

Now I'm going to do a bit more work on the sledge hammer.

Sleep well...

gdcumming commented 3 years ago

0.1.3

I've added to yesterday's sledge hammer Excel file the lookup needed to give N on average for paired design, for rho = 0 (.01) .99 and target MoE = 0.05 (0.01) 2.00. Those N values are in a table approx 100 columns wide (rho values) by 196 rows (target MoE values).

Below that table is the larger table that calculates MoE on average for a vast range of N, for all those rho values.

I have compared, for a haphazard selection of cases, the N given by that file with your values (in 0.1.3) and my Excel intro values. All match! I haven't checked every case...

See 'on average' file below.

I've also adapted a copy of that Excel file to do the same for N with assurance. (A separate file, because they are each so large.) Same layout.

I have compared a haphazard selection of cases. All those match with my Excel intro values. I found one case of disagreement with your values: rho = .90, target MoE = 1.4 and higher. You get N = 5, I get N = 4 (both in Excel file and ESCI intro.)

See 'with assurance' file below.

I'll email the two files as well.

Precision for planning Paired Design on average calcs 28 Oct 20.xlsx

Precision for planning Paired Design with assurance calcs 28 Oct 20.xlsx

gfmoore commented 3 years ago

I did try this approach, if I interpreted your algorithm correctly (hence why I asked about it), but I did not get the same values as with the Excel approach. Hence I forced myself to decrypt the Excel code in detail. I'm glad you are finding general agreement, but you have to remember that I am not using the chiinv or tinv algorithms from Excel, but the T.inv and chi.inv algorithms in jStat (which I assume correspond with the Excel versions!). I suspect that there may be slight differences -- some references do talk about a "better" algorithm for t.inv!? I'll look into the discrepancy :sigh: to see if I can work out why.

I will now have a go at the sledgehammer approach. I'll stick a temporary radio button in so we can see how it works out.

gfmoore commented 3 years ago

Uhmm, I've now re-read your algorithm description more thoroughly, and this is what I did not do -- so ignore my comment about getting different values. But I like it. It is a lot cleaner and more understandable. I will have a go at implementing it, so I'll use three radio buttons. Though it has been a difficult process, I think what we have gone through has focused the mind and will make for a much more satisfying discussion in your new book version 😄

So I've got a few hours before the Vuelta starts... What will I do when that has finished? Maybe pick up my guitar and practice properly. Maybe learn a tune for a change 😄

gfmoore commented 3 years ago

I have compared a haphazard selection of cases. All those match with my Excel intro values. I found one case of disagreement with your values: rho = .90, target MoE = 1.4 and higher. You get N = 5, I get N = 4 (both in Excel file and ESCI intro.)

I have discovered a missing roundup (ceil). What does this look like now?

gfmoore commented 3 years ago

Here's a thought though, would you advocate using such a low value of N for any research??

gfmoore commented 3 years ago

0.1.5

Ok, so I've now done the calcs for iterating using your new approach for 'on average'. They seem to match up well and are SO much simpler.

However, I can't figure out the assurance calcs. I don't know how to calculate the df for the chi-square, and you do not have the formula for unpaired or paired, just the single-case calc (13.12).

I've tried adapting from the Excel code, but can't get close.

I'll leave it for now and study the sledgehammer, perhaps it will give me the insight I need.

gfmoore commented 3 years ago

Found error in paired average. Added assurance calcs for iteration - gained insights from sledgehammer.

I think that the iteration is in fact the sledgehammer. Apart from implementation, there is no difference in the calcs -- although I couldn't find the unpaired with assurance??? Did I miss it?

Is there any point in implementing the sledgehammer now?

gfmoore commented 3 years ago

I can't help myself. Added code for sledgehammer approach. Creating the huge arrays takes a very long time, so please be patient.

I know there is a slight issue with the iterated value on - I'll check it out later.

When you are satisfied that the sledgehammer approach doesn't work, I'll disable Excel and Sledgehammer.

😄

gfmoore commented 3 years ago

Oops, I meant the problem was on iterate for Paired design for target MoE at 1.9.

gdcumming commented 3 years ago

0.1.7

Thanks for all that, tho' I confess I find it hard to follow exactly what each of those messages above refers to!

I've spent today revising ESCI intro to implement what you are calling iterate. I agree that it's neat and satisfying, and that, basically, it's the same as sledgehammer. Except that iterate needs to start with a fairly close low estimate of the N we want. That's easy for two independent groups, and I easily revised that sheet of ESCI intro, to get values that I think agree in every case with previous ESCI and also with all three options in 0.1.7--which my (inevitably incomplete) explorations suggest agree in all cases among themselves. Great!

For paired, on average, I could implement iterate as a revision of ESCI intro, but needed a kludge to make it work when Nz was small and defaulted to the minimum, 3. In other words, getting the close low estimate of the N we want was not, in this case, straightforward. But at least it was obvious when it failed! I.e. the succession of trial N values, each 1 more than the preceding value, started either with the estimated initial N too high (so it would never terminate) or way too low (so it would have taken many steps to find the accurate N with assurance).

For paired, with assurance, I failed to find a way to calculate that close low estimate of the N we want. I couldn't figure out how my original ESCI intro got around that problem. Alas! (At least, once again, it was obvious when the method wasn't working.) I've given up on this unless (i) you think it's essential and (ii) I have some inspiration--which I doubt will happen.

Playing with 0.1.7, paired, there are various disagreements amongst your three options, esp. for rho large or very large, and esp. at large target MoE. I don't get a good sense of which of the three seems best.

If sledgehammer works as intended, it may be the most likely to ensure our answers are correct, tho' it would be good to find a way to avoid such a vast and massive couple of tables. Getting iterate to work would be very satisfying!

There's also an issue with 0.1.7 that sometimes sledgehammer gives a horizontal curve (actually two curves, for both on average and with assurance) at N = 3. I can't pinpoint reliably when that happens, but I think it's often triggered by changing rho by dragging the round slider thumb rather than by using the nudge buttons.


Yes, I haven't provided the full set of equations, in particular for two ind groups and paired, for assurance. Sorry about that.

Two ind groups, with assurance (adapted from 13.9 and 13.12):

    N = 2 * [tinv(.05, 2(N - 1)) / f]^2 * chisq.inv(.99, 2(N - 1)) / (2(N - 1))

(I believe tinv(.05, df) == t.inv.2t(.05, df).) Turn this around:

    f = tinv(.05, 2(N - 1)) * sqrt(chisq.inv(.99, 2(N - 1)) / (N * (N - 1)))

Paired Design, with assurance (adapted from 13.11 and 13.12):

    N = 2(1 - rho) * [tinv(.05, N - 1) / f]^2 * chisq.inv(.99, N - 1) / (N - 1)

Turn this around:

    f = tinv(.05, N - 1) * sqrt(2(1 - rho) * chisq.inv(.99, N - 1) / (N * (N - 1)))
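Turned into code, those two turned-around formulas might read (a sketch only -- jStat assumed, the 95% CI and 99% assurance hard-wired, function names illustrative):

    // MoE with assurance for a candidate N, two independent groups (13.9 + 13.12):
    function fAssuranceTwoGroups(N) {
      const df = 2 * (N - 1);
      const t = Math.abs(jStat.studentt.inv(0.025, df));
      const c = jStat.chisquare.inv(0.99, df);
      return t * Math.sqrt(c / (N * (N - 1)));
    }

    // MoE with assurance for a candidate N, Paired Design (13.11 + 13.12):
    function fAssurancePaired(N, rho) {
      const df = N - 1;
      const t = Math.abs(jStat.studentt.inv(0.025, df));
      const c = jStat.chisquare.inv(0.99, df);
      return t * Math.sqrt(2 * (1 - rho) * c / (N * df));
    }

The same iterate then applies: start from a low estimate of N and bump it by 1 until the returned f is at or below target MoE.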


Just for the hell of it, I tweaked the two ESCI Paired Design sledgehammer files so they go down to N=2 minimum. For 'on average' we get N=2 only for rho = .99 and targ MoE > 1.27 and for rho = .98 and targ MoE > 1.79. For 'with assurance' we don't get N=2 ever.

gfmoore commented 3 years ago

tho' I confess I find it hard to follow exactly what each of those messages above refers to! 🤣

The iteration code is just so sweet and easy to follow. No mucking around with weird roundings etc.

I've attached the code and also extracted the calculations for thorough checking. I'm checking against your equations above and cannot see any problem, but it's easy for a small issue to get through...

Note: the JavaScript library only has jStat.studentt.inv(alpha/2, df-1) (I take the absolute value), which is (I hope) equivalent to your T.inv.2t(...)
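For the record, the conversion I'm relying on (a sketch; tInv2T is just an illustrative name):

    // Excel's T.INV.2T(alpha, df) is the positive critical t with alpha split over
    // both tails; jStat.studentt.inv(alpha/2, df) is the lower-tail quantile (negative).
    const tInv2T = (alpha, df) => Math.abs(jStat.studentt.inv(alpha / 2, df));
    // e.g. tInv2T(0.05, 10) is about 2.228, which should match T.INV.2T(0.05, 10)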

Personally, I think that in the new edition you should state all 4 sets of options: equations and the iterations.

Except that iterate needs to start with a fairly close low estimate of the N we want

What's wrong with the Paired, assurance N0 = Math.max(Math.ceil(2 * (1 - rho) * (cv / fmoe) ** 2), 3); // cv = normal critical value 1.95...

For paired, with assurance, I failed to find a way to calculate that close low estimate of the N we want. I couldn't figure out how my original ESCI intro got around that problem. Alas! (At least, once again, it was obvious when the method wasn't working.) I've given up on this unless (i) you think it's essential and (ii) I have some inspiration--which I doubt will happen.

I don't understand? Is it possible that this is where you replaced the chi-square calc with a Student t calc? I never understood what that part of the ESCI code was about.

I have commented out the code for Excel and Sledgehammer (but not removed).

Sledgehammer uses very large arrays (about a million entries I recall) and the search for values is not as efficient as VLOOKUP in Excel. We have to think of the resources (memory) being used when folk are using tablets or even phones to view.

I think we should concentrate on iteration and if we find anomalies we can fix each one and write explanations of what we have done. Might make for interesting reading as well.

0.1.8

p.s. did you decide if you wanted the CI drop-down - it's all there and ready if you want it. :)

gfmoore commented 3 years ago

Forgot the code. Precision by iteration code.docx

gdcumming commented 3 years ago

0.1.8

I was off air yesterday, but still ruminating. I decided to revise how I built iterate into ESCI intro. My problem was finding a good low estimate of target N. For 'on average', no problem. But with assurance I'd have needed up to maybe a couple of dozen iterations. It occurred to me I could use the first step of what I had before in ESCI intro, with that mysterious possible use of t--which I suspect came from some complicated formula in a journal article that's lost in my mists of time. I've done that today and it all works well. I've compared a very wide range of values with (i) old ESCI intro, and (ii) your 0.1.8. Everything spot on, at least for the values that ESCI intro calculates. You, of course, give a wider range and finer-grained possibilities for targ MoE (and rho).

To the best I can follow your formulas, everything looks good to me. You use a perfectly sensible initial estimate of targ N, and the way you do it, it doesn't matter at all that there may be up to a couple of dozen iterations.

So I think the original ESCI, my sledgehammer, my latest iterate in ESCI intro (new today), and your nifty iterate 0.1.8 all agree. So we're done! -- for calculating N for precision for planning, which is the main game here. Well done us, esp. you!

Still to do:

  -- I need to consult with Bob about decisions on fixed 95%?, grain size for targ MoE? and rho? and maybe one or two more things.
  -- I need to study the MoE distribution and curve that you've built.
  -- Tips.

I'm on to that, on Monday.

Meanwhile, closing... wheeeeee!

gdcumming commented 3 years ago

One further thought overnight. Given that N=2 in a few cases (paired, assurance, rho=.99, large targ MoE), if we remove the min 3 restriction, wouldn't your iteration go on for ever in such cases? Mine stops because I only do 6 iterations, and all other cases stop after 2 or 3 or, at most, 4. Then I take the max of all 6. So I get 3, but no infinite churning.
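In sketch form (requiredN is a hypothetical stand-in for my 13.11/13.12-style update, not actual ESCI code):

    // Run a fixed 6 updates, collecting each trial N, then take the max --
    // an alternating pair settles on its larger member and nothing churns forever.
    const candidates = [];
    let N = N0; // the z-based starting estimate
    for (let i = 0; i < 6; i++) {
      N = Math.ceil(requiredN(N)); // hypothetical update function
      candidates.push(N);
    }
    const Nfinal = Math.max(...candidates);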

Very likely I've missed something in your code.

gfmoore commented 3 years ago

I've removed the minimum of 3. Doesn't seem to be any adverse effect? But a sample size of 2!!! I think I mentioned this before 😄

gdcumming commented 3 years ago

0.1.12

Even with N=3, the main reason for including such a small value is to be able to discuss how the relationships work, esp. the N and targ MoE relation. We suggest in ITNS that very often it's not so much a matter of 'here's my targ MoE, so what N shall I choose?', but, rather, some 'what if' exploration of the N vs precision trade-off. Seeing that might suggest that we can't realistically expect a usefully precise result, or that we need to seek more funding so we can afford a usefully large N. Or seek replication by others so that meta-analysis of several results could give a useful overall result.

A while back I met a highly distinguished cell biologist, David Vaux ('Davo'), who was writing powerfully about statistical misconceptions and errors in the biological literature. I discovered that it was standard practice in cell biology to use N = 3, and to publish significance test results, and SE error bars, based on N = 3. Madness! Davo and I (and Fiona Fidler) eventually wrote a paper together describing the basics of estimation, error bars, and sample size. It became a hit, being cited often and studied in many 'journal clubs', despite being (maybe because it was) so very basic. https://tiny.cc/errorbars101

Similarly, going down as low as targ MoE = 0.05 is unrealistic, but does let us explore the extremes of the N vs targ MoE relationship.

I'm delighted to see (tho' I expected nothing less!) that the transitions from N = 3 to N = 2 occur in your pfp at exactly the same points as in my sledgehammer Excel file. Of course!

The issue of minimum N is one of several that I still want to discuss with Bob. I know he's been totally overloaded with his (rather unfair) college obligations recently, plus trying to do some urgent political work. I'm happy to wait a bit. I hope that's ok with you. I expect (hope) there will be little, if any, that we'll request as mods to what you've done so far.