gdcumming closed 4 years ago
Okay, so I've made progress on the functionality, but I haven't gone through the above yet. There is no heap as yet.
However, you may want to play with adjusting things on the fly as it were. I'll work through the above list tomorrow and then add in the heap code. Please remember it is a first go, there may be things that aren't right.
One thing for you to check, though, is the code for the Fisher transform.
Note that r is the r obtained from the sample.
function calculateFisherrtozTransformation() {
  // Fisher r-to-z transform of the sample correlation r
  let zr = 0.5 * Math.log( ( 1 + r ) / ( 1 - r ) ); //Math.log is the natural logarithm
  //let cv = jStat.studentt.inv( alpha/2, N1 - 1 ); //critical value
  let cv = Math.abs(jStat.normal.inv( alpha/2, 0, 1)); //critical value e.g. alpha = 0.05 gives 1.96
  // CI limits on the z scale; 1 / sqrt(N1 - 3) is the standard error of zr
  lowerarm = zr - ( cv * 1 / (Math.sqrt(N1 - 3)) );
  upperarm = zr + ( cv * 1 / (Math.sqrt(N1 - 3)) );
  //now transform back to r values
  lowerarm = (Math.exp( 2 * lowerarm) - 1 ) / (Math.exp( 2 * lowerarm) + 1 );
  upperarm = (Math.exp( 2 * upperarm) - 1 ) / (Math.exp( 2 * upperarm) + 1 );
}
I had a couple of questions.
Why isn't the Student t critical value used, given that we are using a sample of N1 items? (Don't ask why N1. There was a reason, but now I can't be bothered changing it! ;)
Does r -> zr need transforming back to r?
0.0.11
So good to see the first parts of dances working.
CI on r: We need z not t because (1) that's what the approx CI is based on, that's what the Fisher transform is defined to be and, relatedly, (2) the t arises for CI on mean because that CI depends on an estimate of sigma, the pop SD, and the estimate we use is s, the sample SD, which has (N-1) d.f. (for single group case). There is no such need for r to rely on any such SD estimate. That also simplifies things in that we don't need to support the options of sigma known/unknown.
No need to transform from zr back to r. If we did so we'd simply get the original r anyway. In Figure 14.7 the lines from r to zr (not labeled but on the horiz axis) are bidirectional, but the lines for the two ends of the CI are used only in the backwards direction, from the H to the V axis.
Note that your 'lowerarm' and 'upperarm' refer to the lower and upper endpoints, or limits, of the CI. Not, for example, to arm length. All the formulas above seem to use them correctly, and there's no need to change the labels you use, but don't sometime in the future get misled by the labels.
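As a cross-check on the points above, here is a minimal sketch of the same CI calculation written with Math.atanh and Math.tanh, which are the Fisher transform and its inverse. The function name fisherCI, its parameter names, and the hard-coded default critical value of 1.959964 (for C = 95%) are illustrative assumptions, not part of the project code, which takes its critical value from jStat.

```javascript
// Sketch only: approximate CI on a sample correlation r via the Fisher r-to-z transform.
// fisherCI, n and zCrit are hypothetical names, not taken from the project code.
function fisherCI(r, n, zCrit = 1.959964) {
  const zr = Math.atanh(r);          // same as 0.5 * ln((1 + r) / (1 - r))
  const se = 1 / Math.sqrt(n - 3);   // standard error of zr; no SD estimate involved, hence z not t
  return {
    lower: Math.tanh(zr - zCrit * se), // only the two limits are transformed back to the r scale
    upper: Math.tanh(zr + zCrit * se),
  };
}

const ci = fisherCI(0.5, 50);
console.log(ci.lower.toFixed(3), ci.upper.toFixed(3));
```

With r = .50 and N = 50 this gives limits of roughly .26 and .68, so the limit nearer to 1 is closer to r than the limit nearer to 0, which is the asymmetry the display should show.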
There are some things in the current version that don't work quite right. But I'm not going to try to describe those, because I'm sure it's a work in progress and your next steps will no doubt sort out lots if not all of that. I am however going to start a new issue about how 'Clear' and related things should work, because I don't think I defined that sufficiently in the spec above.
When ‘r heap’ is ON, dropping r values collect into the heap, just as in ‘dances’ for means.
To be done
A blue vertical line can be turned on to mark rho, the population correlation. When it’s on, then capture can be turned on, which changes the colour of r dots and CIs, both while dropping and when in the heap. Just as for means and CIs in ‘dances’.
Implemented
The number of samples taken in the current run is reported (Panel 10), also the percent of these for which the CI captures rho.
Implemented, BUT at the moment I work this out regardless of whether the rho line and Display CIs are selected. Maybe there is a direction further down the list?
On just a few runs I note that it takes a large number of trials to get close to the 95% CI value?? i.e. slow to "approach" (there's a better word, but I can't just think what it is; a calculus term?)
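The slow approach to 95% is what binomial sampling variability would predict: the capture percent "converges" on its long-run value only at a rate of one over the square root of the number of samples. A rough sketch follows, assuming independent samples that each capture rho with the same probability p. The helper name captureSE is hypothetical, and p = 0.95 is itself only approximate for the Fisher-based CI.

```javascript
// Sketch only: standard error of the reported capture percent after numSamples samples.
// Assumes independent samples, each capturing rho with probability p (an assumption).
function captureSE(numSamples, p = 0.95) {
  return 100 * Math.sqrt((p * (1 - p)) / numSamples); // in percentage points
}

for (const n of [25, 100, 1000, 10000]) {
  console.log(n, captureSE(n).toFixed(2));
}
```

After 100 samples the percent can easily sit about 2 points either side of 95, and it takes around 10,000 samples to bring that down to about 0.2 of a point.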
Defaults
Done
Panel 7: When clicked ON, SCA, meaning:
S: sampling stops if it was running.
C: reset samples count to 0, i.e. prepare for a new run. All r values and CIs and heap are cleared. Sample shown in scatterplot is cleared.
A: act on this control, meaning that: Lower display area appears, Panels 8, 9, 10 appear, all checkboxes OFF, C = 95% is selected.
I think I've done this ;)
When Panel 7 is ON and we see lower display area: This is the tricky bit. My suggestions below are a bit more conservative than in ‘dances’, hoping this will make life simpler all round.
CI clicked ON or OFF: Just A (CIs are displayed or not. Keep counting captures in the current run, but display percent capturing only when CIs are being displayed.) Keep displaying count of number of samples taken, whether or not CIs are being displayed.
Ahh, here we go, the answer to an earlier query. Implemented; the percent is also displayed as '-' if CI is off. I also keep a count of those captured regardless of whether CIs are displayed.
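The counting rule just described can be sketched as follows: captures are counted for every sample, the sample count is always shown, and the percent is shown as '-' unless CIs are displayed. The names makeRunStats, record, percentText and so on are hypothetical, chosen only for this illustration.

```javascript
// Sketch only: run counters for Panel 10. All names here are hypothetical.
function makeRunStats() {
  let samples = 0;
  let captured = 0;
  return {
    // Count every sample, and every capture, whether or not CIs are displayed.
    record(lower, upper, rho) {
      samples += 1;
      if (lower <= rho && rho <= upper) captured += 1;
    },
    reset() { samples = 0; captured = 0; },      // the C in SCA
    samplesText() { return String(samples); },   // always shown
    percentText(showCIs) {
      if (!showCIs || samples === 0) return "-"; // hidden while CIs are off
      return ((100 * captured) / samples).toFixed(1);
    },
  };
}

const stats = makeRunStats();
stats.record(0.26, 0.68, 0.5); // this CI captures rho = 0.5
stats.record(0.55, 0.85, 0.5); // this one misses
console.log(stats.samplesText(), stats.percentText(true), stats.percentText(false));
```

Because the count keeps running while CIs are hidden, turning CIs back on shows the percent for the whole current run rather than only for the samples taken since.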
C% changed: SCA
Done
rho line clicked ON: Just A
rho line clicked OFF: ‘rho line’ disappears and, if ‘Capture of rho’ is ON, turn it OFF, which triggers SCA
Capture of rho not yet implemented, but logic for turning off added.
‘Capture of rho’: This can be clicked ON only when ‘rho line’ is ON, in which case: SCA
It's not a problem to recolour based on parameters. The samples are stored in an array and I can go through the array as needed.
If clicked OFF, then SCA
Done, but I don't see why the rho line needs to be on. If capture of rho is on, then we could colour the blob or, if CI is on, the wings.
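Recolouring from the stored array, as mentioned above, might look like the sketch below. The sample shape { r, lower, upper }, the function names, and the colour strings are assumptions for illustration. In particular, using green for a CI that captures rho and red for one that misses is assumed from the description of ‘dances’, not confirmed here.

```javascript
// Sketch only: recolour stored samples from the current settings.
// The sample shape and the colour names are assumptions, not the project's own.
function colourFor(sample, rho, captureOn) {
  if (!captureOn) return "green";                 // capture off: default colour
  const captures = sample.lower <= rho && rho <= sample.upper;
  return captures ? "green" : "red";              // assumed: red marks a miss
}

function recolourAll(samples, rho, captureOn) {
  // Returns new objects so the stored samples are not mutated.
  return samples.map((s) => ({ ...s, colour: colourFor(s, rho, captureOn) }));
}

const stored = [
  { r: 0.5, lower: 0.26, upper: 0.68 },
  { r: 0.72, lower: 0.55, upper: 0.85 },
];
console.log(recolourAll(stored, 0.5, true).map((s) => s.colour));
```

The same pass could be run whenever the settings change, since it depends only on the stored limits and the current rho.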
‘r heap’ clicked ON or OFF: SCA
Done, but heap not yet implemented.
Okay, in order for me to keep up I'm closing this spec.
Can you now open separate issues for the individual items, please, so that I can address them one by one? Thanks.
0.0.9
General idea
Scatterplot in square upper display area: Working well.
When lower display area is open, then when a sample is taken, the r value appears as a dot at the top of this area, positioned along the horizontal r axis. Then it drops down when next sample taken. Like means in ‘dances’. That gives ‘dance r’. When CIs are turned on, the CI is shown on each r dot and we get the dance of the CIs for correlations. Those CIs are generally asymmetric, and more so as r approaches -1 or 1. Shorter arm is the arm closer to -1 or 1.
When ‘r heap’ is ON, dropping r values collect into the heap, just as in ‘dances’ for means.
A blue vertical line can be turned on to mark rho, the population correlation. When it’s on, then capture can be turned on, which changes the colour of r dots and CIs, both while dropping and when in the heap. Just as for means and CIs in ‘dances’.
The number of samples taken in the current run is reported (Panel 10), also the percent of these for which the CI captures rho.
Defaults
Whenever the ‘dance r’ component is first loaded, or we arrive there from the esci web main menu:
Panel 1. N = 50
Panel 2. Rho = .50
Panel 4. Population ON, other two checkboxes OFF
Panels 5, 6, 7. All OFF
Scatterplot display: We see one sample, whose r value is reported in Panel 2.
Panel 5: Whenever this is ON we see the statistics for the latest sample.
Panel 6: When clicked ON, see three checkboxes all OFF. When clicked OFF, any displayed lines disappear.
Panel 7: When clicked ON, SCA, meaning:
S: sampling stops if it was running.
C: reset samples count to 0, i.e. prepare for a new run. All r values and CIs and heap are cleared. Sample shown in scatterplot is cleared.
A: act on this control, meaning that: Lower display area appears, Panels 8, 9, 10 appear, all checkboxes OFF, C = 95% is selected.
When Panel 7 is clicked OFF: SCA. One extra sample is taken and displayed in the scatterplot.
When Panel 7 is ON and we see lower display area: This is the tricky bit. My suggestions below are a bit more conservative than in ‘dances’, hoping this will make life simpler all round.
CI clicked ON or OFF: Just A (CIs are displayed or not. Keep counting captures in the current run, but display percent capturing only when CIs are being displayed.) Keep displaying count of number of samples taken, whether or not CIs are being displayed.
C% changed: SCA
rho line clicked ON: Just A
rho line clicked OFF: ‘rho line’ disappears and, if ‘Capture of rho’ is ON, turn it OFF, which triggers SCA
‘r heap’ clicked ON or OFF: SCA
‘Capture of rho’: This can be clicked ON only when ‘rho line’ is ON, in which case: SCA. If clicked OFF, then SCA
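The rules above for the lower display area can be sketched as a single state-update function. The control names (showCIs, cLevel, rhoLine, rHeap, captureOfRho) and the state shape are hypothetical; only the assignment of SCA versus "Just A" to each control comes from the list above. Panel 7's own ON/OFF behaviour, including resetting the checkboxes, is left out to keep the sketch short.

```javascript
// Sketch only: S/C/A handling for controls in the lower display area.
// Control names and the state shape are hypothetical.
const FULL_RESET = new Set(["cLevel", "rHeap", "captureOfRho"]); // controls that trigger SCA

function applyControl(state, control, value) {
  if (control === "captureOfRho" && value && !state.rhoLine) {
    return state;                          // can be clicked ON only when rho line is ON
  }
  const next = { ...state, [control]: value }; // A: act on the control
  let reset = FULL_RESET.has(control);
  if (control === "rhoLine" && !value && state.captureOfRho) {
    next.captureOfRho = false;             // forced OFF, which triggers SCA
    reset = true;
  }
  if (reset) {
    next.running = false;                  // S: stop sampling
    next.samples = [];                     // C: clear r values, CIs and heap; count is samples.length
  }
  return next;
}

const before = { running: true, samples: [0.41, 0.56], showCIs: false,
                 cLevel: 95, rhoLine: true, rHeap: false, captureOfRho: true };
console.log(applyControl(before, "showCIs", true).samples.length);  // Just A: samples kept
console.log(applyControl(before, "rhoLine", false).samples.length); // SCA via capture turning OFF
```

Keeping the SCA set in one place makes it easy to compare the implementation against the list control by control.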
Colours
Blue for the individual data points in the scatterplot is fine. For r values and CIs on r, let’s use the same two greens, and red, as ‘dances’ uses for means. Let’s have the thin black circle on each red or green r value dot, as in ‘dances’.