First sketch ideas, for discussion

gdcumming commented 4 years ago

In a nutshell: Like 'Sample r' page in ESCI (for UTNS), but better laid out.

Top half of RH display area: a square scatterplot, like that in 'correlation', where the sample appears. There is an option to display the population (well, several hundred points in the population) in the scatterplot, a bit like 'fill' in dances. Taking a random sample of size N then looks like randomly selecting N of those hundreds of points and highlighting them in colour. Take successive samples and see the highlighted subsets dance around, and also see the value of r (if we click to display it prominently in the scatterplot) bounce around.

Lower half of RH display area: dance of the r values dropping down. Options to show CIs on the r values, to collect the r values in a heap, to mark capture of rho using red for non-capture. That's all like 'dances' except that the dropping dots are r values, not sample means. The CIs are asymmetric.
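A minimal sketch of what drives the dance of the r values, assuming each sample of N (x, y) pairs is drawn afresh from a standard bivariate normal population with correlation rho. Python is used only for illustration, and the function names (sample_pairs, pearson_r, dance_of_r) are hypothetical, not taken from any ESCI code.

```python
import math
import random


def sample_pairs(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # y mixes z1 with independent noise z2 so that corr(x, y) = rho
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs


def pearson_r(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def dance_of_r(n, rho, n_samples, seed=None):
    """One r value per sample: the values that would drop down the display."""
    rng = random.Random(seed)
    return [pearson_r(sample_pairs(n, rho, rng)) for _ in range(n_samples)]


if __name__ == "__main__":
    for r in dance_of_r(n=20, rho=0.5, n_samples=10, seed=1):
        print(f"{r:+.3f}")
```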

Control panels in LH display area: same three buttons as in 'dances', many of the controls similar to 'dances', but no Panels 8 and 9.

Two possibilities to consider:

1. Simplest overall layout: RH display area split, with square scatterplot above, and area for 'dance of the r values' and 'r heap' below.

2. More complex but worth a thought: When dance is turned OFF, the square scatterplot takes up the whole RH display area, within the constraint that it be square. When dance is turned ON, the scatterplot shrinks (but remains square) so its height is half that of the RH display area, and the dances area appears below.

Big issue: How would each of those work when overall window area is adjusted, e.g. by dragging RH edge in, or lower edge up?
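One way to think about the resizing question, as a sketch only: recompute the layout from the current RH display width and height on every resize. The names (Layout, option1, option2) are hypothetical, and the rule that the dance area takes whatever height the scatterplot leaves is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    scatter_side: float  # width = height of the square scatterplot
    dance_height: float  # height left below it for the dance and r heap (0 if hidden)


def option1(width, height):
    """Option 1: area always split; square scatterplot above, dance below."""
    side = min(width, height / 2)  # square, and at most half the display height
    return Layout(side, height - side)


def option2(width, height, dance_on):
    """Option 2: scatterplot fills the area when dance is off, shrinks when on."""
    if not dance_on:
        return Layout(min(width, height), 0.0)  # largest square that fits
    side = min(width, height / 2)
    return Layout(side, height - side)
```

With this rule, dragging the lower edge up shrinks both panels together, while dragging the RH edge in only shrinks the scatterplot once the width falls below half the height; from then on the dance area gets the spare height. Whether that is the behaviour we want is open for discussion.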

gfmoore commented 4 years ago

In a nutshell: Like 'Sample r' page in ESCI (for UTNS), but better laid out.

Can't find "Sample r"?

gfmoore commented 4 years ago

Found it!

gfmoore commented 4 years ago

I'm afraid you are going to have to draw some pictures of what you want here (sketches on the back of a napkin are fine), as I don't understand what you're trying to achieve, sorry.

gdcumming commented 4 years ago

I hope you can see below a cobbled together rough pic of what I have in mind for the RH display area.

Basic aim: To illustrate the bouncing around of r values as we sample from a bivariate normal population with correlation rho. In ESCI intro, the 'See r' page does this when 'Sample from population' is selected at red 3. (Note that in ESCI for UTNS, the earlier version for the first book, the 'See r' page doesn't have this option, but the 'Sample r' sheet takes a fuller approach to sampling r values.)

Scatter plot at top: As in your 'correlation', but let's omit the 'marginal distributions' option, to simplify a bit. I'd like to add a representation of the whole bivariate normal population, a bit like the left panel in Figure 11.15, p. 310, in ITNS. The dots could be small open circles, and when a sample is taken, the N dots in the sample could be the same size as the population dots, but solid colour, like the means in 'dances'.

This scatter plot should always be square. Its height should probably always be about half the available display height.

Lower display panel: Dance of the r values, dropping down the screen and piling up to form the r heap. Many features should be similar to how 'dances' shows the dance of the means, the mean heap, CIs, capture of the population value, etc.
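For the heap itself, a sketch of binning r values into columns, assuming a fixed bin width of 0.05 across [-1, 1] (an arbitrary choice): each dot's column comes from its r value, and its height in the stack from the running count. The names r_heap and column_centre are hypothetical.

```python
import math
from collections import Counter


def r_heap(r_values, bin_width=0.05):
    """Count how many r values land in each column of the heap.

    Columns span [-1, 1]; column i covers [-1 + i*w, -1 + (i+1)*w).
    An r of exactly 1 goes into the top column rather than a column of its own.
    """
    n_bins = round(2 / bin_width)
    heap = Counter()
    for r in r_values:
        i = math.floor((r + 1) / bin_width)
        heap[min(max(i, 0), n_bins - 1)] += 1  # clamp to the valid columns
    return heap


def column_centre(i, bin_width=0.05):
    """Horizontal position of the centre of column i, for drawing stacked dots."""
    return -1 + (i + 0.5) * bin_width
```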

Overall, I have in mind roughly the functionality of 'Sample r', with the two main display panels merged, so the r value dots drop directly onto the heap, as in your 'dances', and with the scatterplot more prominent and above the dances.

Control panel area would comprise, roughly, almost all the functionality of your 'correlation', plus some of the extra stuff in 'Sample r' that's needed to control what's displayed in the lower part of the RH panel, where the dances and heap are displayed.

First big question, as asked in my first comment, 9 days ago, above. I see two options:

1. Simplest overall layout: RH display area split, with square scatterplot above, and area for 'dance of the r values' and 'r heap' below (as in pic below).
2. More complex but worth a thought: When dance is turned OFF, the square scatterplot takes up the whole RH display area, within the constraint that it be square. When dance is turned ON, the scatterplot shrinks (but remains square) so its height is half that of the RH display area, and the dances area appears below.

Big issue: How would each of those work when overall window area is adjusted, e.g. by dragging RH edge in, or lower edge up?

I suspect Option 1 above would be better, assuming things don't get too tiny?

Calculating CI on r: Use the Fisher r to z transformation. Find the CI on the z scale, then transform the two CI endpoints back to r. The transformation is the inverse hyperbolic tangent function, z = arctanh(r), and the back-transformation is r = tanh(z).

P. 388 in UTNS explains. (Let me know if a scan of that page, or the file for that chapter, would be useful.)

The CI depends only on N and r, so there is nothing corresponding to 'assume sigma known/unknown'. No need for MoE lines, because red dots in the heap always fit neatly into the tails of the heap.
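A minimal sketch of the Fisher-based CI, assuming the usual large-sample result that z = arctanh(r) has standard error approximately 1/sqrt(N - 3). It also shows the capture check that would decide whether a dot is drawn red. The function names r_ci and captures are hypothetical.

```python
import math
from statistics import NormalDist


def r_ci(r, n, level=0.95):
    """CI on a sample Pearson r via the Fisher r-to-z transformation.

    Assumes -1 < r < 1 and n > 3.
    """
    z = math.atanh(r)  # Fisher transform: z = arctanh(r)
    se = 1 / math.sqrt(n - 3)  # approximate SE of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # CI on the z scale, then transform both endpoints back with tanh
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def captures(rho, r, n, level=0.95):
    """True if the CI on r includes rho; a False result would be drawn in red."""
    lo, hi = r_ci(r, n, level)
    return lo <= rho <= hi
```

For example, r = .5 with N = 28 gives roughly [.16, .74]: the interval extends further towards zero than towards 1, which is the asymmetry mentioned above.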

This is (obviously) not a full spec, but for initial discussion. Seem ok?

Could you send me a link to your initial go at 'dance r'. The dance r button at Main menu no longer points to it! Thanks.

gdcumming commented 4 years ago

I just said I wouldn't make any further comment for a while. Then I remembered that Bob a while back played with sampling of r with the option to turn on an image of the population as background. As below.

Ignore all detail. The idea is that the light coloured outline points (3,000 of them in this case) give an impression of the population, a bit like 'fill' in dances. Then the points in our new sample 'pop out'. And successive samples 'dance' haphazardly while the population, of course, never changes.

gfmoore commented 4 years ago

I've uploaded the starting attempt. It is basically correlation, but with the two panels.

Before I make any further progress would you check it out and make suggestions on these basics before I start dancing please.

I'm also watching the Giro d'Italia now!!! :) My decking is mostly done, but the rain has set in ;) so I'm slowly starting to get back to programming, must admit it's not easy to get going. :(

I'm not sure what you are wanting me to do with the nice picture above?

This program is completely new to me :)

gdcumming commented 4 years ago

'morning Gordon,

Great to hear from you and, at a quick squiz, the dance r looks extremely promising. I'll work through and reply on github.

Just looked up the Giro, which we're not getting on TV this year. French tennis instead, boo. But I see there are 8 Aussie riders, only about half of whom I had heard of. And Mitchelton Scott is an Aussie team of course.

Keep safe--looks like the UK isn't totally a covid-happy place at the moment. Melbourne is down to about a dozen cases a day, test-and-trace is improving after a slow start, kids are going back to school. So we're hopeful, tho' hiding under the doona when we think of the next few months in the U.S.

Cheers, Geoff

Geoff Cumming, DPhil, Emeritus Professor, School of Psychology and Public Health, La Trobe University, Melbourne Campus, Victoria, Australia 3086
Email: g.cumming@latrobe.edu.au
Intro textbook: Introduction to The New Statistics: Estimation, Open Science, and Beyond, www.thenewstatistics.com
First book: Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, www.thenewstatistics.com
Own page: http://www.latrobe.edu.au/she/contact-us/staff/profile?uname=GDCumming
ESCI (Exploratory Software for Confidence Intervals): www.thenewstatistics.com
Introduction to the New Statistics is the first statistics textbook to focus on Open Science and the New Statistics. Instructors can obtain a free desk copy at https://www.routledge.com/resources/deskcopy. Order on Amazon: http://www.amazon.com/dp/1138825522

From: Gordon Moore notifications@github.com Sent: Tuesday, 6 October 2020 3:13 AM To: gfmoore/esci-dance-r esci-dance-r@noreply.github.com Cc: Geoff Cumming g.cumming@latrobe.edu.au; Author author@noreply.github.com Subject: Re: [gfmoore/esci-dance-r] First sketch ideas, for discussion (#1)


gdcumming commented 4 years ago

I'll reply here just to the query about the picture above with the heavy red and blue lines.

The idea is that the light coloured outline points (3,000 of them in this pic) give an impression of the bivariate normal population, a bit like all the 'fill' little circles under the population distribution in dances. (We'll use circles, not squares as in the pic.)

It should be possible to turn this population picture on or off, as a fixed background in the upper square display area.

A single enormous sampling from the bivariate normal population should generate this pic, which can then be stored and used as our single representation of the population.
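A sketch of generating that single big population sample, assuming standardized X and Y (mean 0, SD 1) and using the standard construction y = rho*x + sqrt(1 - rho^2)*e, with x and e independent standard normals. The size of 3,000 and the seed are illustrative; make_population is a hypothetical name.

```python
import math
import random


def make_population(rho, size=3000, seed=None):
    """One big random sample from a standard bivariate normal with correlation rho.

    Generated once and stored; it is then drawn as the fixed background.
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    points = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + c * rng.gauss(0.0, 1.0)  # gives corr(x, y) = rho
        points.append((x, y))
    return points
```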

Figure 11.15 (p. 310) in ITNS is another attempt at representing a bivariate normal population, this time with 5,000 smaller circles. Those circles are probably too small and dark for us.

The trick will be to find a little circle of the best size and colour, so we get an impression of points piled up near the sloping main axis and shading away in density in all directions. It's a 2-D attempt to picture the classic 3-D 'English bobby's helmet' shape of the bivariate normal. Also need to choose the most effective number of little circles. A few thousand?

Ideally, the circle size we think best for the population pic will also serve as the size for the circles representing data points in a sample. Taking a sample would then look like picking a random N (a small number this time) of those background circles and highlighting them (in red in the pic above) as our latest sample.

Run a sequence of samples and the background would remain fixed, as various random sets of N points are highlighted. It could look like a dance of N points leaping about.
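A sketch of taking successive samples from the stored population, assuming each sample is N distinct points chosen at random (without replacement) from the fixed background. This samples from the finite stored set, which only approximates sampling from the bivariate normal itself. The names highlight_sample and run_samples are hypothetical.

```python
import random


def highlight_sample(population, n, rng):
    """Indices of n distinct background points to draw solid as the current sample."""
    return rng.sample(range(len(population)), n)


def run_samples(population, n, n_runs, seed=None):
    """Successive samples from a fixed population: only the highlighted set changes."""
    rng = random.Random(seed)
    return [highlight_sample(population, n, rng) for _ in range(n_runs)]
```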

I'm guessing that the best data point size (same as circle size in population) is either as you have chosen for the current dance r prototype, or maybe a whisker larger.

gdcumming commented 4 years ago

0.0.6

At first I thought it was a bit faint, but I'm a convert, after playing with lots of values and sample sizes. It seems to me excellent.

My one concern is that some samples seem to include rather more points lying right at the fringe of the illustrated population, or beyond it, than I'd expect. Of course, some such points should come up occasionally. Actually, this may be part of an issue about the accuracy of the random sampling, which I'll open as a separate issue.
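One way to check the fringe impression, offered as a sketch: for a bivariate normal, the squared Mahalanobis distance D^2 of a point from the centre follows a chi-square distribution with 2 df, so P(D^2 > d) = exp(-d/2). About 5% of points should therefore fall outside the ellipse with d = -2 ln(0.05), roughly 5.99. This assumes standardized X and Y with known rho; the helper names are hypothetical.

```python
import math
import random


def mahalanobis_sq(x, y, rho):
    """Squared Mahalanobis distance of (x, y) from the centre of a standard
    bivariate normal with correlation rho."""
    return (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)


def fraction_beyond(points, rho, prob_inside=0.95):
    """Observed fraction of points outside the ellipse that should contain
    prob_inside of the population (cutoff d = -2 ln(1 - prob_inside))."""
    cutoff = -2 * math.log(1 - prob_inside)  # about 5.99 for 95%
    beyond = sum(1 for x, y in points if mahalanobis_sq(x, y, rho) > cutoff)
    return beyond / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    rho = 0.7
    pts = []
    for _ in range(3000):
        x = rng.gauss(0.0, 1.0)
        pts.append((x, rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0)))
    print(f"{fraction_beyond(pts, rho):.3f} beyond the 95% ellipse (expect about 0.05)")
```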

Closing...

gfmoore commented 4 years ago

I was reading this spec and wondering how we could easily get a grey gradient for sample items. I'm thinking of residuals, measured not vertically but perpendicular to the correlation line. However, this seems overly involved. Is there an easier fudge? Would 4 or 5 grey levels be enough, or do we need a proper gradient? Of course we can implement it after the dance code, but I'm just wondering.

gdcumming commented 4 years ago

Good brainstorming about H and V and perpendicular residuals to line. I don't understand the 'grey gradient for sample items' comment. What do you have in mind by grey and gradient?

In ITNS, Fig 12.8 and the short 'X on Y' section (pp. 346-347) illustrate the Y and X residuals for regression of Y on X, and X on Y, respectively. (I'll send the pdf.)

In the text we refer to "the r = 1 line", which has slope Sy/Sx (the ratio of the two sample SDs). The Y on X line rotates anticlockwise from horizontal towards the r = 1 line, with the amount of rotation determined by r. Slope of that line is b(y.x) = r(Sy/Sx).

The X on Y line rotates clockwise from vertical towards the r = 1 line, with the amount of rotation also determined by r. Slope of that line, as drawn on the same X-Y axes, is b(x.y) = (1/r)*(Sy/Sx).

The 'correlation line' displayed in your 'correlation' and 'dance r' has slope = SQRT(b(y.x)*b(x.y)) = Sy/Sx = slope of r = 1 line.
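A small numerical check of these relations, as a sketch: compute the three slopes from data and confirm that the geometric mean of the Y-on-X slope and the X-on-Y slope (both as drawn on the X-Y axes) equals Sy/Sx. It assumes r > 0; for r < 0 the corresponding line would be the r = -1 line, with slope -Sy/Sx. The function names are hypothetical.

```python
import math


def summary(xs, ys):
    """Sample SDs Sx, Sy and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx, sy = math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1))
    return sx, sy, sxy / math.sqrt(sxx * syy)


def three_slopes(xs, ys):
    """Slopes of the Y-on-X line, the X-on-Y line and the r = 1 line (assumes r > 0)."""
    sx, sy, r = summary(xs, ys)
    y_on_x = r * sy / sx  # b(y.x): least-squares slope for predicting Y from X
    x_on_y = (1 / r) * sy / sx  # X-on-Y line as drawn on the X-Y axes
    r1_line = sy / sx  # the r = 1 line
    return y_on_x, x_on_y, r1_line
```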

I've always wondered about the relation between that correlation line and the line that minimizes the sum of squares of perpendiculars from the data points. 'Perpendicular' is of course sensitive to the X and Y scales, so there may be some function of Sy/Sx in there somewhere. It would be super-neat if those two lines turned out to be the same, or closely related. Surely someone has investigated that? I have only very vague recollections of having seen something about it.

I searched for "linear regression that minimizes the sums of squares of perpendiculars to the line" and found a few discussions (isn't search wonderful?), including one from Wolfram: "Least Squares Fitting--Perpendicular Offsets". All the discussions I saw seemed to involve masses of algebra and didn't arrive at any very satisfactory or simple conclusion, tho' I didn't find anyone who had considered the possible relation with the r = 1 line.

Some discussions refer to 'orthogonal regression', but I still haven't found any that tie it back to r. Perhaps some extra algebra would do that, but it's not obvious to me.
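A sketch bearing on that question, using the standard result that the line minimizing the sum of squared perpendicular distances (orthogonal regression) runs along the major axis of the scatter, i.e. the first eigenvector of the covariance matrix. After standardizing X and Y, the covariance matrix is [[1, r], [r, 1]], whose major axis has slope 1 for r > 0 (and -1 for r < 0). Back in raw units, slope 1 in z-scores is slope Sy/Sx, the r = 1 line. So the correlation line is the orthogonal regression line fitted to the z-scores (often called reduced or standardized major axis regression), whereas orthogonal regression on the raw X and Y scales generally gives a different line unless Sx = Sy. Function names are hypothetical.

```python
import math


def moments(xs, ys):
    """Means, variances and covariance (n - 1 denominators)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, sxx, syy, sxy


def orthogonal_slope(xs, ys):
    """Slope of the line minimizing squared perpendicular distances
    (major axis of the scatter). Assumes sxy != 0."""
    _, _, sxx, syy, sxy = moments(xs, ys)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


def standardize(vs):
    """Convert values to z-scores using the sample SD."""
    n = len(vs)
    m = sum(vs) / n
    s = math.sqrt(sum((v - m) ** 2 for v in vs) / (n - 1))
    return [(v - m) / s for v in vs]
```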

Do you have in mind having some extra display options in the scatterplot? Being able to turn on Y residuals (vertical lines from points to the Y on X regression line) would probably be Priority 1, and would be neat to have :-). X residuals Priority 2, but also neat. I wouldn't be game to display perpendicular residuals (e.g. to the r = 1 line) without better understanding. I expect we're basically operating in a context (esp. for teaching) in which regression of Y on X is overwhelmingly the norm. Venturing out into regression of X on Y is pretty left-field, tho' personally I find it very useful to mention, to give a fuller picture. Hence its brief inclusion in ITNS, despite chapter reviewers' skepticism.
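A sketch of computing the segments for those two display options, assuming each residual is drawn as a straight segment from the data point to the relevant regression line: vertical to the Y-on-X line, horizontal to the X-on-Y line. The function name residual_segments is hypothetical.

```python
def residual_segments(xs, ys):
    """Segments ((x0, y0), (x1, y1)) for drawing residuals.

    Y residuals: vertical segments from each point to the Y-on-X line.
    X residuals: horizontal segments from each point to the X-on-Y line.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_yx = sxy / sxx  # Y on X: y_hat = my + b_yx * (x - mx)
    b_xy = sxy / syy  # X on Y: x_hat = mx + b_xy * (y - my)
    y_res = [((x, y), (x, my + b_yx * (x - mx))) for x, y in zip(xs, ys)]
    x_res = [((x, y), (mx + b_xy * (y - my), y)) for x, y in zip(xs, ys)]
    return y_res, x_res
```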

gdcumming commented 4 years ago

Regression chapter as promised. G


gfmoore commented 4 years ago

Sorry, I hope I haven't muddied the waters. I was referring to the "policeman's helmet" look for the background image. I need a way to make points far from the "middle" lighter (less grey) in colour, to give the 3D effect you talked about. So I was thinking of using the perpendicular distance of a point from the correlation line as a measure for that: the closer to the correlation line, the darker the grey; the further away, the lighter.

Regards,

Gordon Moore gm@gordonmoore.co.uk


gdcumming commented 4 years ago

0.0.11

Aha! Boy, did I go off on a wild goose chase! Sorry, ignore all that previous long comment.

No, I don't think we should use different greys to try for a 3-d effect. Simply having a huge sample of the faint, empty circles itself does an excellent job of illustrating where points are tightly packed and where they are sparse.

I'll close here and open a new issue for all things population.
