avehtari / ROS-Examples

Regression and other stories R examples
https://avehtari.github.io/ROS-Examples/

Code for Causal.R lines 104-106 #117

Closed ericNeufeld closed 11 months ago

ericNeufeld commented 2 years ago

The above lines show:

104 xx <- rnorm(N, 0, 1)^2
105 z <- rep(0:1, N/2)
106 xx <- ifelse(z==0, rnorm(N, 0, 1.2)^2, rnorm(N, 0, .8)^2)

Why is xx initialized to one value and then reassigned two lines later? And why is rnorm() squared?

ericNeufeld commented 2 years ago

I'm guessing the squaring is to make the values positive. I still don't understand why xx is reassigned. The numbers I get are close to what appears in ROS.

avehtari commented 2 years ago

Thanks for reporting this! I removed that extra line, which is not needed.

ericNeufeld commented 2 years ago

Thanks, we are just learning this material.

I have found that squaring the distributions changes the average of the pre-test. Would you know why this was done, instead of changing the mean parameter in rnorm()?


avehtari commented 2 years ago

Hi, can you provide a bit more context to your question?

ericNeufeld commented 2 years ago

Hi Aki, thanks for getting back to me. I am relatively new to Gelman's style of work, so I may be over-explaining something that is obvious to an expert. In "Regression and Other Stories" (pages 11-12) he describes a scenario where units in the control and treatment groups of an experiment had a pre-treatment predictor. To give you an idea of how new this work is to me, I couldn't find a simple definition of that term, but I understand such a variable could be a simple pre-test.

(From the text **) The treated units were 4.8 units higher than the controls. (The numbers in the text are 31.7 and 25.5; I got slightly different numbers but attribute that to a slightly different set of random numbers.) The text goes on to say that the control and treatment groups also (by chance) differed in their pre-treatment predictor xx, with a mean of 0.4 for treated units and 1.2 for the controls (I got values of 0.55 and 1.47, respectively); adjusting for this yields a treatment effect of 10.0.

I took the data generated by the R code and plunked it into Excel, so I could compare it with something I'd worked with.

If I regress yy against just z, I get

yy = 5.1166*z + 27.52

which makes me think that the coefficient of z here (5.1) is very close to the 4.8 reported in the text, and that the first part of the above quote (**) is simply saying that regressing yy against the treatment alone isn't a good idea because the pre-tests are not being considered. If I regress xx against z, I get

xx = -0.92 * z + 1.47

but if I substitute back into the first equation, I get

yy = 9.8 z + 5.07 (-0.92 z + 1.47) + 20.05 = 5.14 z + 27.5

which still gives me a small effect size for z. This suggests to me there is an interaction.
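The substitution above can be checked with a quick calculation (the coefficients are the ones quoted from the Excel fits in this thread, so this is just a sketch of the algebra, not a re-analysis):

```python
# Coefficients quoted above:
#   yy = 9.8*z + 5.07*xx + 20.05   (joint regression on xx and z)
#   xx = -0.92*z + 1.47            (xx regressed on z)
c_z, c_xx, intercept = 9.8, 5.07, 20.05
a, b = -0.92, 1.47

# Substituting xx = a*z + b collapses the joint fit to
#   yy = (c_z + c_xx*a)*z + (c_xx*b + intercept)
slope = c_z + c_xx * a
const = c_xx * b + intercept
print(round(slope, 2), round(const, 2))  # 5.14 27.5
```

So the marginal z coefficient (5.14) is exactly the joint coefficient (9.8) minus the part routed through the imbalance in xx; no interaction term is needed to explain the gap between the two numbers.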

But if I regress yy against xx and z, I get

yy = 9.8z + 5.07 xx + 20.05,

which gives a value for z close to what the text calls the “adjusted value”.

Doing separate regressions for z=0 and z=1, I get

yy[z=0] = 5.06 xx + 20.07
yy[z=1] = 5.17 xx + 29.79

which defines lines very similar to the lines in Figure 1.8, which is generated by the code (lines 138-139 in SimpleCausal.Rmd):

abline(coef(lm_2)[1], coef(lm_2)[2])                  -> abline(19.40, 5.13)
abline(coef(lm_2)[1] + coef(lm_2)[3], coef(lm_2)[2])  -> abline(19.40 + 9.92, 5.13)

(using the coefficients produced by the R code)

which is roughly saying the causal effect is equal to the difference of the y-intercepts. The slopes are slightly different, but that could be random error.

So, I guess my question is this: the code seems to estimate the causal effect of z as the difference of the y-intercepts, whereas I get it by regressing against both variables, which might interact. So is the code in Causal.R 'shorthand' for a more complex process?
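As a sanity check on that reading, here is a minimal sketch in Python rather than the book's R (the data-generating process and coefficients below are made up for illustration, not taken from Causal.R): when the two groups share roughly the same slope, the difference of the fitted y-intercepts recovers the treatment effect.

```python
import random

random.seed(0)
N = 1000
# Hypothetical data-generating process loosely echoing the thread:
# alternating treatment indicator, a skewed (squared-normal) pretest,
# and an outcome with a true treatment effect of 10.
z  = [i % 2 for i in range(N)]
xx = [random.gauss(0, 1.2 if zi == 0 else 0.8) ** 2 for zi in z]
yy = [20 + 10 * zi + 5 * xi + random.gauss(0, 3) for zi, xi in zip(z, xx)]

def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Separate fits within each group, as in the Excel experiment above.
ctrl = fit_line([x for x, zi in zip(xx, z) if zi == 0],
                [y for y, zi in zip(yy, z) if zi == 0])
trt  = fit_line([x for x, zi in zip(xx, z) if zi == 1],
                [y for y, zi in zip(yy, z) if zi == 1])

# With (nearly) equal slopes, the intercept gap estimates the effect.
print(trt[1] - ctrl[1])  # should be close to the true effect, 10
```

Since z enters the model additively with a common xx slope, fitting yy ~ xx + z jointly and reading off the z coefficient gives the same answer as the intercept difference; they diverge only when a genuine xx-by-z interaction is present.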


andrewgelman commented 1 year ago

I just squared it there because I wanted to create a simple example where x does not have any simple distribution. I don't remember exactly why, but I wanted an example where most of the data were low and there were just a few higher values, and this was a way to do it. Really the only point was to create that graph---which, again, I wanted to look different from the generic normal distributions that are usually shown in such plots. The underlying code didn't really matter so I didn't try to make that code clean.
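A quick way to see the shape this produces (a hypothetical sketch in Python rather than the repository's R): squaring a standard normal yields a chi-square(1) draw, so the mass piles up near zero with a long right tail.

```python
import random

random.seed(1)
draws = [random.gauss(0, 1) ** 2 for _ in range(100_000)]

# P(chi-square(1) < 1) = P(|Z| < 1), about 0.68:
# most values are low, with just a few much higher ones.
below_1 = sum(d < 1 for d in draws) / len(draws)
print(round(below_1, 2))  # ~0.68
print(max(draws))         # occasional values far out in the right tail
```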

ericNeufeld commented 1 year ago

Thanks for replying, I understand now. My RA and I have been going through the book line by line.
