DS4PS / cpp-523-fall-2019

Course shell for CPP 523 Foundations of Program Evaluation I for Fall 2019.
http://ds4ps.org/cpp-523-fall-2019/

Lab-06 #14

sunaynagoel opened this issue 4 years ago

sunaynagoel commented 4 years ago

When I regress the variables using the linear model, I get:

==========================================================
                                   Dependent variable:    
                               ---------------------------
                                       Happiness-y        
----------------------------------------------------------
Income-X                                0.0002***         
                                        (0.00000)         

Constant                                50.871***         
                                         (0.390)          

----------------------------------------------------------
Observations                              2,000           
Adjusted R2                               0.690           
==========================================================
Standard errors in parentheses *p<0.1; **p<0.05; ***p<0.01

But when I regress using the quadratic model, I get:

==========================================================
                                   Dependent variable:    
                               ---------------------------
                                       Happiness-y        
----------------------------------------------------------
Income-X                                0.001***          
                                        (0.0001)          

x_squared                                -0.000*          
                                         (0.000)          

Constant                                30.714***         
                                         (0.657)          

----------------------------------------------------------
Observations                               574            
Adjusted R2                               0.808           
==========================================================
Standard errors in parentheses *p<0.1; **p<0.05; ***p<0.01

I noticed b0 and b1 changed, which is understandable, but the number of observations decreased from 2,000 to 574. Am I missing something?

sunaynagoel commented 4 years ago

Please excuse the bold format, I had no control over it.

Just add fences ``` around your code.
sunaynagoel commented 4 years ago

Part 2. #1,2

The graph looks different in the lab instructions and in RStudio. The outlier points A and B are at very different locations. Which one should we go by, the lab instructions or RStudio?

lecy commented 4 years ago

@sunaynagoel Try re-loading the IncomeHappiness.csv data please. The file got corrupted when uploading to GitHub - it is fixed now.

URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/IncomeHappiness.csv"
dat <- read.csv( URL )

The graph looks different in the lab instructions and in RStudio. The outlier points A and B are at very different locations. Which one should we go by, the lab instructions or RStudio?

Here's what I get in R running the graph code. It's the same for me on both. Not sure what you are seeing?

[image: scatterplot of logged nonprofit revenue vs. logged executive director salary with regression line]

Can you copy and paste the image into your question?

library( dplyr )

URL <- "https://github.com/DS4PS/cpp-523-fall-2019/blob/master/labs/data/np-comp-data.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
set.seed( 1234 )
d2 <- sample_n( dat, 2000 )

plot( log(d2$REVENUE), log(d2$SALARY), bty="n", pch=19, col="darkorange",
      xlab="Nonprofit Revenue (logged)", ylab="Executive Director Salary (logged)",
      xlim=c(5,25), ylim=c(5,16))
abline( h=seq( 1, 20, 0.5 ), col=gray(0.5,0.2), lwd=1 )
abline( v=seq( 1, 25, 0.5 ), col=gray(0.5,0.2), lwd=1 )
abline( lm( log(d2$SALARY) ~ log(d2$REVENUE) ), col=gray(0.5,0.5), lwd=3 )
sunaynagoel commented 4 years ago

@sunaynagoel Try re-loading the IncomeHappiness.csv data please. The file got corrupted when uploading to GitHub - it is fixed now.

It worked, so the number of observations shows 2,000 in both the linear and the quadratic regressions. But the coefficient for b2 is -0.000, which takes away the marginal effect of the quadratic term. Is that right?

sunaynagoel commented 4 years ago

Can you copy and paste the image into your question?

[screenshot: student's version of the income vs. happiness plot]

That's what mine looks like, but in the instructions A is at the top-left extreme of X and B is roughly in the middle, below the slope line. I haven't changed anything.

lecy commented 4 years ago

It's a conceptual question, you don't need to do any math. So just use the image in the lab.

But I suspect you didn't include the filter when you loaded the data?

URL <- "https://github.com/DS4PS/cpp-523-fall-2019/blob/master/labs/data/np-comp-data.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
set.seed( 1234 )
d2 <- sample_n( dat, 2000 )

It's a simplified dataset (sample of 2000 cases). It looks like you are plotting the full dataset there.
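
A quick way to confirm which object you plotted (a minimal check, assuming the object names dat and d2 from the code above) is to compare row counts:

nrow( dat )   # the full dataset
nrow( d2 )    # the simplified sample - should be 2000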

lecy commented 4 years ago

b2 is still significant, it's just a small number since x-squared is so big:

y <- dat$happiness
x <- dat$income 
x2 <- x*x
options( scipen=8 )
summary( lm( y ~ x + x2 ) )
                     Estimate        Std. Error t value Pr(>|t|)    
(Intercept) 35.34826872249551  0.36139931707627   97.81   <2e-16 ***
x            0.00073610233669  0.00000887024309   82.99   <2e-16 ***
x2          -0.00000000251607  0.00000000004385  -57.38   <2e-16 ***
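
To see why the tiny b2 still matters, you can plug the coefficients above into the marginal effect of a quadratic model, dY/dX = b1 + 2*b2*X (a quick illustrative sketch, evaluated at an income level chosen arbitrarily):

b1 <- 0.00073610233669
b2 <- -0.00000000251607
x  <- 100000                  # evaluate the slope at $100,000 of income
b1 + 2*b2*x                   # about 0.00023 happiness points per additional dollar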

You can simplify the print-out by changing the scale of X. A linear transformation (dividing by a constant) does not impact the model in any meaningful way other than changing the scale of measures:

y <- dat$happiness
x <- dat$income / 10000
x2 <- x*x
summary( lm( y ~ x + x2 ) )
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 35.348269   0.361399   97.81   <2e-16 ***
x            7.361023   0.088702   82.99   <2e-16 ***
x2          -0.251607   0.004385  -57.38   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You just need to shift interpretation from b1 representing a $1 change in X to a $10,000 change in X.
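
As a quick sanity check on the rescaling (using the two print-outs above), the new coefficients are just the old ones multiplied by the scaling factor once for b1 and twice for b2:

0.00073610233669 * 10000      # = 7.361023, the rescaled b1
-0.00000000251607 * 10000^2   # = -0.251607, the rescaled b2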

sunaynagoel commented 4 years ago

Makes perfect sense. Thanks again

sunaynagoel commented 4 years ago

Question #1

When I produce the regression table, it generates in RStudio along with this error, but when I knit the document the table does not show up.

length of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changedlength of NULL cannot be changed

jmacost5 commented 4 years ago

This is a dumb question, but is question 1 asking us to define each of the variables in the equation? Also, would b0 be an observation, or would it be the constant?

lecy commented 4 years ago

@sunaynagoel I would need more information. What question, and what is your code?

lecy commented 4 years ago

@jmacost5 Yes, run the regression and report the table in Q1, then use the coefficients for Q2-Q4.

b0 is the constant (intercept).

sunaynagoel commented 4 years ago

@sunaynagoel I would need more information. What question, and what is your code?

It is for question #1. Here is my code. It gives me the output I need in RStudio, but when I knit, the table does not show up in my HTML document.

y <-  dat$happiness
x <- dat$income/10000
x_squared  <- x*x
m <- lm( y ~ x + x_squared, data=dat )
stargazer( m, type="text",
           dep.var.labels=c("Happiness-y"),
           covariate.labels=c("Income-X"),
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )
lecy commented 4 years ago

Check the code chunk options. They should be:

{r, results="asis"}
stargazer( m, type="html", ... )
sunaynagoel commented 4 years ago

@lecy The link to submit the assignment is not working for me :)

lecy commented 4 years ago

@sunaynagoel the link should be fixed!

castower commented 4 years ago

Hello all, I am trying to produce the graph presented at the top of the lab in my RMD file:

plot( dat$income, dat$happiness, 
      xlab="Income (Thousands of Dollars)", ylab="Hapiness Scale",
      main="Does Money Make You Happy?",
      pch=19, col="darkorange", bty="n",
      xaxt="n" )
axis( side=1, at=c(0,50000,100000,150000,200000), labels=c("$0","$50k","$100k","$150k","$200k") )
lines( 1:200000, y_hat, col=gray(0.3,0.5), lwd=6 )

However, I keep getting an error:

Error in xy.coords(x, y) : object 'y_hat' not found

Did anyone else have this issue?

castower commented 4 years ago

Also, I'm having issues with my second code chunk:

m <- lm( y ~ x, data=dat )
stargazer( m, type="html",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )

I keep getting this error:

Error in eval(predvars, data, env) : object 'y' not found

I'm not quite sure what this means. I am assuming the y is supposed to come from the data set? Should I change y to y.happy?

lecy commented 4 years ago

The y-hat vector represents the predicted values of Y. There are some shortcut functions to get it, but the best way to get comfortable with the models is practicing with the regression formula after you have run your regression:

b0 <- 10
b1 <- 2
b2 <- 1
x1 <- dat$x1
x2 <- dat$x2
y  <- dat$y                                    # observed outcome (generic column name for illustration)
y.hat <- b0 + b1*x1 + b2*x2                    # predicted values from the regression formula
plot( x1, y )
lines( x1, y.hat, type="l", lwd=2, col="darkorange" )
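
For reference, the shortcut mentioned above is the base R fitted() (or predict()) function; a minimal sketch, assuming the fitted model is stored in an object called m:

m <- lm( y ~ x1 + x2, data=dat )   # hypothetical model object
y.hat <- fitted( m )               # predicted values for the observed data
# predict( m ) returns the same values and can also take newdata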
castower commented 4 years ago

@lecy thank you!

I think I've also gotten the hang of the table:

x <- dat$income/10000
xsquared <- x*x
y <- dat$happiness
m <- lm( y ~ x + xsquared, data=dat )
stargazer( m, type="text",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )
==========================================================
                                   Dependent variable:    
                               ---------------------------
                                            y             
----------------------------------------------------------
x                                       7.361***          
                                         (0.089)          

xsquared                                -0.252***         
                                         (0.004)          

Constant                                35.348***         
                                         (0.361)          

----------------------------------------------------------
Observations                              2,000           
Adjusted R2                               0.883           
==========================================================
Standard errors in parentheses *p<0.1; **p<0.05; ***p<0.01
lecy commented 4 years ago

Also, I'm having issues with my second code chunk:

m <- lm( y ~ x, data=dat )
stargazer( m, type="html",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )

Correct, the Y is just the Y from your current model, which is dat$happiness in the dataset or y.happy in the example code.

I had to suppress some of the code otherwise you would have had all of the answers!

lecy commented 4 years ago

@castower If you want to use the pretty version for knitted files, stargazer( ..., type="html" ), you need to include the code chunk argument:

{r, results="asis"}
stargazer( ..., type="html")

Otherwise R prints the HTML code with comment markers in front of it, so it is read as regular text and not interpreted as an HTML table by the RMD document.

JaesaR commented 4 years ago

I am having trouble creating the regression line in the first plot. I am using the following code:

y <- dat$happiness
x <- dat$income/10000
x2 <- x^2

summary( lm( y ~ x + x2 ) )

b0 <- 35.34826872
b1 <-  0.73610234
b2 <- -0.00251607

y_hat <- b0 + b1*x +b2*x2

plot( dat$income, dat$happiness, 
      xlab="Income (Thousands of Dollars)", ylab="Hapiness Scale",
      main="Does Money Make You Happy?",
      pch=19, col="darkorange", bty="n",
      xaxt="n" )
axis( side=1, at=c(0,50000,100000,150000,200000), labels=c("$0","$50k","$100k","$150k","$200k") )
lines( 1:200000, y_hat, col=gray(0.3,0.5), lwd=6 )

Everything runs perfectly until the last line, which returns this: Error in xy.coords(x, y) : 'x' and 'y' lengths differ.

How do I fix this?

castower commented 4 years ago

Has anyone else's table in question 4 produced different numbers than the written formula? I'm getting a constant of 6.193 vs 6.367

cjbecerr commented 4 years ago

@castower This is happening to me as well. I used what Dr. Lecy had in his provided equation because I think our datasets are being loaded differently for some reason. The graph is also coming out different for me, but he mentions in the comments above to just use the graph from the lab since these are conceptual questions. @lecy Should I exclude the graphs for part 2 since mine is loading differently than the one you presented?

lecy commented 4 years ago

@JaesaR The problem is the transformation here:

x <- dat$income/10000

If you remove the 10,000 (and change your regression coefficients accordingly), the plot will work. Otherwise, use this instead:

plot( x, y, 
      xlab="Income (Thousands of Dollars)", ylab="Hapiness Scale",
      main="Does Money Make You Happy?",
      pch=19, col="darkorange", bty="n",
      xaxt="n" )
axis( side=1, at=c(0,5,10,15,20), labels=c("$0","$50k","$100k","$150k","$200k") )
lines( x, y_hat, col=gray(0.3,0.5), lwd=6 )
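
One small refinement (my own suggestion, not part of the lab code): if the fitted curve looks jagged because income is not sorted in the data, order the points before drawing the line:

ord <- order( x )                                   # sort income from low to high
lines( x[ ord ], y_hat[ ord ], col=gray(0.3,0.5), lwd=6 )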
lecy commented 4 years ago

@castower @cjbecerr Are you using the subset d2?

library( dplyr )
library( stargazer )

URL <- "https://github.com/DS4PS/cpp-523-fall-2019/blob/master/labs/data/np-comp-data.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
set.seed( 1234 )
d2 <- sample_n( dat, 2000 )

m <- lm( log(SALARY) ~ log(REVENUE), data=d2 )
stargazer( m, type="text",
           omit.stat = c("rsq","f","ser"),
           notes.label = "Standard errors in parentheses" )
==========================================================
                                   Dependent variable:    
                               ---------------------------
                                       log(SALARY)        
----------------------------------------------------------
log(REVENUE)                            0.343***          
                                         (0.008)          

Constant                                6.367***          
                                         (0.121)          

----------------------------------------------------------
Observations                              2,000           
Adjusted R2                               0.460           
==========================================================
Standard errors in parentheses *p<0.1; **p<0.05; ***p<0.01

[image: scatterplot of logged revenue vs. logged salary with fitted regression line]

Either way, it is a conceptual question to show how to translate from logged models back into real numbers. You don't have to report a regression table for the solutions. Please use the numbers in the lab.
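
For instance, using the coefficients in the table above, the log-log model can be translated back into dollars (a rough sketch; the key point is the elasticity interpretation):

b0 <- 6.367
b1 <- 0.343
revenue <- 1000000                     # a nonprofit with $1 million in revenue
exp( b0 + b1*log( revenue ) )          # predicted salary, roughly $66,500
# in a log-log model a 1% increase in revenue is associated
# with about a 0.343% increase in salary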

lecy commented 4 years ago

@castower @cjbecerr Out of curiosity, what do you get when you type this:

set.seed( 1234 )
rnorm(5)
[1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247
castower commented 4 years ago

@castower @cjbecerr Out of curiosity, what do you get when you type this:

set.seed( 1234 )
rnorm(5)
[1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247

I get the following:

[1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247
lecy commented 4 years ago

@castower all is right in the world ;-)