DS4PS / cpp-523-fall-2019

Course shell for CPP 523 Foundations of Program Evaluation I for Fall 2019.
http://ds4ps.org/cpp-523-fall-2019/

Lab 03 #11

Open sunaynagoel opened 5 years ago

sunaynagoel commented 5 years ago

I am trying to load the packages to start working on Lab 03, but I get the following error. Is anyone else having this issue, and does anyone know how to fix it?

load packages

library( pander )     # formatting tables
library( dplyr )      # data wrangling
library( stargazer )  # regression tables

Error in library(pander) : there is no package called ‘pander’

sunaynagoel commented 5 years ago

I was able to overcome this problem by looking in GitHub and installing the packages individually by using

install.packages('pander')
install.packages('dplyr')
install.packages('stargazer')

before calling the library.

library( pander )     # formatting tables
library( dplyr )      # data wrangling
library( stargazer )  # regression tables

I am not sure if this is the best way to do things. I have a feeling it is going to take up way more memory and slow down execution, because it will install the packages every time I run it. Please help.

lecy commented 5 years ago

That is correct. Any time you get that error it means the package needs to be installed. You only need to install it once, and note that you should NOT INCLUDE the install.packages() functions in your RMD code.

Packages are generally small in R and will not take up a lot of memory (there are a few exceptions). It is not unusual to have a couple hundred packages installed. They are not active until you call the library() function.
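
A minimal sketch of the "install once" idea, assuming you want to check what is missing before installing anything. The helper name missing_packages is hypothetical (it is not part of the lab template); requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper (not part of the lab template): report which of the
# requested packages are not yet installed on this machine.
missing_packages <- function( pkgs ) {
  is.installed <- vapply( pkgs, requireNamespace, logical(1), quietly = TRUE )
  pkgs[ !is.installed ]
}

to.install <- missing_packages( c( "pander", "dplyr", "stargazer" ) )
to.install   # names of the packages that still need to be installed, if any

# Run the install step once in the console, not inside the RMD file:
# if ( length( to.install ) > 0 ) install.packages( to.install )
```

Keeping the install step in the console means the RMD file only contains the library() calls, so knitting does not reinstall anything.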

sunaynagoel commented 5 years ago

Thank you. So when I go to knit the file should I delete the portion of the code where I have installed the packages?

Question on problem # 6 as well

"Based upon the correlation structure reported below, which control variable do you expect would change the slope of caffeine if removed from the model? Which would result in a larger standard error associated with caffeine if removed from the model? Explain your reasoning."

I understand the concept pretty well, but I am confused about how to explain it, because it is rather counterintuitive for me to describe what happens when I remove a control variable. I am not sure if I am getting it all wrong. For example, introducing gym time reduces the standard error, and introducing the stress index increases the standard error. How do I use this information to answer the second part of the question?

lecy commented 5 years ago

So when I go to knit the file should I delete the portion of the code where I have installed the packages?

Correct.

Regarding explaining the impact of controls, I would use the equations for slope and SE:

slope b1 = cov(x1,y) / var(x1)
SE(b1) ≈ sqrt( var(residual) / ( n * var(x1) ) )

For the standard error to get smaller when you add a control variable, the control has to reduce the residual, because (1) control variables don't change the sample size, and (2) control variables can only remove variance from the outcome Y and the policy variable X1, not add to it. Removing variance from X1 shrinks the denominator, which by itself would make the standard error larger, so the only way to lower it is through the residual. You can work the logic in reverse as well: if the standard error increases after removing a control variable, it means you are increasing the residual.

For slopes, any control that changes the X1 policy variable slope b1 must be correlated with X1 since that is the variance that generates the slope. So you can eliminate all control variables that are uncorrelated with X1.
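
The logic above can be checked with a small simulation. This is only a sketch: the variable names mirror the lab, but the data and every coefficient below are invented for illustration and do not reproduce the lab dataset. The helper get_b1 is a hypothetical name.

```r
set.seed( 523 )
n <- 1000

# Invented data: stress is correlated with caffeine AND heart rate,
# gym time is correlated with heart rate only.
stress   <- rnorm( n )
gym      <- rnorm( n )
caffeine <- 0.7*stress + rnorm( n )
heart    <- 2*caffeine + 3*stress - 4*gym + rnorm( n )

full      <- lm( heart ~ caffeine + stress + gym )
no.stress <- lm( heart ~ caffeine + gym )      # drop the control correlated with X1
no.gym    <- lm( heart ~ caffeine + stress )   # drop the control uncorrelated with X1

# Hypothetical helper: slope and standard error of caffeine in a model
get_b1 <- function( m ) {
  summary( m )$coefficients[ "caffeine", c( "Estimate", "Std. Error" ) ]
}

results <- rbind( full      = get_b1( full ),
                  no.stress = get_b1( no.stress ),
                  no.gym    = get_b1( no.gym ) )
results
```

In a setup like this one, dropping stress should shift the caffeine slope noticeably, while dropping gym should leave the slope roughly where it was but inflate its standard error, because the variance gym explained in the outcome goes back into the residual.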

Does that help?

sunaynagoel commented 5 years ago

I am still confused. I tried to answer the question. Do you want me to post my answer here or email it to you?

lecy commented 5 years ago

You can post it here

sunaynagoel commented 5 years ago

This is what I am thinking. It may be totally wrong, though: "Removing the control variable Stress Factor changes the slope, which becomes steeper. Stress Factor is associated with caffeine intake, the policy variable. Any time you remove a control variable that is correlated with the policy variable, it changes the slope and makes the standard error smaller.
Removing the control variable GymTime will make the standard error larger, because this control variable is associated with the outcome, and introducing it reduces the residual, thereby decreasing the standard error."

castower commented 5 years ago

Hello all,

For the warm-up problem to draw a Ballentine Venn diagram, are we supposed to do this in RStudio? If so, is there a particular code to enter? I feel like I'm overlooking it somewhere.

Thanks! Courtney

sunaynagoel commented 5 years ago

@castower for the warm-up question, my guess is that it is practice to get ourselves familiar with the Ballentine Venn diagram. I just did it on a sheet of paper for practice and, sure enough, used it later in the lab for reference.

castower commented 5 years ago

@castower for the warm-up question, my guess is that it is practice to get ourselves familiar with the Ballentine Venn diagram. I just did it on a sheet of paper for practice and, sure enough, used it later in the lab for reference.

Oh, okay! That makes sense, thanks :)

castower commented 5 years ago

One more thing, is there not a question 5? I notice that the lab guide jumps from 4 to 6 and want to ensure that this is just a typo. Thanks!

castower commented 5 years ago

Hello all,

I have one more question concerning the scatterplot matrix for the control variables.

After watching the review session video once again, I want to clarify that I'm reading it properly. For the boxes, are the labels on top (or below) the boxes the y-axes and the boxes to the left and right the x-axes or is it the opposite? For example, caffeine is the value that ranges up to 500, correct?

Thanks!

sunaynagoel commented 5 years ago

@castower I noticed question 5 missing as well.

As I understand it, the x-axis is read horizontally and the y-axis is read vertically. Correct me if I am wrong.

lecy commented 5 years ago

You are right - the lab numbering went from 4 to 6, there was no 5. I have updated it now, but it's fine if you don't have a 5. I can figure it out on your solutions.

lecy commented 5 years ago

The axes match the projection of the variable. The column variable (heart rate in this case) will be the X-axis, and the row variable (caffeine) will be the Y-axis.

Note that if you flip the axes, though, it doesn't change the strength or direction of the relationship, so it would not change your answer if you did confuse them.
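
A minimal sketch of both points, using invented data (the values below are simulated, not the lab dataset). pairs() draws a scatterplot matrix in which each panel uses its column variable on the X-axis and its row variable on the Y-axis, and cor() gives the same value whichever variable comes first.

```r
set.seed( 1 )

# Invented data for illustration only
caffeine   <- runif( 50, min = 0, max = 500 )
heart.rate <- 60 + 0.05*caffeine + rnorm( 50, sd = 5 )
d <- data.frame( heart.rate, caffeine )

# Scatterplot matrix: column variable on the X-axis, row variable on the Y-axis
pairs( d )

# Flipping the axes does not change the strength or direction of the relationship
r.xy <- cor( d$caffeine, d$heart.rate )
r.yx <- cor( d$heart.rate, d$caffeine )
all.equal( r.xy, r.yx )
```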

Jigarci3 commented 5 years ago

Hello,

I am having trouble running the chunk needed for the regression table and graphs to answer questions in part II of the assignment.

This is the code that was already on the RMD template:

dat <- read.csv( "data/caffeine-heart-rate-w-controls.csv" )
mod <- lm( heart.rate ~ caffeine + stress.index + gym.time, data=dat)
stargazer( mod, header=F, type="html", omit.stat = c("adj.rsq", "f") )

This is the error that comes up:

Error in file(file, "rt") : cannot open the connection

Am I missing something?

Jigarci3 commented 5 years ago

Also, I realize I can use the table and graph on the full lab instructions but I'd still like to figure out what is causing the error on RStudio and don't want to have issues when knitting the document.

Thanks!

lecy commented 5 years ago

The data should be read from GitHub (the template is currently reading it from a local path). Give me one minute to update the template.

lecy commented 5 years ago

Just replace:

dat <- read.csv( "data/caffeine-heart-rate-w-controls.csv" )

With:

URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/caffeine-heart-rate-w-controls.csv"
dat <- read.csv( URL )

And you should be good to go.
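
The "cannot open the connection" error above comes from read.csv() being handed a relative path that does not exist on your machine. A small sketch of a fallback, assuming you would like to use a local copy when it exists and the GitHub URL otherwise. The helper name pick_source is hypothetical and not part of the template.

```r
# Hypothetical helper: use the local copy if it exists, otherwise fall back to the URL.
pick_source <- function( local.path, url ) {
  if ( file.exists( local.path ) ) local.path else url
}

URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/caffeine-heart-rate-w-controls.csv"
src <- pick_source( "data/caffeine-heart-rate-w-controls.csv", URL )
src   # shows which location read.csv() would be given
```

Calling dat <- read.csv( src ) afterwards would then load the data from whichever location was chosen.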

Jigarci3 commented 5 years ago

Hmm. It still is not producing the chart. Can I remove the code to avoid problems on my knit file?

Jigarci3 commented 5 years ago

Disregard. It worked in my knit file; it just would not come up in RStudio. Thank you!