Open sunaynagoel opened 5 years ago
I was able to overcome this problem by looking on GitHub and installing the packages individually using
install.packages('pander')
install.packages('dplyr')
install.packages('stargazer')
before calling the library.
library( pander ) # formatting tables
library( dplyr ) # data wrangling
library( stargazer ) # regression tables
I am not sure if this is the best way to do things. I have a feeling it is going to take up way more memory and slow down execution because it will install the packages every time I run it. Please help.
That is correct. Any time you get that error it means the package needs to be installed. You only need to install it once, and note that you should NOT INCLUDE the install.packages() calls in your RMD code.
Packages are generally small in R and will not take up a lot of memory (there are a few exceptions). It is not unusual to have a couple hundred packages installed. They are not active until you call the library() function.
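A minimal sketch of one way to check what still needs installing without putting install.packages() in the RMD file. The helper name missing_packages is made up for illustration and is not part of the lab template; the install line is meant to be run once in the console.

```r
# Hypothetical helper (not part of the lab template): report which of the
# requested packages are not yet installed on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed     <- c("pander", "dplyr", "stargazer")
to_install <- missing_packages(needed)

# Run this once in the console, never inside the RMD file:
# if (length(to_install) > 0) install.packages(to_install)
```

After that, the RMD file only needs the library() calls.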
Thank you. So when I go to knit the file should I delete the portion of the code where I have installed the packages?
Question on problem # 6 as well
"Based upon the correlation structure reported below, which control variable do you expect would change the slope of caffeine if removed from the model? Which would result in a larger standard error associated with caffeine if removed from the model? Explain your reasoning."
I understand the concept pretty well, but I am getting confused about how to explain it because it is rather counterintuitive for me to explain what happens when I remove a control variable. I am not sure if I am getting it all wrong. For example, introducing gym time reduces the standard error and introducing stress index increases the standard error. How do I use this information to answer the second part of the question?
So when I go to knit the file should I delete the portion of the code where I have installed the packages?
Correct.
Regarding explaining the impact of controls, I would use the equations for slope and SE:
slope = cov(x1,y) / var(x1)
SE(b1) = sqrt( var(residual) / ( n * var(x1) ) )   (approximately)
For the standard error to get smaller by adding a control variable you would have to reduce the residual because (1) control variables don't impact the sample size, and (2) control variables can only delete variance in the outcome Y and policy variable X1, not increase it. You can work the logic in reverse as well - if the standard error increases after removing a control variable it means you are increasing the residuals.
For slopes, any control that changes the X1 policy variable slope b1 must be correlated with X1 since that is the variance that generates the slope. So you can eliminate all control variables that are uncorrelated with X1.
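If it helps to see this logic in action, here is a small simulation sketch. Everything in it is made up for illustration: the data are simulated rather than taken from the lab file, and the coefficients and variable names (stress, gym) are assumptions chosen so that one control is correlated with X1 and the other only with Y.

```r
# Simulated data (NOT the lab data): stress is correlated with both caffeine
# and heart rate; gym is correlated with heart rate only.
set.seed(123)
n          <- 1000
stress     <- rnorm(n)
gym        <- rnorm(n)
caffeine   <- 100 + 30 * stress + rnorm(n, sd = 30)
heart.rate <- 70 + 0.05 * caffeine + 5 * stress - 4 * gym + rnorm(n, sd = 3)

full      <- lm(heart.rate ~ caffeine + stress + gym)
no_stress <- lm(heart.rate ~ caffeine + gym)     # drop control correlated with X1
no_gym    <- lm(heart.rate ~ caffeine + stress)  # drop control correlated with Y only

# Pull the caffeine slope and its standard error out of a fitted model
slope <- function(m) coef(summary(m))["caffeine", "Estimate"]
se    <- function(m) coef(summary(m))["caffeine", "Std. Error"]

round(c(full = slope(full), no_stress = slope(no_stress), no_gym = slope(no_gym)), 4)
round(c(full = se(full),    no_stress = se(no_stress),    no_gym = se(no_gym)), 4)
```

In this setup the caffeine slope moves noticeably only when stress is dropped, while dropping gym mostly inflates the standard error because the residual gets larger.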
Does that help?
I am still confused. I tried to answer the question. Do you want me to post my answer here or email you?
You can post it here
This is what I am thinking in my mind. It may be totally wrong though
"Removing the control variable Stress Factor changes the slope, which becomes steeper. Stress Factor is associated with caffeine intake, the policy variable. Any time you remove a control variable that is correlated with the policy variable, it changes the slope and makes the standard error smaller.
Removing the control variable GymTime will make the standard error larger because this control variable is associated with the outcome, and introducing it reduces the residual, thereby decreasing the standard error."
Hello all,
For the warm-up problem to draw a Ballentine Venn diagram, are we supposed to do this in RStudio? If so, is there a particular code to enter? I feel like I'm overlooking it somewhere.
Thanks! Courtney
@castower for the warm-up question, my guess is that it is practice to get ourselves familiar with the Ballentine Venn diagram. I just did it on a sheet of paper and, sure enough, used it later in the lab for reference.
Oh, okay! That makes sense, thanks :)
One more thing, is there not a question 5? I notice that the lab guide jumps from 4 to 6 and want to ensure that this is just a typo. Thanks!
Hello all,
I have one more question concerning the scatterplot matrix for the control variables.
After watching the review session video once again, I want to clarify that I'm reading it properly. For the boxes, are the labels on top (or below) the boxes the y-axes and the boxes to the left and right the x-axes or is it the opposite? For example, caffeine is the value that ranges up to 500, correct?
Thanks!
@castower I noticed question 5 missing as well.
As I understand it, the x-axis is read horizontally and the y-axis is read vertically. Correct me if I am wrong.
You are right - the lab numbering went from 4 to 6, there was no 5. I have updated it now, but it's fine if you don't have a 5. I can figure it out on your solutions.
The axes match the projection of the variable. The column variable (heart rate in this case) will be the X-axis, and the row variable (caffeine) will be the Y-axis.
Note that if you flip the axes, though, it doesn't change the strength or direction of the relationship, so it would not change your answer if you did confuse them.
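A quick way to convince yourself of this, sketched with simulated numbers (not the lab data; the coefficients are arbitrary assumptions): the correlation is the same whichever variable you treat as X.

```r
# Simulated stand-ins for the two variables (NOT the lab data)
set.seed(42)
caffeine   <- runif(100, min = 0, max = 500)
heart.rate <- 60 + 0.05 * caffeine + rnorm(100, sd = 5)

r_xy <- cor(caffeine, heart.rate)  # caffeine as X, heart rate as Y
r_yx <- cor(heart.rate, caffeine)  # axes flipped

all.equal(r_xy, r_yx)  # correlation is symmetric in its two arguments
```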
Hello,
I am having trouble running the chunk needed for the regression table and graphs to answer questions in part II of the assignment.
This is the code that was already on the RMD template:
dat <- read.csv( "data/caffeine-heart-rate-w-controls.csv" )
mod <- lm( heart.rate ~ caffeine + stress.index + gym.time, data=dat)
stargazer( mod, header=F, type="html", omit.stat = c("adj.rsq", "f") )
This is the error that comes up:
Error in file(file, "rt") : cannot open the connection
Am I missing something?
Also, I realize I can use the table and graph on the full lab instructions but I'd still like to figure out what is causing the error on RStudio and don't want to have issues when knitting the document.
Thanks!
The data should be read from Github (it's currently reading it locally). Give me one minute to update the template.
Just replace:
dat <- read.csv( "data/caffeine-heart-rate-w-controls.csv" )
With:
URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/caffeine-heart-rate-w-controls.csv"
dat <- read.csv( URL )
And you should be good to go.
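If you want the same chunk to work whether or not you have a local copy of the data, one possible sketch is below. The helper name read_local_or_url is hypothetical and not part of the template; it simply checks whether the relative path exists from the current working directory before falling back to the URL.

```r
# Hypothetical helper (not in the lab template): read the CSV from a local
# path if that path exists relative to the working directory, otherwise
# read it from the given URL.
read_local_or_url <- function(local_path, url) {
  if (file.exists(local_path)) {
    read.csv(local_path)
  } else {
    read.csv(url)
  }
}

# Example usage with the path and URL from this thread:
# URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/caffeine-heart-rate-w-controls.csv"
# dat <- read_local_or_url( "data/caffeine-heart-rate-w-controls.csv", URL )
```

Running getwd() in the console shows which folder the relative "data/..." path is being resolved against.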
Hmm. It still is not producing the chart. Can I remove the code to avoid problems on my knit file?
Disregard- It worked in my knit file. It just would not come up on RStudio. Thank you!
I am trying to load the packages to start working on lab 03. It is giving me the following error. Is anyone else having this issue, and how can I fix it?
load packages
Error in library(pander) : there is no package called ‘pander’