DS4PS / cpp-525-spr-2020

Course shell for CPP 525 Foundations of Program Evaluation III for Spring 2020.
http://ds4ps.org/cpp-525-spr-2020/

LAB-01 #2

sunaynagoel opened 4 years ago

sunaynagoel commented 4 years ago

Hello, in question 3a, does "short-term effect" mean the immediate effect after the treatment? Here is the actual question:

Q3a: Has the new schedule increased or decreased the use of public transportation in the short term? Indicate the magnitude of the effect and whether it is statistically significant.

~Nina

sunaynagoel commented 4 years ago

Q4a: What is the number of passengers 100 days after the intervention?

When I set up my equation, I get an answer of 1424. This is different from 1314, which is the value in the data table for 100 days after the intervention (day 220). I'm not sure what I am doing wrong. Here is my equation:

Number of Passengers = b0 + b1*220 + b2*1 + b3*100 + e

where b0 = 1327.89, b1 = 0.11, b2 = 21.68, and b3 = 0.50, so

Number of Passengers = 1327.89 + 24.20 + 21.68 + 50.00 = 1423.77, or about 1424
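
In R, the same calculation is just the arithmetic from the coefficients above:

```r
# plugging the coefficients into the model by hand
b0 <- 1327.89   # intercept
b1 <- 0.11      # pre-treatment time trend
b2 <- 21.68     # immediate treatment effect
b3 <- 0.50      # change in trend after treatment

b0 + b1*220 + b2*1 + b3*100   # 1423.77
```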

Thanks ~Nina

lecy commented 4 years ago

For 3a, that is correct. The instantaneous effect tells you how things have changed directly after the intervention. The sustained change is a little harder to interpret, because its size depends on the time frame.

These are stocks and flows. A small pay raise has a bigger long-term impact than a large one-time bonus!

To answer the question "in the short-term" you would want to use the immediate or instantaneous effects.
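
A minimal sketch of where each effect lives in the model, assuming a data frame dat with the variable names from the lab:

```r
# interrupted time series regression
m <- lm( passengers ~ time + treatment + timeSince, data=dat )

# immediate (instantaneous) effect: the treatment dummy
coef( m )[ "treatment" ]

# sustained effect: depends on the time frame you choose,
# e.g. the trend change accumulated over 100 days
coef( m )[ "timeSince" ] * 100
```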

lecy commented 4 years ago

For Q4a, will the predicted value be the same as the actual observed value?

What do we call the difference between the predicted and observed?

Does your model report the residual standard error? If so, that can be used to gauge a reasonable response.

The standard error tells us, "on average, how far should our sample statistic be from the true population statistic?" and is used to test the sizes of coefficients.

The residual standard error tells us, on average, how far each data point falls from the regression line, i.e. how close each predicted point (on the regression line) is to the actual data point.
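
For example, a sketch assuming the model m and data frame dat from the sketch above:

```r
# model prediction at day 220 (100 days after the intervention)
y.hat <- predict( m, newdata=data.frame( time=220, treatment=1, timeSince=100 ) )

# the residual: observed minus predicted
# (assuming row 220 of the data is day 220)
dat$passengers[ 220 ] - y.hat

# residual standard error: the typical distance of a point from the line
summary( m )$sigma
```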

sunaynagoel commented 4 years ago

> For Q4a, will the predicted value be the same as the actual observed value?
>
> What do we call the difference between the predicted and observed?
>
> Does your model report the residual standard error? If so, that can be used to gauge a reasonable response.
>
> The standard error tells us, "on average, how far should our sample statistic be from the true population statistic?" and is used to test the sizes of coefficients.
>
> The residual standard error tells us, on average, how far each data point falls from the regression line, i.e. how close each predicted point (on the regression line) is to the actual data point.

Thank you. I completely overlooked the residual standard error; adjusting for it perfectly explains my results.

lepp12 commented 4 years ago

I'm having a tough time with the installation process for the Wats package, specifically with `devtools::install_github( repo="OuhscBbmc/Wats" )`. I'm getting the error: `Error: package 'pkgload' does not have a namespace`

I haven't been able to find any resources that help troubleshoot this. Here is my session info:

```
- Session info ---------------------------------------------------------------
 setting  value                       
 version  R version 3.6.1 (2019-07-05)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/Phoenix             
 date     2020-03-20                  

- Packages --------------------------------------------------------------------
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
 callr         3.4.2   2020-02-12 [1] CRAN (R 3.6.3)
 cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.3)
 colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.1)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
 digest        0.6.20  2019-07-04 [1] CRAN (R 3.6.1)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.3)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
 knitr         1.24    2019-08-08 [1] CRAN (R 3.6.1)
 lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
 lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.3)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.1)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.1)
 pander      * 0.6.3   2018-11-06 [1] CRAN (R 3.6.3)
 pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.1)
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.3)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.1)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.1)
 R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.3)
 Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.3)
 rlang         0.4.5   2020-03-01 [1] CRAN (R 3.6.3)
 rstudioapi    0.11    2020-02-07 [1] CRAN (R 3.6.3)
 scales      * 1.1.0   2019-11-18 [1] CRAN (R 3.6.3)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
 sp          * 1.3-1   2018-06-05 [1] CRAN (R 3.6.1)
 stargazer   * 5.2.2   2018-05-30 [1] CRAN (R 3.6.0)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
 xfun          0.9     2019-08-21 [1] CRAN (R 3.6.1)

[1] C:/Users/lepp1/Documents/R/win-library/3.6
[2] C:/Program Files/R/R-3.6.1/library
```

lecy commented 4 years ago

@lepp12 The repo for pkgload is here:

https://github.com/r-lib/pkgload

You might try reinstalling devtools and pkgload first?
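
Something like this, from a fresh R session:

```r
install.packages( "devtools" )
install.packages( "pkgload" )
devtools::install_github( repo="OuhscBbmc/Wats" )
```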

lepp12 commented 4 years ago

@lecy reinstalling devtools did the trick, thank you!

castower commented 4 years ago

Hello all,

I have a question concerning the 'timeSince' variable. The chart presented in the lab has day 360 as 240 days since the start of treatment, but I'm curious whether this should be 239 days.

My reasoning is that the 240-day coding counts day 121 as having a 1 for time since treatment, but shouldn't this be 0, since that is the day the treatment starts and no days have passed yet? Maybe I'm overthinking it, but the following is my chart:

```
    passengers time treatment timeSince
1         1328    1         0         0
2         1407    2         0         0
3         1425    3         0         0
4         1252    4         0         0
5         1287    5         0         0
6         1353    6         0         0
120       1288  120         0         0
121       1348  121         1         0
360       1463  360         1       239
361       1391  361         1       240
```

@lecy

lecy commented 4 years ago

@castower That's a good question. It goes back to the topic of appropriate time period for your study from 524.

The real danger is a big change occurring right at the point where treatment starts: if we include a point from before the treatment, we essentially introduce an outlier into the post-treatment regression line, which could bias the two treatment coefficients.


In this case the treatment is a price change, which would be a discrete process. As a result, I think it is defensible to include the first day as part of the treatment.

In reality, though, it might take a few weeks or months for people to adjust behavior as a result of the new prices. There is a catalog of specifications that you can test to ensure you are capturing the true effect. We will cover these more during the regression discontinuity chapter.
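
For reference, a sketch of the two coding conventions, assuming a data frame dat and a price change on day 121:

```r
dat$treatment <- ifelse( dat$time >= 121, 1, 0 )

# lab convention: day 121 counts as day 1 of the treatment period
dat$timeSince <- ifelse( dat$time >= 121, dat$time - 120, 0 )

# alternative: day 121 counts as day 0 of the treatment period
dat$timeSince <- ifelse( dat$time >= 121, dat$time - 121, 0 )
```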


There is a series of specification tests that can be run. One approach is to drop the entire week of the intervention (three days before, three days after, and the day of), then re-run the models and see whether the coefficients change (which would signal the influence of outliers at the kink point). Drop a full month and do the same. There are also some non-parametric models that can be used to check for robustness.
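
A sketch of the drop-the-week check, assuming the treatment starts on day 121:

```r
# drop the week around the intervention: days 118-124
dat.trim <- dat[ ! dat$time %in% 118:124, ]

m.full <- lm( passengers ~ time + treatment + timeSince, data=dat )
m.trim <- lm( passengers ~ time + treatment + timeSince, data=dat.trim )

# big shifts in the treatment coefficients suggest outliers at the kink
cbind( full=coef( m.full ), trimmed=coef( m.trim ) )
```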

Here, there is a case for including the first day as treated, since the intervention is a price change. But you might try graphing the data near the cut-off with the fitted regression lines overlaid, to see whether you are including an outlier on either side. Run it both ways and see if the results change.
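
Continuing from the sketch above, the graph might look something like:

```r
# zoom in on the window around the intervention and overlay the fit
w <- dat$time %in% 100:140
plot( dat$time[ w ], dat$passengers[ w ], xlab="day", ylab="passengers" )
lines( dat$time[ w ], fitted( m.full )[ w ], col="red" )
abline( v=121, lty=2 )   # first day of treatment
```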

Good question, though.