JMSLab / eventstudyr

Other
24 stars 1 forks source link

Pull Request for #9: Implement smoothest path #18

Closed santiagohermo closed 1 year ago

santiagohermo commented 1 year ago

This is a PR for #9, where we added code to implement the smoothest path. For now the solver doesn't work, so a placeholder path is returned instead (the one obtained when finding the minimum order).

Some things to do here:

nateschor commented 1 year ago

@santiagohermo if you haven't already been doing this, would you be able to run devtools::test() (Ctrl + Shift + T on Windows) before you push? The workflow here might be helpful. I just checked and am seeing a bunch of tests that are now failing, and I've at least found that it's easier to tackle them early as they come up rather than trying to address them all at once when they build up.

It's probably easier for me to address any of the tests that fail as you work on AddSmPath and make general improvements, so just let me know when things fail and I'll take a look!

Thanks!

image

santiagohermo commented 1 year ago

Thanks for flagging this @nateschor. I actually hadn't done any testing. I'll make sure to make sure they work in future commits. And happy to solve the ones that arose here as well as soon as I have some time!

nateschor commented 1 year ago

Per call: @santiagohermo will finish getting smoothest path to work, and then we will go back and bring the unit tests up to date

Whenever one of us has bandwidth, we'll play around with/test the ybreaks automatic tick generation to give more breaks (currently, for certain inputs it only gives 3 breaks and does not look great)

santiagohermo commented 1 year ago

Here is a new update, very close to the finish line! I did the following:

A couple of comments/questions below.


On the implementation of the smoothest path @SimonFreyaldenhoven. First of all, thanks for your help! Second, here are some comments:

I think we reached a stable implementation (up to the pending review). If we want to improve following up on these comments, I'd suggest a new issue.

FYI @jmshapir


On the code @nateschor:

I'm going to go over the code again, and I might leave some further comments/questions on the code. Thanks!

SimonFreyaldenhoven commented 1 year ago

Thanks @santiagohermo!

My understanding of the implementation of the smoothest line is that 1) it works correctly in basic examples, b) when it doesn't work, rather than produce a misleading path it generally fails to converge and spits out the corresponding message. Does that sound right to you?

If so, I'm happy to approve. Before we publish the package, perhaps we can just create a bunch of examples with different shapes and try to run them in R (and perhaps compare them to the Stata output) to get a better sense of the robustness of the implementation and whether we need to do more work.

So I would suggest to open a follow up issue once we're done here where we just make up a variety of event time paths + ses, and plotting if the results consistently make sense and look comparable in Stata and R.

EDIT: One other question @santiagohermo - did you look at the Stata implementation and got the sense that what you've done here is comparable?

nateschor commented 1 year ago
  • All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!

@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in https://github.com/JMSLab/eventstudyr/issues/11#issuecomment-1240070359, we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases.

Does that change your opinion at all?

santiagohermo commented 1 year ago

Thanks @SimonFreyaldenhoven! A few replies to https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1419802167

My understanding of the implementation of the smoothest line is that 1) it works correctly in basic examples, b) when it doesn't work, rather than produce a misleading path it generally fails to converge and spits out the corresponding message. Does that sound right to you?

EDIT: One other question @santiagohermo - did you look at the Stata implementation and got the sense that what you've done here is comparable?

I constructed the implementation based on the pdf note and xtevent and the implementation in STATA. I don't understand how the STATA solver works, but the objective function and the computation of the $(d_0,d_1,d_2)$ coefficients is very similar in both. The only thing that is different relative to STATA is the case where the number of normalized coefficients equals the dimension of the polynomial, which for some reason is handled different in xtevent.

Despite the similarities, the implementation couldn't solve the "jump" y outcome (which I imagine can be solved by STATA?). This makes me want to explore more.

If so, I'm happy to approve. Before we publish the package, perhaps we can just create a bunch of examples with different shapes and try to run them in R (and perhaps compare them to the Stata output) to get a better sense of the robustness of the implementation and whether we need to do more work.

So I would suggest to open a follow up issue once we're done here where we just make up a variety of event time paths + ses, and plotting if the results consistently make sense and look comparable in Stata and R.

Sounds great! It would be really useful to run exactly the same examples in both STATA and R to see how each package perform. I opened #19 for this.

If good for now, feel free to approve :)

santiagohermo commented 1 year ago
  • All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!

@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in #11 (comment), we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases.

Does that change your opinion at all?

Thanks @nateschor! I see your point. However, I think that plotting functionalities shouldn't depend on what estimator we used to generate the plotting data. I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference. Maybe what you want to test is that both OLS and FHS estimators produce all the elements necessary for EventStudyPlot to work properly?

This said, I would be happy to duplicate those tests were you think that passing FHS estimates to EventStudyPlot might make a difference.


Thanks for all your replies @nateschor! I think that I have finished a pass. Happy to keep discussing (and looking forward to closing this issue :) ).

nateschor commented 1 year ago

@santiagohermo I was experimenting with using smoothest path for the OLS and FHS examples in the documentation (code given below).

Given the inputs used in these examples, is this the expected behavior?

Thanks!

OLS

eventstudy_estimates_ols <- EventStudy(
  estimator = "OLS",
  data = df_sample_dynamic,
  outcomevar = "y_base",
  policyvar = "z",
  idvar = "id",
  timevar = "t",
  controls = "x_r",
  FE = TRUE,
  TFE = TRUE,
  post = 3,
  pre = 2,
  overidpre = 4,
  overidpost = 5,
  normalize = - 3,
  cluster = TRUE,
  anticipation_effects_normalization = TRUE
)

EventStudyPlot(
  estimates = eventstudy_estimates_ols,
  xtitle = "Event time",
  ytitle = "Coefficient",
  ybreaks = c(-1.5, -.5, 0, .5, 1.5),
  conf_level = .95,
  Supt = .95,
  num_sim = 1000,
  seed = 1234,
  Addmean = FALSE,
  Preeventcoeffs = TRUE,
  Posteventcoeffs = TRUE,
  Addzeroline = TRUE,
  Smpath = TRUE
)

image

FHS

eventstudy_estimates_fhs <- EventStudy(
  estimator = "FHS",
  data = df_sample_dynamic,
  outcomevar = "y_base",
  policyvar = "z",
  idvar = "id",
  timevar = "t",
  controls = "x_r",
  FE = TRUE,
  TFE = TRUE,
  post = 3,
  pre = 0,
  overidpre = 3,
  overidpost = 1,
  normalize = - 1,
  cluster = TRUE,
  proxy = "eta_m",
  anticipation_effects_normalization = TRUE
)

EventStudyPlot(
  estimates = eventstudy_estimates_fhs,
  xtitle = "Event time",
  ytitle = "Coefficient",
  ybreaks = seq(-5, 10, 5),
  conf_level = .95,
  Supt = .95,
  num_sim = 1000,
  seed = 1234,
  Addmean = FALSE,
  Preeventcoeffs = TRUE,
  Posteventcoeffs = TRUE,
  Addzeroline = TRUE,
  Smpath = TRUE
)

image

santiagohermo commented 1 year ago

Thanks for the useful testing @nateschor! Find some replies below, and please let me know if you have any other comments!

OLS

This is the intended behavior. Indeed, for the y_base outcome the smoothest path that explains the effect dynamics is the line $y = 0$. However, if you change the outcome to y_smooth you find this:

Code ```R library(devtools) load_all() eventstudy_estimates_ols <- EventStudy( estimator = "OLS", data = df_sample_dynamic, outcomevar = "y_smooth_m", policyvar = "z", idvar = "id", timevar = "t", controls = "x_r", post = 3, pre = 2, overidpre = 4, overidpost = 5, normalize = - 3 ) EventStudyPlot( estimates = eventstudy_estimates_ols, Smpath = TRUE ) ggsave("y_smooth.png", dpi = 250, width = 5, height = 3.5) ```

FHS

This is obviously not what should happen. Thanks for testing FHS, I hadn't done that so far.

There were two issues here:

Now the smoothest path seems to work fine for FHS:

smooth outcome variable jump outcome variable
smooth jump
Code ```R #library(devtools) #load_all() eventstudy_estimates_fhs <- EventStudy( estimator = "FHS", data = df_sample_dynamic, outcomevar = "y_smooth_m", policyvar = "z", idvar = "id", timevar = "t", controls = "x_r", post = 3, pre = 0, overidpre = 3, overidpost = 1, normalize = - 1, proxy = "eta_m", ) EventStudyPlot( estimates = eventstudy_estimates_fhs, Smpath = TRUE ) ggsave("y_smooth_fhs.png", dpi = 250, width = 5, height = 3.5) eventstudy_estimates_fhs <- EventStudy( estimator = "FHS", data = df_sample_dynamic, outcomevar = "y_jump_m", policyvar = "z", idvar = "id", timevar = "t", controls = "x_r", post = 3, pre = 0, overidpre = 3, overidpost = 1, normalize = - 1, proxy = "eta_m", ) EventStudyPlot( estimates = eventstudy_estimates_fhs, Smpath = TRUE ) ggsave("y_jump_fhs.png", dpi = 250, width = 5, height = 3.5) ```
santiagohermo commented 1 year ago

Open threads:

fyi @nateschor

nateschor commented 1 year ago

@santiagohermo can I make a suggestion regarding the two issues you caught in https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1423177535? Would you be able to add some tests in test-SmPathHelpers.R? We've been writing tests for the helper functions as well (all of the functions that start with "Prepare" as well as GetFirstDifferences are helper functions that we might not make user facing but we still tested).

I also like this comment about "bug-driven testing" from this R package workshop:

image

I don't quite follow the approach of writing a test before I fix it because it seems hard to me to write a test when you don't know what is wrong, but I do think it is a good idea that whenever we find a bug, turn it into a test so that we squash the bug permanently.

What do you think?

Thanks @santiagohermo for all your work on smoothest path and general package improvements!

santiagohermo commented 1 year ago

Thanks for the great comments @nateschor! I replied all of them, feel free to start replying to the replies :)

On the testing idea. I agree, it's hard to write a test to detect a bug you don't know exists! However, I think that running the smoothest path with OLS and FHS while testing would be a good idea. In principle you wouldn't know what the smoothest path is, so would you like to compare the produced path to anything? Not sure how this would work.

Btw, I added a test for the function to add zeros to the covariance matrix, which follows that logic.

nateschor commented 1 year ago
  • All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!

@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in #11 (comment), we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases. Does that change your opinion at all?

Thanks @nateschor! I see your point. However, I think that plotting functionalities shouldn't depend on what estimator we used to generate the plotting data. I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference. Maybe what you want to test is that both OLS and FHS estimators produce all the elements necessary for EventStudyPlot to work properly?

This said, I would be happy to duplicate those tests were you think that passing FHS estimates to EventStudyPlot might make a difference.

Thanks @santiagohermo! With your comment "I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference," I agree, and think I had in mind the third bullet point here:

"Avoid testing simple code that you’re confident will work. Instead focus your time on code that you’re not sure about, is fragile, or has complicated interdependencies. That said, I often find I make the most mistakes when I falsely assume that the problem is simple and doesn’t need any tests."

I'm happy to defer to you for the final call on whether or not we keep the duplicated test!

nateschor commented 1 year ago

On the testing idea. I agree, it's hard to write a test to detect a bug you don't know exists! However, I think that running the smoothest path with OLS and FHS while testing would be a good idea. In principle you wouldn't know what the smoothest path is, so would you like to compare the produced path to anything? Not sure how this would work.

Thanks @santiagohermo! Regarding not knowing in advance what the smoothest path is, maybe this makes more sense to do in #19, but one idea is to run the same Event Study in xtevent and eventstudyr and see if they agree? We compare the output of eventstudyr to base STATA here and wondering if it would be worthwhile to extract the values used for plotting the smoothest path in xtevent and check that they are the same for eventstudyr? We did some less formal testing for FHS here, and doing some formal testing for the smoothest path between the two packages might be a good check to make sure that xtevent and eventstudyr stay on the same path (pun not intended)

nateschor commented 1 year ago

@santiagohermo on the NAMESPACE duplication problem in https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1097580307:

Currently, doing this leads to some test failing in test-EventStudyPlot(), but I think after we resolve https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1102058858 the tests will pass and NAMESPACE will no longer have the duplicated import statement

nateschor commented 1 year ago

Per call: @santiagohermo will add a test for checking that automatically computing ybreaks for working and grab the smoothest path coefficients from xtevent in a .csv file that we can use to compare with eventstudyr

@nateschor will write unit tests to make sure that the dimensions for outputs for OLS and FHS are the same (up to differences in number of normalized coefficients) before getting fed into other functions

santiagohermo commented 1 year ago

A few updates @nateschor.

Per call: @santiagohermo will add a test for checking that automatically computing ybreaks for working and grab the smoothest path coefficients from xtevent in a .csv file that we can use to compare with eventstudyr

I think is better to prioritize merging this PR and starting to play with xtevent in #19. So, I decided to proceed as follows:

@nateschor will write unit tests to make sure that the dimensions for outputs for OLS and FHS are the same (up to differences in number of normalized coefficients) before getting fed into other functions

Since I was on it I added a test for this in https://github.com/JMSLab/eventstudyr/pull/18/commits/03ca1365cd0ccc031df010142b87eeaf011e4793. Is this what you more or less had in mind?


Thanks for the review @nateschor! Let me know if there is anything else we should do here. At least it looks like devtools is happy!

image

nateschor commented 1 year ago

@santiagohermo in https://github.com/JMSLab/eventstudyr/pull/18/commits/85f51f101b522ed988337614aab50cd240412b3f I removed the call to @examples in AddSmPath because it was giving the note below during devtools::check(). Were you thinking we should include an example for smoothest path?

Thanks!

image

nateschor commented 1 year ago

@santiagohermo https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1426836107 is exactly what I had in mind, thank you so much!

santiagohermo commented 1 year ago

Thanks @nateschor!

I think we are almost done! A few things open:

Let me know if I can help with anything else!

nateschor commented 1 year ago

@santiagohermo thanks!

Where should we make a note of https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1101916720 to discuss in the next call?

I added for myself a calendar reminder right before our next call to bring it up during the meeting!

I really like your suggestions, and added it to the opening comment in https://github.com/JMSLab/eventstudyr/issues/17#issue-1553935737. It seems to me that we're doing more "aesthetic testing" for ybreaks since there aren't prespecified values that we need for y-axis breaks, so I think including it in #17 makes sense!

I approved the PR, @santiagohermo when you're ready and everyone else has approved, would you like to do the merging honors?

santiagohermo commented 1 year ago

Thanks @nateschor! Summary here.