Pull Request for #9: Implement smoothest path

santiagohermo commented 1 year ago

This is a PR for #9, where we added code to implement the smoothest path. For now the solver doesn't work, so a placeholder path is returned instead (the one obtained when finding the minimum order).

Some things to do here:

Can you review the code @nateschor? Code (with placeholders) should return a valid smooth path, and changes//additions to code should align with the general style of the package.
- I will also add some questions for you in a review @nateschor, which is why I assigned you
Can you check the issues with the solver @SimonFreyaldenhoven @jmshapir? I prepared this markdown that shows two solvers and the errors they return. Happy to chat about this anytime!

nateschor commented 1 year ago

@santiagohermo if you haven't already been doing this, would you be able to run devtools::test() (Ctrl + Shift + T on Windows) before you push? The workflow here might be helpful. I just checked and am seeing a bunch of tests that are now failing, and I've at least found that it's easier to tackle them early as they come up rather than trying to address them all at once when they build up.

It's probably easier for me to address any of the tests that fail as you work on AddSmPath and make general improvements, so just let me know when things fail and I'll take a look!

Thanks!

santiagohermo commented 1 year ago

Thanks for flagging this @nateschor. I actually hadn't done any testing. I'll make sure to make sure they work in future commits. And happy to solve the ones that arose here as well as soon as I have some time!

nateschor commented 1 year ago

Per call: @santiagohermo will finish getting smoothest path to work, and then we will go back and bring the unit tests up to date

Whenever one of us has bandwidth, we'll play around with/test the ybreaks automatic tick generation to give more breaks (currently, for certain inputs it only gives 3 breaks and does not look great)

santiagohermo commented 1 year ago

Here is a new update, very close to the finish line! I did the following:

Finished implementation of smoothest path and added error handling
Made sure tests work
Updated documentation
Deleted issue subfolder

A couple of comments/questions below.

On the implementation of the smoothest path @SimonFreyaldenhoven. First of all, thanks for your help! Second, here are some comments:

We now use the optim package from base R. Code is here
We use "Nelder-Mead" as method for the optimization. To prevent the discriminant from becoming negative I set the objective function to return Infinity in that case
- If the optimizer doesn't converge successfully we stop with an error and ask the user to unselect the smoothest path option.
Important:
- The optimizer cannot solve the y_jump_m case. The optimizer returns convergence code 10, which according to the documentation "indicates degeneracy of the Nelder--Mead simplex." I thought it was Ok to just throw an error in this case, but let me know if you think we should be able to handle this case and we can iterate searching for a better optimizer.
- Unlike xtevent, I'm not doing anything special for the case when the number of normalized coefficients equals the order of the polynomial. Maybe @jorpppp has some insight on whether this is important.

I think we reached a stable implementation (up to the pending review). If we want to improve following up on these comments, I'd suggest a new issue.

FYI @jmshapir

On the code @nateschor:

I made sure the tests work (I also took advantage of the opportunity to make them more readable).
- All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!
Per call, it'd be useful to double-check the construction of the y-breaks. FYI @rcalvo12
Can you quickly review the updates, and make sure I'm not missing any step to merge these changes? (I tried to follow the example, but since this is my first time I would like to make sure I'm not making any errors.)

I'm going to go over the code again, and I might leave some further comments/questions on the code. Thanks!

SimonFreyaldenhoven commented 1 year ago

Thanks @santiagohermo!

My understanding of the implementation of the smoothest line is that 1) it works correctly in basic examples, b) when it doesn't work, rather than produce a misleading path it generally fails to converge and spits out the corresponding message. Does that sound right to you?

If so, I'm happy to approve. Before we publish the package, perhaps we can just create a bunch of examples with different shapes and try to run them in R (and perhaps compare them to the Stata output) to get a better sense of the robustness of the implementation and whether we need to do more work.

So I would suggest to open a follow up issue once we're done here where we just make up a variety of event time paths + ses, and plotting if the results consistently make sense and look comparable in Stata and R.

EDIT: One other question @santiagohermo - did you look at the Stata implementation and got the sense that what you've done here is comparable?

nateschor commented 1 year ago

All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!

@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in https://github.com/JMSLab/eventstudyr/issues/11#issuecomment-1240070359, we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases.

Does that change your opinion at all?

santiagohermo commented 1 year ago

Thanks @SimonFreyaldenhoven! A few replies to https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1419802167

My understanding of the implementation of the smoothest line is that 1) it works correctly in basic examples, b) when it doesn't work, rather than produce a misleading path it generally fails to converge and spits out the corresponding message. Does that sound right to you?

EDIT: One other question @santiagohermo - did you look at the Stata implementation and got the sense that what you've done here is comparable?

I constructed the implementation based on the pdf note and xtevent and the implementation in STATA. I don't understand how the STATA solver works, but the objective function and the computation of the $(d_0,d_1,d_2)$ coefficients is very similar in both. The only thing that is different relative to STATA is the case where the number of normalized coefficients equals the dimension of the polynomial, which for some reason is handled different in xtevent.

Despite the similarities, the implementation couldn't solve the "jump" y outcome (which I imagine can be solved by STATA?). This makes me want to explore more.

If so, I'm happy to approve. Before we publish the package, perhaps we can just create a bunch of examples with different shapes and try to run them in R (and perhaps compare them to the Stata output) to get a better sense of the robustness of the implementation and whether we need to do more work.

So I would suggest to open a follow up issue once we're done here where we just make up a variety of event time paths + ses, and plotting if the results consistently make sense and look comparable in Stata and R.

Sounds great! It would be really useful to run exactly the same examples in both STATA and R to see how each package perform. I opened #19 for this.

If good for now, feel free to approve :)

santiagohermo commented 1 year ago

All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!

@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in #11 (comment), we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases.

Does that change your opinion at all?

Thanks @nateschor! I see your point. However, I think that plotting functionalities shouldn't depend on what estimator we used to generate the plotting data. I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference. Maybe what you want to test is that both OLS and FHS estimators produce all the elements necessary for EventStudyPlot to work properly?

This said, I would be happy to duplicate those tests were you think that passing FHS estimates to EventStudyPlot might make a difference.

Thanks for all your replies @nateschor! I think that I have finished a pass. Happy to keep discussing (and looking forward to closing this issue :) ).

nateschor commented 1 year ago

@santiagohermo I was experimenting with using smoothest path for the OLS and FHS examples in the documentation (code given below).

For OLS, it looks like the smoothest path line is y = 0 for all time horizons
For FHS, no plot is generated because of "Error in covar[coeffs_order, coeffs_order] : subscript out of bounds"

Given the inputs used in these examples, is this the expected behavior?

Thanks!

OLS

eventstudy_estimates_ols <- EventStudy(
  estimator = "OLS",
  data = df_sample_dynamic,
  outcomevar = "y_base",
  policyvar = "z",
  idvar = "id",
  timevar = "t",
  controls = "x_r",
  FE = TRUE,
  TFE = TRUE,
  post = 3,
  pre = 2,
  overidpre = 4,
  overidpost = 5,
  normalize = - 3,
  cluster = TRUE,
  anticipation_effects_normalization = TRUE
)

EventStudyPlot(
  estimates = eventstudy_estimates_ols,
  xtitle = "Event time",
  ytitle = "Coefficient",
  ybreaks = c(-1.5, -.5, 0, .5, 1.5),
  conf_level = .95,
  Supt = .95,
  num_sim = 1000,
  seed = 1234,
  Addmean = FALSE,
  Preeventcoeffs = TRUE,
  Posteventcoeffs = TRUE,
  Addzeroline = TRUE,
  Smpath = TRUE
)

FHS

eventstudy_estimates_fhs <- EventStudy(
  estimator = "FHS",
  data = df_sample_dynamic,
  outcomevar = "y_base",
  policyvar = "z",
  idvar = "id",
  timevar = "t",
  controls = "x_r",
  FE = TRUE,
  TFE = TRUE,
  post = 3,
  pre = 0,
  overidpre = 3,
  overidpost = 1,
  normalize = - 1,
  cluster = TRUE,
  proxy = "eta_m",
  anticipation_effects_normalization = TRUE
)

EventStudyPlot(
  estimates = eventstudy_estimates_fhs,
  xtitle = "Event time",
  ytitle = "Coefficient",
  ybreaks = seq(-5, 10, 5),
  conf_level = .95,
  Supt = .95,
  num_sim = 1000,
  seed = 1234,
  Addmean = FALSE,
  Preeventcoeffs = TRUE,
  Posteventcoeffs = TRUE,
  Addzeroline = TRUE,
  Smpath = TRUE
)

santiagohermo commented 1 year ago

Thanks for the useful testing @nateschor! Find some replies below, and please let me know if you have any other comments!

OLS

This is the intended behavior. Indeed, for the y_base outcome the smoothest path that explains the effect dynamics is the line $y = 0$. However, if you change the outcome to y_smooth you find this:

Code

```R library(devtools) load_all() eventstudy_estimates_ols <- EventStudy( estimator = "OLS", data = df_sample_dynamic, outcomevar = "y_smooth_m", policyvar = "z", idvar = "id", timevar = "t", controls = "x_r", post = 3, pre = 2, overidpre = 4, overidpost = 5, normalize = - 3 ) EventStudyPlot( estimates = eventstudy_estimates_ols, Smpath = TRUE ) ggsave("y_smooth.png", dpi = 250, width = 5, height = 3.5) ```

FHS

This is obviously not what should happen. Thanks for testing FHS, I hadn't done that so far.

There were two issues here:

The error you found happened when we the function was trying to add zeros to the covariance matrix. FHS has two normalizations, and apparently the function was not robust to this.
- I fixed this in https://github.com/JMSLab/eventstudyr/pull/18/commits/e1821bf9cb9d476cff1e29b32dbeb02a866d3877. Specifically, the changes here.
- In https://github.com/JMSLab/eventstudyr/pull/18/commits/b17155caa5df791e2ff240b215f08118ddb6709e I added a test to make sure that the function AddZerosCovar is capable of handling the multiple-normalization case. Before committing, I ran devtools:test() and everything passed.
After I fixed the problem above I noticed an error in a multiplication: the Anorm argument was not conformable in a matrix multiplication when searching for the minimum order.
- This function was working anyways because, with 1 normalized coefficient, the vector was automatically transposed to make the multiplication work.
- I fixed this in https://github.com/JMSLab/eventstudyr/pull/18/commits/e1821bf9cb9d476cff1e29b32dbeb02a866d3877. Specifically, here. Now we make sure that Anorm is always a matrix object (with drop=FALSE) and we perform the correct multiplication so that dimensions work.

Now the smoothest path seems to work fine for FHS:

smooth outcome variable	jump outcome variable

Code

```R #library(devtools) #load_all() eventstudy_estimates_fhs <- EventStudy( estimator = "FHS", data = df_sample_dynamic, outcomevar = "y_smooth_m", policyvar = "z", idvar = "id", timevar = "t", controls = "x_r", post = 3, pre = 0, overidpre = 3, overidpost = 1, normalize = - 1, proxy = "eta_m", ) EventStudyPlot( estimates = eventstudy_estimates_fhs, Smpath = TRUE ) ggsave("y_smooth_fhs.png", dpi = 250, width = 5, height = 3.5) eventstudy_estimates_fhs <- EventStudy( estimator = "FHS", data = df_sample_dynamic, outcomevar = "y_jump_m", policyvar = "z", idvar = "id", timevar = "t", controls = "x_r", post = 3, pre = 0, overidpre = 3, overidpost = 1, normalize = - 1, proxy = "eta_m", ) EventStudyPlot( estimates = eventstudy_estimates_fhs, Smpath = TRUE ) ggsave("y_jump_fhs.png", dpi = 250, width = 5, height = 3.5) ```

santiagohermo commented 1 year ago

Open threads:

On y-breaks, see here
On NAMESPACE, see here

fyi @nateschor

nateschor commented 1 year ago

@santiagohermo can I make a suggestion regarding the two issues you caught in https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1423177535? Would you be able to add some tests in test-SmPathHelpers.R? We've been writing tests for the helper functions as well (all of the functions that start with "Prepare" as well as GetFirstDifferences are helper functions that we might not make user facing but we still tested).

I also like this comment about "bug-driven testing" from this R package workshop:

I don't quite follow the approach of writing a test before I fix it because it seems hard to me to write a test when you don't know what is wrong, but I do think it is a good idea that whenever we find a bug, turn it into a test so that we squash the bug permanently.

What do you think?

Thanks @santiagohermo for all your work on smoothest path and general package improvements!

santiagohermo commented 1 year ago

Thanks for the great comments @nateschor! I replied all of them, feel free to start replying to the replies :)

On the testing idea. I agree, it's hard to write a test to detect a bug you don't know exists! However, I think that running the smoothest path with OLS and FHS while testing would be a good idea. In principle you wouldn't know what the smoothest path is, so would you like to compare the produced path to anything? Not sure how this would work.

Btw, I added a test for the function to add zeros to the covariance matrix, which follows that logic.

nateschor commented 1 year ago

All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of EventStudyPlot, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!

@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in #11 (comment), we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases. Does that change your opinion at all?

Thanks @nateschor! I see your point. However, I think that plotting functionalities shouldn't depend on what estimator we used to generate the plotting data. I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference. Maybe what you want to test is that both OLS and FHS estimators produce all the elements necessary for EventStudyPlot to work properly?

This said, I would be happy to duplicate those tests were you think that passing FHS estimates to EventStudyPlot might make a difference.

Thanks @santiagohermo! With your comment "I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference," I agree, and think I had in mind the third bullet point here:

"Avoid testing simple code that you’re confident will work. Instead focus your time on code that you’re not sure about, is fragile, or has complicated interdependencies. That said, I often find I make the most mistakes when I falsely assume that the problem is simple and doesn’t need any tests."

I'm happy to defer to you for the final call on whether or not we keep the duplicated test!

nateschor commented 1 year ago

On the testing idea. I agree, it's hard to write a test to detect a bug you don't know exists! However, I think that running the smoothest path with OLS and FHS while testing would be a good idea. In principle you wouldn't know what the smoothest path is, so would you like to compare the produced path to anything? Not sure how this would work.

Thanks @santiagohermo! Regarding not knowing in advance what the smoothest path is, maybe this makes more sense to do in #19, but one idea is to run the same Event Study in xtevent and eventstudyr and see if they agree? We compare the output of eventstudyr to base STATA here and wondering if it would be worthwhile to extract the values used for plotting the smoothest path in xtevent and check that they are the same for eventstudyr? We did some less formal testing for FHS here, and doing some formal testing for the smoothest path between the two packages might be a good check to make sure that xtevent and eventstudyr stay on the same path (pun not intended)

nateschor commented 1 year ago

@santiagohermo on the NAMESPACE duplication problem in https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1097580307:

The NAMESPACE file gets duplicated in https://github.com/JMSLab/eventstudyr/commit/016a958187ad4f3ad8999329bd5bc810b1952105
from here I saw that "@rawNamespace inserts code literally into the NAMESPACE", which I don't necessarily think we want
since we only used setDT, setorderv, shift, and setorder from data.table, I propose we replace the @rawNamespace calls with importFrom data.table setDT setorderv shift setorder based on here

Currently, doing this leads to some test failing in test-EventStudyPlot(), but I think after we resolve https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1102058858 the tests will pass and NAMESPACE will no longer have the duplicated import statement

nateschor commented 1 year ago

Per call: @santiagohermo will add a test for checking that automatically computing ybreaks for working and grab the smoothest path coefficients from xtevent in a .csv file that we can use to compare with eventstudyr

@nateschor will write unit tests to make sure that the dimensions for outputs for OLS and FHS are the same (up to differences in number of normalized coefficients) before getting fed into other functions

santiagohermo commented 1 year ago

A few updates @nateschor.

Per call: @santiagohermo will add a test for checking that automatically computing ybreaks for working and grab the smoothest path coefficients from xtevent in a .csv file that we can use to compare with eventstudyr

I think is better to prioritize merging this PR and starting to play with xtevent in #19. So, I decided to proceed as follows:

Added a test to make sure the smoothest path is within expectations for the example data
Added a note in 19's opening comment to consider the extended test that compares the outcome of eventstudyr's smoothest path to xtevent's path.
In https://github.com/JMSLab/eventstudyr/pull/18/commits/a22b5a288d72cc0315a669ff116ccc3d2f51c02c I added a test specific for FHS

@nateschor will write unit tests to make sure that the dimensions for outputs for OLS and FHS are the same (up to differences in number of normalized coefficients) before getting fed into other functions

Since I was on it I added a test for this in https://github.com/JMSLab/eventstudyr/pull/18/commits/03ca1365cd0ccc031df010142b87eeaf011e4793. Is this what you more or less had in mind?

Thanks for the review @nateschor! Let me know if there is anything else we should do here. At least it looks like devtools is happy!

nateschor commented 1 year ago

@santiagohermo in https://github.com/JMSLab/eventstudyr/pull/18/commits/85f51f101b522ed988337614aab50cd240412b3f I removed the call to @examples in AddSmPath because it was giving the note below during devtools::check(). Were you thinking we should include an example for smoothest path?

Thanks!

nateschor commented 1 year ago

@santiagohermo https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1426836107 is exactly what I had in mind, thank you so much!

santiagohermo commented 1 year ago

Thanks @nateschor!

On the smoothest path @examples. I think it's a good idea to drop the examples call. In here we decided not to include an example.
On tests. No problem!

I think we are almost done! A few things open:

Where should we make a note of this to discuss in the next call? FYI @SimonFreyaldenhoven
Should we resolve the y-breaks thread? Maybe we can add a note to test it more deeply in #17?

Let me know if I can help with anything else!

nateschor commented 1 year ago

@santiagohermo thanks!

Where should we make a note of https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1101916720 to discuss in the next call?

I added for myself a calendar reminder right before our next call to bring it up during the meeting!

Should we resolve the y-breaks thread? Maybe we can add a note to test it more deeply in Test package and review its documentation #17?

I really like your suggestions, and added it to the opening comment in https://github.com/JMSLab/eventstudyr/issues/17#issue-1553935737. It seems to me that we're doing more "aesthetic testing" for ybreaks since there aren't prespecified values that we need for y-axis breaks, so I think including it in #17 makes sense!

I approved the PR, @santiagohermo when you're ready and everyone else has approved, would you like to do the merging honors?

santiagohermo commented 1 year ago

Thanks @nateschor! Summary here.

JMSLab / eventstudyr

Pull Request for #9: Implement smoothest path #18

OLS

FHS

OLS

FHS