Closed santiagohermo closed 1 year ago
@santiagohermo if you haven't already been doing this, would you be able to run devtools::test()
(Ctrl + Shift + T on Windows) before you push? The workflow here might be helpful. I just checked and am seeing a bunch of tests that are now failing, and I've at least found that it's easier to tackle them early as they come up rather than trying to address them all at once when they build up.
It's probably easier for me to address any of the tests that fail as you work on AddSmPath
and make general improvements, so just let me know when things fail and I'll take a look!
Thanks!
Thanks for flagging this @nateschor. I actually hadn't done any testing. I'll make sure to make sure they work in future commits. And happy to solve the ones that arose here as well as soon as I have some time!
Per call: @santiagohermo will finish getting smoothest path to work, and then we will go back and bring the unit tests up to date
Whenever one of us has bandwidth, we'll play around with/test the ybreaks
automatic tick generation to give more breaks (currently, for certain inputs it only gives 3 breaks and does not look great)
Here is a new update, very close to the finish line! I did the following:
A couple of comments/questions below.
On the implementation of the smoothest path @SimonFreyaldenhoven. First of all, thanks for your help! Second, here are some comments:
optim
package from base
R. Code is here"Nelder-Mead"
as method for the optimization. To prevent the discriminant from becoming negative I set the objective function to return Infinity in that case
y_jump_m
case. The optimizer returns convergence code 10
, which according to the documentation "indicates degeneracy of the Nelder--Mead simplex." I thought it was Ok to just throw an error in this case, but let me know if you think we should be able to handle this case and we can iterate searching for a better optimizer.xtevent
, I'm not doing anything special for the case when the number of normalized coefficients equals the order of the polynomial. Maybe @jorpppp has some insight on whether this is important.I think we reached a stable implementation (up to the pending review). If we want to improve following up on these comments, I'd suggest a new issue.
FYI @jmshapir
On the code @nateschor:
EventStudyPlot
, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!I'm going to go over the code again, and I might leave some further comments/questions on the code. Thanks!
Thanks @santiagohermo!
My understanding of the implementation of the smoothest line is that 1) it works correctly in basic examples, b) when it doesn't work, rather than produce a misleading path it generally fails to converge and spits out the corresponding message. Does that sound right to you?
If so, I'm happy to approve. Before we publish the package, perhaps we can just create a bunch of examples with different shapes and try to run them in R (and perhaps compare them to the Stata output) to get a better sense of the robustness of the implementation and whether we need to do more work.
So I would suggest to open a follow up issue once we're done here where we just make up a variety of event time paths + ses, and plotting if the results consistently make sense and look comparable in Stata and R.
EDIT: One other question @santiagohermo - did you look at the Stata implementation and got the sense that what you've done here is comparable?
- All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of
EventStudyPlot
, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!
@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in https://github.com/JMSLab/eventstudyr/issues/11#issuecomment-1240070359, we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases.
Does that change your opinion at all?
Thanks @SimonFreyaldenhoven! A few replies to https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1419802167
My understanding of the implementation of the smoothest line is that 1) it works correctly in basic examples, b) when it doesn't work, rather than produce a misleading path it generally fails to converge and spits out the corresponding message. Does that sound right to you?
EDIT: One other question @santiagohermo - did you look at the Stata implementation and got the sense that what you've done here is comparable?
I constructed the implementation based on the pdf note and xtevent
and the implementation in STATA. I don't understand how the STATA solver works, but the objective function and the computation of the $(d_0,d_1,d_2)$ coefficients is very similar in both. The only thing that is different relative to STATA is the case where the number of normalized coefficients equals the dimension of the polynomial, which for some reason is handled different in xtevent
.
Despite the similarities, the implementation couldn't solve the "jump" y outcome (which I imagine can be solved by STATA?). This makes me want to explore more.
If so, I'm happy to approve. Before we publish the package, perhaps we can just create a bunch of examples with different shapes and try to run them in R (and perhaps compare them to the Stata output) to get a better sense of the robustness of the implementation and whether we need to do more work.
So I would suggest to open a follow up issue once we're done here where we just make up a variety of event time paths + ses, and plotting if the results consistently make sense and look comparable in Stata and R.
Sounds great! It would be really useful to run exactly the same examples in both STATA and R to see how each package perform. I opened #19 for this.
If good for now, feel free to approve :)
- All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of
EventStudyPlot
, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in #11 (comment), we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases.
Does that change your opinion at all?
Thanks @nateschor! I see your point. However, I think that plotting functionalities shouldn't depend on what estimator we used to generate the plotting data. I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference. Maybe what you want to test is that both OLS
and FHS
estimators produce all the elements necessary for EventStudyPlot
to work properly?
This said, I would be happy to duplicate those tests were you think that passing FHS estimates to EventStudyPlot
might make a difference.
Thanks for all your replies @nateschor! I think that I have finished a pass. Happy to keep discussing (and looking forward to closing this issue :) ).
@santiagohermo I was experimenting with using smoothest path for the OLS and FHS examples in the documentation (code given below).
y = 0
for all time horizonsGiven the inputs used in these examples, is this the expected behavior?
Thanks!
eventstudy_estimates_ols <- EventStudy(
estimator = "OLS",
data = df_sample_dynamic,
outcomevar = "y_base",
policyvar = "z",
idvar = "id",
timevar = "t",
controls = "x_r",
FE = TRUE,
TFE = TRUE,
post = 3,
pre = 2,
overidpre = 4,
overidpost = 5,
normalize = - 3,
cluster = TRUE,
anticipation_effects_normalization = TRUE
)
EventStudyPlot(
estimates = eventstudy_estimates_ols,
xtitle = "Event time",
ytitle = "Coefficient",
ybreaks = c(-1.5, -.5, 0, .5, 1.5),
conf_level = .95,
Supt = .95,
num_sim = 1000,
seed = 1234,
Addmean = FALSE,
Preeventcoeffs = TRUE,
Posteventcoeffs = TRUE,
Addzeroline = TRUE,
Smpath = TRUE
)
eventstudy_estimates_fhs <- EventStudy(
estimator = "FHS",
data = df_sample_dynamic,
outcomevar = "y_base",
policyvar = "z",
idvar = "id",
timevar = "t",
controls = "x_r",
FE = TRUE,
TFE = TRUE,
post = 3,
pre = 0,
overidpre = 3,
overidpost = 1,
normalize = - 1,
cluster = TRUE,
proxy = "eta_m",
anticipation_effects_normalization = TRUE
)
EventStudyPlot(
estimates = eventstudy_estimates_fhs,
xtitle = "Event time",
ytitle = "Coefficient",
ybreaks = seq(-5, 10, 5),
conf_level = .95,
Supt = .95,
num_sim = 1000,
seed = 1234,
Addmean = FALSE,
Preeventcoeffs = TRUE,
Posteventcoeffs = TRUE,
Addzeroline = TRUE,
Smpath = TRUE
)
Thanks for the useful testing @nateschor! Find some replies below, and please let me know if you have any other comments!
This is the intended behavior. Indeed, for the y_base
outcome the smoothest path that explains the effect dynamics is the line $y = 0$. However, if you change the outcome to y_smooth
you find this:
This is obviously not what should happen. Thanks for testing FHS, I hadn't done that so far.
There were two issues here:
AddZerosCovar
is capable of handling the multiple-normalization case. Before committing, I ran devtools:test()
and everything passed.Anorm
argument was not conformable in a matrix multiplication when searching for the minimum order.
Anorm
is always a matrix object (with drop=FALSE
) and we perform the correct multiplication so that dimensions work.Now the smoothest path seems to work fine for FHS
:
smooth outcome variable | jump outcome variable |
---|---|
@santiagohermo can I make a suggestion regarding the two issues you caught in https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1423177535? Would you be able to add some tests in test-SmPathHelpers.R
? We've been writing tests for the helper functions as well (all of the functions that start with "Prepare" as well as GetFirstDifferences are helper functions that we might not make user facing but we still tested).
I also like this comment about "bug-driven testing" from this R package workshop:
I don't quite follow the approach of writing a test before I fix it because it seems hard to me to write a test when you don't know what is wrong, but I do think it is a good idea that whenever we find a bug, turn it into a test so that we squash the bug permanently.
What do you think?
Thanks @santiagohermo for all your work on smoothest path and general package improvements!
Thanks for the great comments @nateschor! I replied all of them, feel free to start replying to the replies :)
On the testing idea. I agree, it's hard to write a test to detect a bug you don't know exists! However, I think that running the smoothest path with OLS and FHS while testing would be a good idea. In principle you wouldn't know what the smoothest path is, so would you like to compare the produced path to anything? Not sure how this would work.
Btw, I added a test for the function to add zeros to the covariance matrix, which follows that logic.
- All tests were duplicated for OLS and FHS estimators. However, I thought this duplication was unnecessary because we were testing functionalities of
EventStudyPlot
, and so it was secondary how the coefficients and covariance matrix was generated. As a result, I decided to drop these duplicated tests. Maybe I missed something, so let me know if you disagree!@santiagohermo our thinking with duplicating the tests was that in order to address the bullet point comments about still plotting the instrument in #11 (comment), we have to construct the Event Study Plot for FHS slightly differently than OLS. We wanted to make sure that those changes make it through to the Event Study Plot by having those tests for both the OLS and FHS cases. Does that change your opinion at all?
Thanks @nateschor! I see your point. However, I think that plotting functionalities shouldn't depend on what estimator we used to generate the plotting data. I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference. Maybe what you want to test is that both
OLS
andFHS
estimators produce all the elements necessary forEventStudyPlot
to work properly?This said, I would be happy to duplicate those tests were you think that passing FHS estimates to
EventStudyPlot
might make a difference.
Thanks @santiagohermo! With your comment "I don't see how duplicating the test to, say, add the mean of the outcome to the labels or, for example, make sure that confidence intervals are present or absent depending on the function arguments, for FHS estimates would make a difference," I agree, and think I had in mind the third bullet point here:
"Avoid testing simple code that you’re confident will work. Instead focus your time on code that you’re not sure about, is fragile, or has complicated interdependencies. That said, I often find I make the most mistakes when I falsely assume that the problem is simple and doesn’t need any tests."
I'm happy to defer to you for the final call on whether or not we keep the duplicated test!
On the testing idea. I agree, it's hard to write a test to detect a bug you don't know exists! However, I think that running the smoothest path with OLS and FHS while testing would be a good idea. In principle you wouldn't know what the smoothest path is, so would you like to compare the produced path to anything? Not sure how this would work.
Thanks @santiagohermo! Regarding not knowing in advance what the smoothest path is, maybe this makes more sense to do in #19, but one idea is to run the same Event Study in xtevent
and eventstudyr
and see if they agree? We compare the output of eventstudyr
to base STATA here and wondering if it would be worthwhile to extract the values used for plotting the smoothest path in xtevent
and check that they are the same for eventstudyr
? We did some less formal testing for FHS here, and doing some formal testing for the smoothest path between the two packages might be a good check to make sure that xtevent
and eventstudyr
stay on the same path (pun not intended)
@santiagohermo on the NAMESPACE duplication problem in https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1097580307:
NAMESPACE
file gets duplicated in https://github.com/JMSLab/eventstudyr/commit/016a958187ad4f3ad8999329bd5bc810b1952105@rawNamespace
inserts code literally into the NAMESPACE", which I don't necessarily think we wantsetDT
, setorderv
, shift
, and setorder
from data.table
, I propose we replace the @rawNamespace
calls with importFrom data.table setDT setorderv shift setorder
based on hereCurrently, doing this leads to some test failing in test-EventStudyPlot()
, but I think after we resolve https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1102058858 the tests will pass and NAMESPACE
will no longer have the duplicated import statement
Per call:
@santiagohermo will add a test for checking that automatically computing ybreaks for working and grab the smoothest path coefficients from xtevent
in a .csv
file that we can use to compare with eventstudyr
@nateschor will write unit tests to make sure that the dimensions for outputs for OLS and FHS are the same (up to differences in number of normalized coefficients) before getting fed into other functions
A few updates @nateschor.
Per call: @santiagohermo will add a test for checking that automatically computing ybreaks for working and grab the smoothest path coefficients from
xtevent
in a.csv
file that we can use to compare witheventstudyr
I think is better to prioritize merging this PR and starting to play with xtevent
in #19. So, I decided to proceed as follows:
eventstudyr
's smoothest path to xtevent
's path.@nateschor will write unit tests to make sure that the dimensions for outputs for OLS and FHS are the same (up to differences in number of normalized coefficients) before getting fed into other functions
Since I was on it I added a test for this in https://github.com/JMSLab/eventstudyr/pull/18/commits/03ca1365cd0ccc031df010142b87eeaf011e4793. Is this what you more or less had in mind?
Thanks for the review @nateschor! Let me know if there is anything else we should do here. At least it looks like devtools
is happy!
@santiagohermo in https://github.com/JMSLab/eventstudyr/pull/18/commits/85f51f101b522ed988337614aab50cd240412b3f I removed the call to @examples
in AddSmPath
because it was giving the note below during devtools::check()
. Were you thinking we should include an example for smoothest path?
Thanks!
@santiagohermo https://github.com/JMSLab/eventstudyr/pull/18#issuecomment-1426836107 is exactly what I had in mind, thank you so much!
Thanks @nateschor!
@examples
. I think it's a good idea to drop the examples call. In here we decided not to include an example.I think we are almost done! A few things open:
Let me know if I can help with anything else!
@santiagohermo thanks!
Where should we make a note of https://github.com/JMSLab/eventstudyr/pull/18#discussion_r1101916720 to discuss in the next call?
I added for myself a calendar reminder right before our next call to bring it up during the meeting!
- Should we resolve the y-breaks thread? Maybe we can add a note to test it more deeply in Test package and review its documentation #17?
I really like your suggestions, and added it to the opening comment in https://github.com/JMSLab/eventstudyr/issues/17#issue-1553935737. It seems to me that we're doing more "aesthetic testing" for ybreaks
since there aren't prespecified values that we need for y-axis breaks, so I think including it in #17 makes sense!
I approved the PR, @santiagohermo when you're ready and everyone else has approved, would you like to do the merging honors?
Thanks @nateschor! Summary here.
This is a PR for #9, where we added code to implement the smoothest path. For now the solver doesn't work, so a placeholder path is returned instead (the one obtained when finding the minimum order).
Some things to do here: