adamSales / rebarLoop

tracking changes in the rebar-loop paper

feedback on loop package #3

Open adamSales opened 1 year ago

adamSales commented 1 year ago

This seems like as good a place as any to collect issues related to loop.estimator @johanngb

Really, these should be separate issues, but then again loop.estimator should be its own repo (I am tempted to make one, but I think @johanngb should own it).

Here are two suggestions:

  1. Nicer output: right now, loop() just returns two unlabeled numbers with no formatting. We should write a print.loop function that gives output in the style of, say, estimatr, or even lm, with a labeled estimate, standard error, hypothesis test, and confidence interval.
  2. When the outcome is binary, randomForest throws a warning that you're using regression instead of classification. Can we turn this warning off? It will confuse people.
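
A rough sketch of both suggestions, not the package's actual code. It assumes loop() returns c(estimate, variance) (if the second number is already a standard error, drop the sqrt()), uses a normal approximation for the test and interval, and guesses that the randomForest warning contains the text "Are you sure you want to do regression". The names as_loop_result, print.loop_result, and muffle_regression_warning are made up for illustration.

```r
# Hypothetical helper: wrap loop()-style output in a classed list.
# Assumption: est = c(point estimate, variance estimate).
as_loop_result <- function(est) {
  structure(list(estimate = est[1], se = sqrt(est[2])), class = "loop_result")
}

# Print method in the spirit of lm()/estimatr output (normal approximation).
print.loop_result <- function(x, level = 0.95, ...) {
  crit <- qnorm(1 - (1 - level) / 2)
  stat <- x$estimate / x$se
  tab <- data.frame(
    Estimate  = x$estimate,
    Std.Error = x$se,
    z.value   = stat,
    p.value   = 2 * pnorm(-abs(stat)),
    CI.lower  = x$estimate - crit * x$se,
    CI.upper  = x$estimate + crit * x$se,
    row.names = "Treatment effect"
  )
  print(tab, digits = 4)
  invisible(x)
}

# Evaluate expr, silencing only warnings whose message contains `pattern`;
# all other warnings still surface. The default pattern is an assumption
# about the wording of the randomForest warning.
muffle_regression_warning <- function(expr,
                                      pattern = "Are you sure you want to do regression") {
  withCallingHandlers(expr, warning = function(w) {
    if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
      invokeRestart("muffleWarning")
    }
  })
}
```

With this, print(as_loop_result(c(4.447821, 2.663642))) would give one labeled row, and the internal random forest call could be wrapped in muffle_regression_warning() so only that one warning is hidden.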

adamSales commented 1 year ago

Oh yeah, I also think there should be optional output like, say, R^2 from the random forests or whatever. We could go to town.

adamSales commented 1 year ago

Formatting the "Pair" vector for p_loop:

Background: I am using p_loop() to estimate effects from a pair-matching design from an observational study, using the pairmatch() function from the optmatch package. That function returns a vector of pair IDs as a factor, where unmatched units have <NA> for their ID. e.g.:

> head(match)
   1    2    3    4    5    6 
<NA>  1.1 <NA> <NA> <NA> <NA> 

Plugging this into p_loop() led to two problems, one small and one serious. The small problem was that the P argument in p_loop() only takes numeric vectors. So:

> p_loop(Y=dat$y,Tr=dat$z,P=match,Z=cbind(prog))
Error in Z1 - Z2 : non-numeric argument to binary operator

I would think it would be fairly easy to allow for factors or character vectors, too. If not, the documentation should specify that P must be numeric.
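
One way this could work, as a sketch only: convert whatever pair labels come in to integer codes before anything else touches them. normalize_pairs is a hypothetical helper, not something in the package, and it assumes p_loop() only needs P to tell pairs apart (the actual label values do not matter).

```r
# Hypothetical pre-processing step: map any pair labels to integer codes.
normalize_pairs <- function(P) {
  # Numeric input is passed through unchanged.
  if (is.numeric(P)) return(P)
  # factor() drops unused levels; as.integer() then returns the level codes.
  # NA labels (unmatched units) stay NA.
  as.integer(factor(P))
}
```

This accepts the factor returned by pairmatch() as well as character IDs. It does not deal with the NA entries, which are a separate problem.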

More serious: The final output from p_loop() uses the full vectors of Y and Tr to estimate effects, including those cases with missing P, and throws an uninterpretable warning:

> p_loop(Y=dat$y,Tr=dat$z,P=as.numeric(match),Z=cbind(prog))
[1] 4.447821 2.663642
Warning message:
In Y[Tr == 1] - Y[Tr == 0] :
  longer object length is not a multiple of shorter object length

The problem comes in this line of p_loop:

tauhat = mean(Y[Tr == 1] - Y[Tr == 0]) - mean((2 * assigned$Tr - 1) * d)
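
To see where the warning comes from, here is a toy illustration with made-up data (not the data above). When an unmatched unit is left in, Y[Tr == 1] and Y[Tr == 0] have different lengths, so R recycles the shorter vector and the mean of the differences is no longer a pair-level quantity.

```r
# Toy data: two matched pairs plus one unmatched control (P is NA).
Y  <- c(10, 12, 7, 9, 20)
Tr <- c(1, 0, 1, 0, 0)
P  <- c(1, 1, 2, 2, NA)

# Unequal lengths (2 treated vs 3 controls) trigger recycling and the warning.
n_treated <- length(Y[Tr == 1])
n_control <- length(Y[Tr == 0])
naive <- suppressWarnings(mean(Y[Tr == 1] - Y[Tr == 0]))

# Restricting to matched units first gives equal-length vectors.
keep <- !is.na(P)
matched <- mean(Y[keep & Tr == 1] - Y[keep & Tr == 0])
```

Here naive and matched differ because the unmatched control enters the first calculation but belongs to no pair.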

If I exclude the unmatched cases at the outset I get the right answer (with no warning):

> p_loop(Y=dat$y[!is.na(match)],Tr=dat$z[!is.na(match)],P=as.numeric(match[!is.na(match)]),Z=cbind(prog[!is.na(match)]))
[1] 3.675760 2.717145

IMO the best solution would be to change the tauhat = line above to something like:

tauhat = with(assigned, mean((2*Tr - 1)*(Y1-Y2-d)))

since it uses assigned for the whole thing. An alternative would be to throw an error if there's an NA in P, or at least an informative warning.
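
For the error-or-warning alternative, a minimal sketch of an input check that could run at the top of p_loop(). check_pairs is a hypothetical name, and the sketch assumes Tr is coded 0/1 and that every pair should contain exactly one treated and one control unit.

```r
# Hypothetical input check; stops with an informative message on bad input.
check_pairs <- function(Tr, P) {
  if (anyNA(P)) {
    stop(sum(is.na(P)), " unit(s) have a missing pair ID in P; ",
         "drop unmatched units before calling p_loop().", call. = FALSE)
  }
  # Re-factor so unused levels do not show up as empty pairs.
  P <- factor(P)
  # Each pair should contain exactly two units, one treated and one control.
  sizes   <- tapply(Tr, P, length)
  treated <- tapply(Tr, P, sum)
  bad <- names(sizes)[sizes != 2 | treated != 1]
  if (length(bad) > 0) {
    stop("Pairs without exactly one treated and one control unit: ",
         paste(bad, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling check_pairs(dat$z, match) on the data above would stop at the first check and report how many units are unmatched, instead of producing the recycling warning.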