feedback on loop package

adamSales commented 1 year ago

This seems like as good a place as any to collect issues related to loop.estimator @johanngb

Really these should be separate issues, but then again really loop.estimator should be its own repo (I am tempted to make one, but really @johanngb should own it I think)

Here are two suggestions:

Nicer output--right now, loop() just returns two unlabeled numbers with no formatting. We should write a print.loop function that gives output in the style of, say, estimatr, or even lm, with labeled estimate, standard error, hypothesis tests, confidence intervals
When the outcome is binary randomforest throws a warning that you're using regression instead of classification. Can we turn this warning off? It will confuse people.

adamSales commented 1 year ago

Oh yeah, I also think there should be optional output like, say, R^2 from the random forests or whatever. we could go to town

adamSales commented 1 year ago

Formatting the "Pair" vector for p_loop:

Background: I am using p_loop() to estimate effects from a pair-matching design from an observational study, using the pairmatch() function from the optmatch package. That function returns a vector of pair IDs as a factor, where unmatched units have <NA> for their ID. e.g.:

> head(match)
   1    2    3    4    5    6 
<NA>  1.1 <NA> <NA> <NA> <NA>

Plugging this into p_loop() led to two problems, one small and one serious. The small problem was that the P argument in p_loop() only takes numeric vectors. So:

> p_loop(Y=dat$y,Tr=dat$z,P=match,Z=cbind(prog))
Error in Z1 - Z2 : non-numeric argument to binary operator

I would think it would be fairly easy to allow for factors or character vectors, too. If not, the documentation should specify that P must be numeric.

More serious: The final output from p_loop() uses the full vectors of Y and Tr to estimate effects, including those cases with missing P, and throws an uninterpretable warning:

> p_loop(Y=dat$y,Tr=dat$z,P=as.numeric(match),Z=cbind(prog))
[1] 4.447821 2.663642
Warning message:
In Y[Tr == 1] - Y[Tr == 0] :
  longer object length is not a multiple of shorter object length

The problem comes in this line of p_loop:

tauhat = mean(Y[Tr == 1] - Y[Tr == 0]) - mean((2 * assigned$Tr - 1) * d)

If I exclude the unmatched cases at the outset I get the right answer (with no warning):

> p_loop(Y=dat$y[!is.na(match)],Tr=dat$z[!is.na(match)],P=as.numeric(match[!is.na(match)]),Z=cbind(prog[!is.na(match)]))
[1] 3.675760 2.717145

IMO the best solution would be to change the tauhat = line above to something like:

tauhat = with(assigned, mean((2*Tr - 1)*(Y1-Y2-d)))

since it uses assigned for the whole thing. An alternative would be to throw an error if there's an NA in P, or at least an informative warning.

adamSales / rebarLoop

feedback on loop package #3