benjann / kmatch

Multivariate-distance and propensity-score matching, including entropy balancing, inverse probability weighting, (coarsened) exact matching, and regression adjustment
MIT License

Caliper on a specific variable #2

Open thomartinez opened 11 months ago

thomartinez commented 11 months ago

Hi,

Thank you for the package. I saw in the help file that it is possible to match exactly on some variables and to put a caliper on the distance measure, but is there a way to put a caliper on a specific variable? For example, for each treated unit, I want to keep only the controls whose age differs by at most 5 years.

Best regards,

Thomas

benjann commented 11 months ago

Hi, if you only have one such variable, then you can do this using metric(euclidean). Here is an example using nearest-neighbor matching:

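webuse cattaneo2, clear
* match on father's age only; exact matching on the ematch() variables
kmatch md mbsmoke fage, ematch(prenatal1 mmarried fbaby) att ///
    metric(euclidean) caliper(3) ///
    nn(5) idgenerate
* dmax: largest absolute age difference between a treated obs and its matched controls
gen dmax = -1
foreach id of var _ID_* {
    // `id' holds the observation number of a matched control (missing if none)
    qui replace dmax = max(dmax, abs(fage-fage[`id'])) if mbsmoke==1 & `id'<.
}
* observations without any matched control keep the placeholder; set them to missing
qui replace dmax = . if dmax==-1
tab dmax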

       dmax |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        772       90.82       90.82
          1 |         67        7.88       98.71
          2 |         11        1.29      100.00
------------+-----------------------------------
      Total |        850      100.00

metric(euclidean) uses the identity matrix as the scaling matrix (i.e., the scaling is left unchanged), so the "distance" on which the matching is based is simply the age difference in the original units of the variable; caliper(3) then means that the maximum allowed age difference is 3 years. In the example I use idgenerate() to store the observation numbers of the matched controls, which I then use to compute the maximum age difference between each treated observation and its matched controls. We see that in 772 cases the maximum age difference is 0, in 67 cases it is 1, and in 11 cases it is 2. That there are no cases with a maximum difference of 3 seems to be a precision issue; according to the help file, caliper() should allow differences of up to 3, but it seems that these comparisons have been excluded nonetheless. You can prevent this by setting the caliper to a slightly larger value:

webuse cattaneo2, clear
kmatch md mbsmoke fage, ematch(prenatal1 mmarried fbaby) att ///
    metric(euclidean) caliper(3.0001) ///
    nn(5) idgenerate
gen dmax = -1
foreach id of var _ID_* {
    qui replace dmax = max(dmax, abs(fage-fage[`id'])) if mbsmoke==1 & `id'<.
}
qui replace dmax = . if dmax==-1
tab dmax

       dmax |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        772       90.61       90.61
          1 |         67        7.86       98.47
          2 |         11        1.29       99.77
          3 |          2        0.23      100.00
------------+-----------------------------------
      Total |        852      100.00

However, kmatch does not support the specification of a maximum distance for each variable separately. Furthermore, as soon as you use multiple variables, the distance to which the caliper is applied is a composite distance across all variables.
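
To see what the composite caliper implies for each variable, the post-matching check from the example above can be repeated per covariate. This is only a sketch: it adds mage (mother's age, also in cattaneo2) as a second matching variable purely for illustration, and the names dmax_fage and dmax_mage are made up here. Assuming the Euclidean metric is the square root of the summed squared differences, caliper(3) bounds the combined distance, so a pair that differs by 3 years on one variable and 1 year on the other (distance of about 3.16) would be dropped even though neither difference exceeds 3.

```stata
webuse cattaneo2, clear
* match on two variables; the caliper applies to the composite distance
kmatch md mbsmoke fage mage, ematch(prenatal1 mmarried fbaby) att ///
    metric(euclidean) caliper(3) ///
    nn(5) idgenerate
* per-variable maximum absolute difference to the matched controls
* (dmax_fage and dmax_mage are illustrative names, not created by kmatch)
foreach v in fage mage {
    gen dmax_`v' = -1
    foreach id of var _ID_* {
        qui replace dmax_`v' = max(dmax_`v', abs(`v'-`v'[`id'])) if mbsmoke==1 & `id'<.
    }
    qui replace dmax_`v' = . if dmax_`v'==-1
}
tab dmax_fage dmax_mage
```

The cross-tabulation shows which combinations of per-variable differences survive the composite caliper; it only inspects the matches after the fact and does not impose a separate limit on either variable.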

thomartinez commented 11 months ago

Hi,

Thank you very much for your detailed reply. That's very useful!

Based on your last paragraph, I suppose, though, that this is not applicable to kernel matching (since caliper cannot apply to multiple variables, and caliper is a synonym for bwidth in the package)?

Ideally, I would have liked to compare the results between nearest neighbor matching and kernel matching, both using a caliper on age.

Anyway, thanks a lot!

Thomas

benjann commented 11 months ago

Yeah, I believe this is currently not possible. An extra option would be needed that lets you specify for each variable the max difference that you want to allow.