AaronGullickson / panethnicity_intermar

Data for "Patterns of Panethnic Intermarriage in the United States, 1980-2018" forthcoming in Demography
MIT License

Approximate estimation methods exploration #12

Closed: AaronGullickson closed this issue 3 years ago

AaronGullickson commented 3 years ago

It turns out that clogit has an option to use an approximation technique that is much faster. I want to try it out to see whether it produces similar results and how much faster it is.

From the help file:

The computation of the exact partial likelihood can be very slow, however. If a particular strata had say 10 events out of 20 subjects we have to add up a denominator that involves all possible ways of choosing 10 out of 20, which is 20!/(10! 10!) = 184756 terms. Gail et al describe a fast recursion method which partly ameliorates this; it was incorporated into version 2.36-11 of the survival package. The computation remains infeasible for very large groups of ties, say 100 ties out of 500 subjects, and may even lead to integer overflow for the subscripts – in this latter case the routine will refuse to undertake the task. The Efron approximation is normally a sufficiently accurate substitute.
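To make the quoted numbers concrete: for one stratum with d events among n subjects, the exact denominator is the sum, over every size-d subset of subjects, of the product of their risk scores r_i = exp(x_i'β). The Python sketch below computes that sum twice: by brute force over all C(20, 10) = 184756 subsets, and with the O(n·d) recursion B(m, k) = B(m-1, k) + r_m·B(m-1, k-1), which is the kind of recursion Gail et al. describe. The stratum, covariate values, and β = 0.1 are hypothetical, and this is an illustration of the idea, not the survival package's implementation.

```python
from itertools import combinations
from math import comb, exp, prod, isclose


def brute_force_denominator(risks, d):
    """Sum, over every way of choosing d subjects, of the product of their risk scores."""
    return sum(prod(chosen) for chosen in combinations(risks, d))


def recursive_denominator(risks, d):
    """Same sum via the recursion B(m, k) = B(m-1, k) + r_m * B(m-1, k-1)."""
    # b[k] holds B(m, k) for the first m subjects processed so far; B(m, 0) = 1.
    b = [1.0] + [0.0] * d
    for r in risks:
        # Update k from high to low so b[k - 1] still holds the value for m - 1.
        for k in range(d, 0, -1):
            b[k] += r * b[k - 1]
    return b[d]


# Hypothetical stratum: 20 subjects, 10 events, one covariate, beta = 0.1.
beta = 0.1
risks = [exp(beta * x) for x in range(20)]  # r_i = exp(x_i * beta)

print(comb(20, 10))  # 184756 terms in the exact denominator
print(brute_force_denominator(risks, 10))  # enumerates every subset
print(recursive_denominator(risks, 10))    # same value, n * d updates
```

The brute-force version grows combinatorially with the size of the tie, while the recursion does only n·d multiply-adds. That is why the exact method stays feasible for 10 out of 20 but, as the help text warns, becomes impractical for very large groups of ties.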

Most of the time conditional logistic modeling is applied to data with 1 case + k controls per set, in which case all of the approximations for ties lead to exactly the same result. The 'approximate' option maps to the Breslow approximation for the Cox model, for historical reasons.
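Why the approximations coincide with one case per set: within a stratum, all three methods share the same numerator (the product of the cases' risk scores) and differ only in the denominator. The Python sketch below uses the textbook forms of the three denominators, which is an assumption on my part rather than a transcription of the survival package's code. With d = 1 tied event, the exact sum over subsets, Breslow's (Σr)^d, and Efron's Π_j (Σr − (j/d)·Σr_cases) all reduce to Σr. The risk scores are hypothetical.

```python
from itertools import combinations
from math import prod


def exact_denom(risks, cases):
    """Sum over all subsets of size d = len(cases) of the product of risk scores."""
    d = len(cases)
    return sum(prod(subset) for subset in combinations(risks, d))


def breslow_denom(risks, cases):
    """Breslow: the full risk-set sum raised to the number of tied events."""
    return sum(risks) ** len(cases)


def efron_denom(risks, cases):
    """Efron: progressively remove a fraction j/d of the cases' total risk."""
    d = len(cases)
    total = sum(risks)
    case_total = sum(risks[i] for i in cases)
    return prod(total - (j / d) * case_total for j in range(d))


# Hypothetical risk scores exp(x_i * beta) for a 3-subject matched set.
risks = [1.0, 2.0, 3.0]

# One case + two controls: all three denominators agree.
print(exact_denom(risks, [0]), breslow_denom(risks, [0]), efron_denom(risks, [0]))  # 6.0 6.0 6.0

# Two tied cases: the approximations no longer match the exact sum.
print(exact_denom(risks, [0, 1]), breslow_denom(risks, [0, 1]), efron_denom(risks, [0, 1]))  # 11.0 36.0 27.0
```

So the choice of tie-handling method only matters when a matched set contains more than one case, which is the situation where the exact method's cost explodes.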

AaronGullickson commented 3 years ago

Preliminary results on ACS data with expanded ethnicity groups:

Method           Time (seconds)
Efron            159
"Approximate"    149
Exact            855

So, yeah, the approximate methods are way faster than the exact method: Efron is about 5.4 times faster and "approximate" about 5.7 times faster. Now I need to check whether the results are close.
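The speedup figures come straight from the timing table: each one is the exact method's time divided by the other method's time. A small Python check of that arithmetic, using only the numbers reported in this thread:

```python
# Wall-clock times in seconds, copied from the timing table in this thread.
timings = {"Efron": 159, "Approximate": 149, "Exact": 855}

# Speedup of each method relative to the exact method (exact time / method time).
speedups = {name: timings["Exact"] / t for name, t in timings.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x faster than exact")
```

This prints 5.4x for Efron, 5.7x for "approximate", and 1.0x for exact itself. The gap between Efron and "approximate" (10 seconds) is small next to the roughly 700 seconds both save over the exact method.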

AaronGullickson commented 3 years ago

[Image: Rplot]

The results are basically identical, and the correlations among the three methods' results show the same:

          [,1]      [,2]      [,3]
[1,] 1.0000000 0.9999999 0.9999998
[2,] 0.9999999 1.0000000 0.9999998
[3,] 0.9999998 0.9999998 1.0000000
AaronGullickson commented 3 years ago

Commit 5fb629cc279a1af7a11bdc2c6bb6b5df3eae4696 adds the testing script, and commit 679c48ce21fdf4d08de18f06cf02d45281a35556 switches the restricted models to the Efron method.