atahk / pscl

Political Science Computational Laboratory
65 stars 14 forks source link

Differences between countreg and pscl #18

Open MichaelRatajczak92 opened 1 year ago

MichaelRatajczak92 commented 1 year ago

Hi @simonjackman @psolymos @atahk @sjessee,

I am not sure if you are aware of this, but differences in the estimates of identical models ran using counteg and pscl have been reported on stack overflow (https://stackoverflow.com/questions/76003780/zeroinfl-from-the-pscl-and-countreg-packages-give-very-different-results).

I also found marginal differences in the estimated z-values of the ZI component, but to a much smaller extent than the stack exchange user, above. countreg estimates are consistent with those produced using glmmTMB (in my case). Do you know why this may be happening?

Kind regards, Michael

psolymos commented 1 year ago

I checked the claims and here are my results:

r$> coef(summary(mdvisit_zeroinf))$count
               Estimate Std. Error     z value     Pr(>|z|)
(Intercept)  0.02850965 0.54816608  0.05200915 9.585214e-01
reform      -0.11947413 0.05207050 -2.29446851 2.176360e-02
badh         1.10862517 0.07448640 14.88359268 4.212574e-50
agegrp       0.07981368 0.03509869  2.27397876 2.296726e-02
educ        -0.05928792 0.03882254 -1.52715218 1.267232e-01
loginc       0.10102130 0.07155307  1.41183725 1.579979e-01
Log(theta)   0.04502887 0.05405013  0.83309460 4.047914e-01

r$> coef(summary(mdvisit_zeroinf2))$count
               Estimate Std. Error     z value     Pr(>|z|)
(Intercept)  0.02467049 0.54759004  0.04505285 9.640652e-01
reform      -0.11942382 0.05207417 -2.29334098 2.182838e-02
badh         1.10899274 0.07439003 14.90781504 2.931907e-50
agegrp       0.07997016 0.03506222  2.28080711 2.255986e-02
educ        -0.05924445 0.03872532 -1.52986348 1.260505e-01
loginc       0.10146349 0.07150424  1.41898563 1.559032e-01
Log(theta)   0.04527069 0.05356975  0.84507933 3.980665e-01

r$> coef(summary(mdvisit_zeroinf))$zero
              Estimate Std. Error    z value  Pr(>|z|)
(Intercept) 14.2700590 28.5479926  0.4998621 0.6171722
reform       0.9768844  0.9862036  0.9905504 0.3219052
badh        -2.6259601  5.6297440 -0.4664440 0.6408977
agegrp      -0.1122822  0.5029740 -0.2232366 0.8233514
educ        -6.7804938 27.3055267 -0.2483195 0.8038872
loginc      -1.3499846  0.9485839 -1.4231578 0.1546904

r$> coef(summary(mdvisit_zeroinf2))$zero
              Estimate Std. Error    z value  Pr(>|z|)
(Intercept) 15.2711242 47.0408378  0.3246355 0.7454570
reform       0.9808092  0.9807572  1.0000530 0.3172849
badh        -2.5021640  4.6391946 -0.5393531 0.5896432
agegrp      -0.1064381  0.4940483 -0.2154408 0.8294237
educ        -7.7945288 46.2971628 -0.1683587 0.8663011
loginc      -1.3500402  0.9444885 -1.4293877 0.1528928

Pretty small differences except for zero intercept and educ effect (estimate & SE) but those are pretty extreme values on logit scale, so those won't matter much on the probability scale IMO:

r$> summary(coef(summary(mdvisit_zeroinf))$count-coef(summary(mdvisit_zeroinf2))$count)
    Estimate            Std. Error            z value              Pr(>|z|)
 Min.   :-4.422e-04   Min.   :-3.667e-06   Min.   :-0.0242224   Min.   :-5.544e-03
 1st Qu.:-3.047e-04   1st Qu.: 4.265e-05   1st Qu.:-0.0095666   1st Qu.:-3.239e-05
 Median :-1.565e-04   Median : 9.637e-05   Median :-0.0068283   Median : 4.074e-04
 Mean   : 3.625e-04   Mean   : 1.902e-04   Mean   :-0.0059491   Mean   : 6.130e-04
 3rd Qu.:-4.689e-05   3rd Qu.: 2.888e-04   3rd Qu.: 0.0007919   3rd Qu.: 1.384e-03
 Max.   : 3.839e-03   Max.   : 5.760e-04   Max.   : 0.0069563   Max.   : 6.725e-03

r$> summary(coef(summary(mdvisit_zeroinf))$zero-coef(summary(mdvisit_zeroinf2))$zero)
    Estimate            Std. Error            z value             Pr(>|z|)
 Min.   :-1.0010652   Min.   :-18.991636   Min.   :-0.079961   Min.   :-0.128285
 1st Qu.:-0.0943081   1st Qu.:-13.868610   1st Qu.:-0.009076   1st Qu.:-0.048328
 Median :-0.0048844   Median :  0.004771   Median :-0.000783   Median :-0.002137
 Mean   :-0.0200899   Mean   : -6.079244   Mean   : 0.026184   Mean   :-0.023183
 3rd Qu.:-0.0009395   3rd Qu.:  0.008056   3rd Qu.: 0.056239   3rd Qu.: 0.003915
 Max.   : 1.0140350   Max.   :  0.990549   Max.   : 0.175227   Max.   : 0.051255
MichaelRatajczak92 commented 1 year ago

I agree that the differences are small, and not really meaningful (if you focus on the magnitudes of the reported effects). For statisticians and competent data analysts, this is not really that much of a problem. But I guess in the context of the NHST framework, there will be some people who conduct data analyses that may select one of these packages based on whether p < 0.05, in one of the packages vs. the other, for their estimate of interest. IMO it would be good if the estimates were identical, because, as it is now, there is another degree of freedom for some analysts to consider.