dlinzer / poLCA

Polytomous Variable Latent Class Analysis (R package)
https://dlinzer.github.io/poLCA/

Allow larger number of observed items without underflows #2

Closed JeffreyBLewis closed 7 years ago

JeffreyBLewis commented 8 years ago

These changes allow the number of items to exceed 900 without the calculated class-conditional likelihoods underflowing. This is accomplished by rescaling those likelihoods by DOUBLE_XMAX; the rescaling must then be undone in any calculation that requires the unscaled likelihoods (probabilities). The number of items is ultimately bounded in other ways, but this fix allows far more items than is currently possible.
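To make the underflow and the rescaling concrete, here is a minimal R sketch. It is not the package's code: the per-item probability of 0.4, the item count of 1000, and the variable names are assumptions chosen for illustration, and `.Machine$double.xmax` is used as the R-level counterpart of DOUBLE_XMAX.

```r
# Illustrative values only: 1000 items, each with response probability 0.4.
p <- rep(0.4, 1000)

# Unscaled product: 0.4^1000 is about 1e-398, below the double range,
# so the result is stored as exactly 0.
plain <- prod(p)

# Rescaled product: start the running product at the largest finite double,
# so the result (about 2e-90) stays representable.
xmax <- .Machine$double.xmax
scaled <- xmax
for (pj in p) scaled <- scaled * pj

# Undoing the rescaling on the log scale recovers the true log-likelihood.
loglik <- log(scaled) - log(xmax)

print(plain)
print(scaled > 0)
print(all.equal(loglik, 1000 * log(0.4)))
```

Because the same constant multiplies every class-conditional likelihood, it cancels in ratios such as posterior class memberships and only needs to be removed where the unscaled value itself is required.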

I verified that this new code produces identical output on the following example from the vignette:

library(poLCA)
data("gss82")
f <- cbind(PURPOSE, ACCURACY, UNDERSTA, COOPERAT) ~ 1
gss.lc <- poLCA(f, gss82, nclass = 3, maxiter = 3000, nrep = 10) 

I have not verified that the functions provided in poLCA.predcell.R or poLCA.table.R work correctly after these changes, but they should, as the rescaling is trivial in those cases.

Note that as the number of items grows large, certain class-conditional likelihoods/probabilities will still underflow (that is, be calculated to be 0). This is usually harmless, but may cause problems in some instances. The estimation only requires that not all of the class-conditional likelihoods underflow; if they all do, the posterior class memberships evaluate to NaN and the estimation fails.
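The failure mode described above can be reproduced in a few lines of R. The `posterior` helper, the class priors, and the likelihood values below are hypothetical and are not taken from poLCA; they only illustrate the arithmetic.

```r
# Hypothetical helper: posterior class memberships for one observation,
# given class priors and class-conditional likelihoods.
posterior <- function(prior, lik) prior * lik / sum(prior * lik)

prior <- c(0.5, 0.3, 0.2)

# Two of three likelihoods have underflowed to 0: the posterior is still
# well defined and puts all mass on the surviving class.
ok <- posterior(prior, c(1e-200, 0, 0))

# All three have underflowed: the denominator is 0, so every entry is 0/0.
bad <- posterior(prior, c(0, 0, 0))

print(ok)
print(bad)
```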

This rescaling approach will still eventually fail given a sufficiently large number of items. We could do more here to allow for an arbitrarily large number of items, but that would require a larger reengineering of the code and addressing other limitations external to the project (limits on the number of variables used in a formula, for example).
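For reference, one common way to remove the dependence on the number of items is to work with log-likelihoods throughout and normalize with the log-sum-exp trick. This is not what this pull request implements, and it is only a guess at the kind of reengineering meant above. The sketch below assumes three classes, 5000 items, and made-up per-item probabilities.

```r
# Numerically stable log(sum(exp(x))): subtract the maximum before exponentiating.
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Assumed setup: 5000 items, with per-item probabilities of 0.4, 0.3 and 0.2
# under classes 1, 2 and 3 respectively.
n_items <- 5000
log_lik <- n_items * log(c(0.4, 0.3, 0.2))
log_prior <- log(c(0.5, 0.3, 0.2))

# Posterior class memberships computed entirely on the log scale.
log_joint <- log_prior + log_lik
post <- exp(log_joint - log_sum_exp(log_joint))

# The raw likelihoods underflow to 0 at this size, even after multiplying
# by the largest double, yet the posterior is finite and sums to 1.
print(exp(log_lik))
print(exp(log_lik) * .Machine$double.xmax)
print(post)
```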

Jeff Lewis