Set tolerance to zero in pivoted Cholesky

andreasnoack commented 1 year ago

Right now, we rely on the default tolerance, which is more likely to produce a smaller rank. With the tolerance set to zero, the rank is reduced only when the matrix is exactly singular or slightly indefinite, which I think is the preferable behavior. Otherwise, we'll drop predictors too frequently; see https://github.com/JuliaStats/GLM.jl/pull/507#discussion_r1145045405
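To make the effect of the tolerance concrete, here is a minimal sketch in plain Python rather than the GLM.jl code. The function `pivoted_cholesky_rank` is a hypothetical helper. It runs a pivoted, Cholesky-style elimination (the Schur-complement form, without the square roots) and stops as soon as the largest remaining diagonal entry is at or below `tol`. The fallback used for a negative `tol`, `n * eps * max(diag)`, is an assumption loosely modeled on LAPACK-style defaults; the exact default behind GLM.jl is not stated in this thread.

```python
import sys


def pivoted_cholesky_rank(a, tol=-1.0):
    """Return (rank, perm) for a symmetric matrix given as a list of lists.

    A negative tol selects an assumed default of n * eps * max(diag).
    """
    n = len(a)
    s = [row[:] for row in a]  # working copy, the input is left untouched
    perm = list(range(n))
    if tol < 0:
        tol = n * sys.float_info.epsilon * max(s[i][i] for i in range(n))

    rank = 0
    for k in range(n):
        # Pivot on the largest remaining diagonal entry.
        p = max(range(k, n), key=lambda i: s[i][i])
        if s[p][p] <= tol:
            break  # remaining block is treated as numerically zero
        # Symmetric swap of rows and columns k and p.
        s[k], s[p] = s[p], s[k]
        for row in s:
            row[k], row[p] = row[p], row[k]
        perm[k], perm[p] = perm[p], perm[k]
        # Schur-complement update of the trailing block.
        pivot = s[k][k]
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                s[i][j] -= s[i][k] * s[k][j] / pivot
        rank += 1
    return rank, perm


# Gram matrix of two almost collinear predictors: the second diagonal
# entry exceeds the exactly singular value 1.0 by only 2**-50.
nearly_singular = [[4.0, 2.0], [2.0, 1.0 + 2.0**-50]]
exactly_singular = [[4.0, 2.0], [2.0, 1.0]]

print(pivoted_cholesky_rank(nearly_singular)[0])             # 1 (assumed default)
print(pivoted_cholesky_rank(nearly_singular, tol=0.0)[0])    # 2
print(pivoted_cholesky_rank(exactly_singular, tol=0.0)[0])   # 1
```

Under the assumed default, the tiny positive residue counts as zero and one predictor is dropped. With `tol=0.0` both predictors are kept, and a column is dropped only when the residue is exactly zero or negative.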

andreasnoack commented 1 year ago

(for now this is just for testing the effect)

bkamins commented 1 year ago

As I have commented, it would be good to test against false negatives (i.e., not dropping columns in cases of multicollinearity). If needed, I can propose such tests.

andreasnoack commented 1 year ago

I've now spent some time looking into the test failures. While tol=0 in the Cholesky would allow us to fit a lot of models that currently result in dropped columns, it is also not really compatible with the automatic variable selection approach that we currently have. I'd probably be in favor of tol=0, which would let the confidence bounds suggest which columns to drop at the cost of the automatic behavior, but it would be fairly breaking and I might be in the minority here. So maybe the best we can do is to document that the :qr option might not drop as many variables as :cholesky.

JuliaStats / GLM.jl

Set tolerance to zero in pivoted Cholesky #518