Closed rlaboulaye closed 5 years ago
My earlier guess about the cause of this error was wrong. The bug occurs when linear discriminant analysis (LDA) runs in an sklearn pipeline on a target column that contains only one value. Only the lsqr solver of LDA triggers this error, so we will switch to LDA's default solver, svd.
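As a complement to switching solvers, a pre-check on the target column could flag the single-value case before any pipeline is fit. This is only a minimal sketch: `can_fit_classifier` is a hypothetical helper, not part of run_pipeline or sklearn, and it assumes the target is available as a plain sequence of labels.

```python
def can_fit_classifier(y):
    """Return True when the target holds at least two distinct class values.

    Hypothetical helper: `y` is assumed to be a plain sequence of labels.
    """
    return len(set(y)) >= 2


# A target with two classes passes the check; a constant target does not.
print(can_fit_classifier([0, 1, 0, 1]))
print(can_fit_classifier([1, 1, 1, 1]))
```

A caller could skip or log a pipeline run whenever this check fails instead of letting the classifier raise.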
Occasionally, the run_pipeline function encounters a class index out-of-bounds error when trying to run an sklearn pipeline. I suspect that this happens when a certain training data fold is missing one or more of the possible class values.
To reproduce this issue, I suggest creating a synthetic dataset with only one instance of a certain class value and checking whether it causes the function to fail.
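The reproduction idea could be sketched roughly as follows, using only the standard library. Everything here is an assumption made for illustration: the helper names, the dataset size, the label values, and the simple interleaved fold split are hypothetical and are not taken from run_pipeline. The sketch builds a dataset in which one class appears exactly once, then reports which folds end up with a training part that lacks that class.

```python
import random


def make_rare_class_dataset(n_samples=20, rare_label=2, seed=0):
    """Build (X, y) where `rare_label` appears exactly once (hypothetical helper)."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n_samples)]
    # Alternate two common classes, then overwrite one position with the rare one.
    y = [i % 2 for i in range(n_samples)]
    y[rng.randrange(n_samples)] = rare_label
    return X, y


def folds_missing_classes(y, n_folds=5):
    """For each fold, list the classes absent from that fold's training part."""
    all_classes = set(y)
    indices = range(len(y))
    report = []
    for k in range(n_folds):
        # Simple interleaved split: every n_folds-th sample is held out.
        test_idx = set(range(k, len(y), n_folds))
        train_labels = {y[i] for i in indices if i not in test_idx}
        report.append(sorted(all_classes - train_labels))
    return report


X, y = make_rare_class_dataset()
missing = folds_missing_classes(y)
for fold, absent in enumerate(missing):
    print(f"fold {fold}: classes missing from training data: {absent}")
```

The fold that holds out the single rare sample trains without that class, so passing that fold's training data to the pipeline would be the natural next step for checking whether the failure appears.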