byu-dml / metalearn

BYU's python library of useable tools for metalearning
MIT License

Index out-of-bounds error in run_pipeline #163

Closed: rlaboulaye closed this issue 5 years ago

rlaboulaye commented 5 years ago

Occasionally, the run_pipeline function hits a class index out-of-bounds error when running an sklearn pipeline. I suspect this happens when a training fold is missing one or more of the target's possible class values.

To reproduce this issue, I would suggest creating a synthetic dataset in which one class value has only a single instance, then checking whether it causes the function to fail.
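A minimal sketch of that reproduction idea, using only the standard library. It assumes plain unshuffled k-fold splitting; the helpers `kfold_indices` and `missing_classes_per_fold` are hypothetical and are not part of metalearn or of how run_pipeline actually builds its folds. With a single instance of class "c", exactly one training split ends up without "c", whatever the fold assignment.

```python
from typing import Hashable, List, Sequence, Set


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Hypothetical helper: split range(n) into k contiguous, unshuffled test folds."""
    # Fold sizes differ by at most one, like a standard unshuffled k-fold split.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def missing_classes_per_fold(y: Sequence[Hashable], k: int) -> List[Set[Hashable]]:
    """Hypothetical helper: for each fold, the class values absent from its training split."""
    all_classes = set(y)
    result = []
    for test_idx in kfold_indices(len(y), k):
        test_set = set(test_idx)
        # The training split is every row that is not in this fold's test split.
        train_classes = {label for i, label in enumerate(y) if i not in test_set}
        result.append(all_classes - train_classes)
    return result


# Synthetic target in which class "c" appears exactly once.
y = ["a"] * 5 + ["b"] * 4 + ["c"]
for fold, missing in enumerate(missing_classes_per_fold(y, k=5)):
    print(fold, sorted(missing))
```

Here the last fold holds the only "c" row, so its training split lacks "c"; feeding such a split to a pipeline is the scenario this hypothesis predicts would fail.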

rlaboulaye commented 5 years ago

My earlier diagnosis was wrong. The bug was caused by running linear discriminant analysis (LDA) in an sklearn pipeline on a target column that has only one distinct value. Only the lsqr solver for LDA raises this error, so we will switch to LDA's default solver, svd.
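The fix itself is a configuration change: construct sklearn's `LinearDiscriminantAnalysis` with its default `solver="svd"` rather than `solver="lsqr"`. Separately, a defensive pre-check can flag the triggering condition (a training target with fewer than two distinct values) before any estimator is fitted. The sketch below is a hypothetical stdlib-only helper, `single_class_training_folds`, offered for debugging; it is not what the maintainers implemented, and the fold layout passed to it is an assumption.

```python
from typing import Hashable, List, Sequence


def single_class_training_folds(y: Sequence[Hashable], folds: List[List[int]]) -> List[int]:
    """Hypothetical helper: indices of folds whose training target has < 2 distinct values."""
    flagged = []
    for fold_id, test_idx in enumerate(folds):
        test_set = set(test_idx)
        # Collect the distinct target values left in this fold's training split.
        train_values = {label for i, label in enumerate(y) if i not in test_set}
        if len(train_values) < 2:
            flagged.append(fold_id)
    return flagged


# Leave-one-out over a target where "b" occurs only once:
# holding out the "b" row leaves a training target that is all "a".
y = ["a", "a", "a", "b"]
loo_folds = [[i] for i in range(len(y))]
print(single_class_training_folds(y, loo_folds))
```

For this input the only flagged fold is index 3, the one whose training split contains nothing but "a".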