RaviSoji / plda

Probabilistic Linear Discriminant Analysis & classification, written in Python.
https://ravisoji.com
Apache License 2.0

Error while training model #56

Closed 009deep closed 3 years ago

009deep commented 3 years ago

Hi @RaviSoji, I am using this code as shown in the mnist_demo notebook, but with my own data.

If I use classifier.fit_model(training_data, training_labels, n_principal_components=None), I get the following error:

error: failed in converting 2nd argument `b' of _flapack.dsygvd to C/Fortran array in scipy/linalg/decomp.py in eigh

And if I pass a value for n_principal_components, such as classifier.fit_model(training_data, training_labels, n_principal_components=5), I get the following error:

ValueError: array must not contain infs or NaNs in numpy/lib/function_base.py in asarray_chkfinite

I tried a fresh install of the environment, but the errors are the same. Any suggestions as to what is causing them?

Thanks.

RaviSoji commented 3 years ago

Sorry I won't have time to look into this any time soon, but here are two easy diagnostics:

  1. Run the MNIST demo. This should run because I tested it myself, but if it doesn't run, let me know, and I will try to see if anything changed. If this runs, then it's probably not the software.
  2. To your training_data, add a small amount of Gaussian noise with mean 0 and standard deviation .05 * np.std(column_j) to every column j, independently. Then, fit the model. This should also run if your data is formatted the same way the MNIST data is formatted in the example.

If (1) runs, but not (2), my first guess would be that something is not right with the data formatting. If both (1) and (2) run, you probably have collinearity/multicollinearity-like issues within or between categories.
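Diagnostic (2) can be sketched roughly as follows. This is a minimal NumPy illustration, not code from the plda repository: the helper name `add_column_noise`, its `scale` and `seed` parameters, and the synthetic `training_data` are all assumptions made for the example.

```python
import numpy as np


def add_column_noise(X, scale=0.05, seed=0):
    """Add zero-mean Gaussian noise to each column of X, independently.

    Hypothetical helper: the noise for column j has standard deviation
    scale * np.std(X[:, j]), so a constant column (std 0) is left unchanged.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    col_std = X.std(axis=0)                     # one standard deviation per column
    noise = rng.normal(size=X.shape) * (scale * col_std)
    return X + noise


# Synthetic stand-in for real training data (assumption for illustration).
training_data = np.random.default_rng(1).normal(size=(100, 4))
training_data[:, 3] = 2.0                       # a constant column
noisy = add_column_noise(training_data)
```

The noisy array would then be passed to fit_model in place of the original training_data. Note that columns with zero variance receive no noise under this scheme, so they would need to be handled separately.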

Edit for future users. See 009deep's response below: the issue was that there were too few observations per class relative to the number of classes.

009deep commented 3 years ago

Thanks for the quick response. I analyzed it, and the error is attributable to the disproportion between the number of classes and the number of samples per class: I have classes in the 1000s and samples per class in the 10s. The error can also be reproduced by feeding only the first 10 training samples in the mnist_demo notebook. It basically affects the value of Sw in eq 1.
I'll close this, as the error seems to stem from the data and not the code.
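One way to check for this situation before fitting is sketched below. It assumes Sw refers to the within-class scatter matrix and computes it directly with NumPy. The helpers `within_class_scatter` and `diagnose` are hypothetical names for this example, and the computation is not necessarily how plda builds Sw internally.

```python
import numpy as np


def within_class_scatter(X, y):
    """Pooled within-class scatter matrix (assumed meaning of Sw)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        X_k = X[y == label]
        centered = X_k - X_k.mean(axis=0)       # center each class on its own mean
        S_w += centered.T @ centered
    return S_w / X.shape[0]


def diagnose(X, y):
    """Summarize properties that can make Sw singular or non-finite."""
    X = np.asarray(X, dtype=float)
    labels, counts = np.unique(y, return_counts=True)
    S_w = within_class_scatter(X, y)
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "n_classes": len(labels),
        "min_class_size": int(counts.min()),
        "all_finite": bool(np.isfinite(X).all()),
        "rank_S_w": int(np.linalg.matrix_rank(S_w)),
    }


# Synthetic example: 10 samples, 5 classes, 20 features (assumption).
rng = np.random.default_rng(0)
X_small = rng.normal(size=(10, 20))
y_small = np.repeat(np.arange(5), 2)
report = diagnose(X_small, y_small)
```

The rank of Sw is at most the number of samples minus the number of classes, so when that bound is below the number of features, Sw is singular and an eigendecomposition that relies on it can be expected to fail.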

RaviSoji commented 3 years ago

Thanks a lot for figuring out the issue and sharing the lessons! I just realized my response to a previous issue could have been potentially useful to you: https://github.com/RaviSoji/plda/issues/49. I totally forgot about that -- sorry!

Ravi B. Sojitra