RaviSoji / plda

Probabilistic Linear Discriminant Analysis & classification, written in Python.
https://ravisoji.com
Apache License 2.0

Training converging problems #63

Closed: oplatek closed this issue 2 years ago

oplatek commented 2 years ago

Describe the bug
I want to summarize problems and lessons learned while using this PLDA implementation.

I observed two kinds of errors when training so far:

  1. ValueError: array must not contain infs or NaNs in optimizer.py:181 running calc_W(S_b, S_w)
  2. LinAlgError: SVD did not converge in model.py:166 running matrix_rank = np.linalg.matrix_rank(S_w)

Both boil down to the covariance matrix computation.
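
To see why, here is a minimal sketch (assuming only NumPy) of the within-class covariance for a class with a single example: the normalization factor is N - 1 = 0, so the result is all NaNs, which then breaks the SVD and eigendecomposition downstream.

import numpy as np

X_k = np.random.randn(1, 896)  # one class containing a single embedding
cov_k = np.cov(X_k.T)          # RuntimeWarning: divide by zero / invalid value
print(np.isnan(cov_k).all())   # True: the within-class covariance is all NaN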

To Reproduce
Steps to reproduce the behavior: use dummy data with a single example per class.

ipdb> labels.shape
(5,)
ipdb> labels
array([1, 4, 3, 0, 2])
ipdb> embeddings.shape
(5, 896)

ipdb> plda_classifier = plda.Classifier()
ipdb> plda_classifier.fit_model(embeddings, labels)
/lnet/work/people/oplatek/plda/plda/optimizer.py:165: RuntimeWarning: Degrees of freedom <= 0 for slice
  cov_ks.append(np.cov(X_k.T))
/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/numpy/lib/function_base.py:2680: RuntimeWarning: divide by zero encountered in true_divide
  c *= np.true_divide(1, fact)
/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/numpy/lib/function_base.py:2680: RuntimeWarning: invalid value encountered in multiply
  c *= np.true_divide(1, fact)
*** numpy.linalg.LinAlgError: SVD did not converge

What helped?

So far, I created dummy data using Gaussian noise and ensured that there are two examples per label.

embeddings = np.concatenate((embeddings, embeddings + np.random.randn(*embeddings.shape)), axis=0)
labels = np.concatenate((labels, labels), axis=0)
plda_classifier.fit_model(embeddings, labels)  # runs without errors

I am not 100% sure what the root cause is, but I suspect that computing the covariance matrix from a single vector when computing the within-class scatter is the problematic part.

I will follow up and probably create a PR with asserts for the input data in order to avoid the error.
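
A rough sketch of what I have in mind (just my idea, not part of the library; the helper name check_plda_inputs is made up):

import numpy as np

def check_plda_inputs(embeddings, labels):
    # Reject inputs that would make the within-class covariance degenerate.
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    assert counts.min() >= 2, "every class needs at least 2 examples"
    assert np.isfinite(embeddings).all(), "embeddings must not contain inf or NaN"

check_plda_inputs(embeddings, labels)  # call this before plda_classifier.fit_model(...)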

Note: There are several closed issues (#56, #57), but I am opening this one to start a new discussion and keep my (next) findings in a single place. Feel free to close it if you do not like it.

oplatek commented 2 years ago

For another set of my embeddings I obtain the following error: numpy.linalg.LinAlgError: The leading minor of order 4 of B is not positive definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

    plda_classifier.fit_model(embeddings, labels, n_principal_components=pca_first_n)
  File "/lnet/work/people/oplatek/plda/plda/classifier.py", line 25, in fit_model
    self.model = Model(X, Y, n_principal_components)
  File "/lnet/work/people/oplatek/plda/plda/model.py", line 96, in __init__
    self.fit(row_wise_data, labels, n_principal_components)
  File "/lnet/work/people/oplatek/plda/plda/model.py", line 178, in fit
    optimize_maximum_likelihood(X, labels)
  File "/lnet/work/people/oplatek/plda/plda/optimizer.py", line 68, in optimize_maximum_likelihood
    W = calc_W(S_b, S_w)
  File "/lnet/work/people/oplatek/plda/plda/optimizer.py", line 180, in calc_W
    eigenvalues, eigenvectors = eigh(S_b, S_w)
  File "/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/scipy/linalg/decomp.py", line 578, in eigh
    raise LinAlgError('The leading minor of order {} of B is not '
numpy.linalg.LinAlgError: The leading minor of order 4 of B is not positive definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
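
For reference, scipy.linalg.eigh(A, B) solves a generalized eigenvalue problem and requires B to be positive definite, so a rank-deficient S_w is enough to reproduce the same error. A minimal sketch, assuming only NumPy and SciPy:

import numpy as np
from scipy.linalg import eigh

S_b = np.eye(4)                      # stands in for the between-class scatter
S_w = np.diag([1.0, 1.0, 1.0, 0.0])  # rank-deficient, so not positive definite
eigh(S_b, S_w)                       # LinAlgError: leading minor of order 4 of B is not positive definite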

UPDATE: This error was caused by a bug on my side where the value of n_principal_components was insanely high.
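
A simple guard on the caller side could be to clip the requested number of components to what PCA can actually produce (my own sketch, not part of the library API):

# embeddings, labels, pca_first_n and plda_classifier come from the snippets above
n_samples, n_features = embeddings.shape
pca_first_n = min(pca_first_n, n_samples, n_features)
plda_classifier.fit_model(embeddings, labels, n_principal_components=pca_first_n)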

Closing the issue.