Closed by oplatek 2 years ago
For another of my embeddings, I get the following error:
    plda_classifier.fit_model(embeddings, labels, n_principal_components=pca_first_n)
  File "/lnet/work/people/oplatek/plda/plda/classifier.py", line 25, in fit_model
    self.model = Model(X, Y, n_principal_components)
  File "/lnet/work/people/oplatek/plda/plda/model.py", line 96, in __init__
    self.fit(row_wise_data, labels, n_principal_components)
  File "/lnet/work/people/oplatek/plda/plda/model.py", line 178, in fit
    optimize_maximum_likelihood(X, labels)
  File "/lnet/work/people/oplatek/plda/plda/optimizer.py", line 68, in optimize_maximum_likelihood
    W = calc_W(S_b, S_w)
  File "/lnet/work/people/oplatek/plda/plda/optimizer.py", line 180, in calc_W
    eigenvalues, eigenvectors = eigh(S_b, S_w)
  File "/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/scipy/linalg/decomp.py", line 578, in eigh
    raise LinAlgError('The leading minor of order {} of B is not '
numpy.linalg.LinAlgError: The leading minor of order 4 of B is not positive definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.
UPDATE: This error was caused by a bug on my side where n_principal_components was set insanely high.
Closing the issue.
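For reference, here is a minimal sketch that triggers the same exception class from scipy.linalg.eigh. The matrices are made up for illustration and are not the S_b and S_w from the run above. It only shows that the generalized solver refuses a second argument that is not strictly positive definite. How the oversized n_principal_components led to such a matrix is not shown here.

```python
import numpy as np
from scipy.linalg import eigh

# Illustrative matrices only (assumption): A stands in for S_b, B for S_w.
A = np.diag([4.0, 3.0, 2.0, 1.0])
# B is symmetric but singular: its last diagonal entry is exactly zero,
# so it is positive semidefinite rather than positive definite.
B = np.diag([1.0, 1.0, 1.0, 0.0])

try:
    eigh(A, B)  # generalized problem A v = w B v; B gets Cholesky-factorized
    raised = False
except np.linalg.LinAlgError as err:
    raised = True
    print(type(err).__name__, err)
```

If B is replaced by the identity matrix, the call returns eigenvalues and eigenvectors normally.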
Describe the bug: I want to summarize the problems and lessons learned while using this PLDA implementation.
So far I have observed two kinds of errors during training:
ValueError: array must not contain infs or NaNs
in optimizer.py:181, running calc_W(S_b, S_w)
LinAlgError: SVD did not converge
in model.py:166, running matrix_rank = np.linalg.matrix_rank(S_w)
Both boil down to the covariance matrix computation.
To Reproduce: use dummy data with a single example per class.
What helped?
So far, I created dummy data from Gaussian noise and ensured that there are two examples per label.
I am not 100% sure what the root cause is, but I suspect that computing a covariance matrix from a single vector, as part of the within-class scatter, is the problematic part.
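A small sketch of that suspicion, assuming the per-class covariance is a sample covariance with the default N - 1 normalization (as np.cov uses). I have not confirmed that this is exactly what the library computes. With one example per class there are zero degrees of freedom, so the estimate is undefined:

```python
import warnings
import numpy as np

rng = np.random.default_rng(0)
dim = 3

one_example = rng.normal(size=(1, dim))   # a class with a single embedding
two_examples = rng.normal(size=(2, dim))  # a class with two embeddings

# np.cov expects variables in rows, hence the transpose.
with warnings.catch_warnings(), np.errstate(divide="ignore", invalid="ignore"):
    warnings.simplefilter("ignore", RuntimeWarning)
    cov_one = np.cov(one_example.T)       # N - 1 = 0 degrees of freedom

cov_two = np.cov(two_examples.T)          # N - 1 = 1, well defined

print("single example, all NaN:", np.isnan(cov_one).all())
print("two examples, all finite:", np.isfinite(cov_two).all())
```

The single-example covariance comes out as all NaN. Any scatter matrix that sums such terms would also contain NaNs, which would be consistent with both error messages above.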
I will follow up and probably create a PR with asserts for the input data in order to avoid the error.
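A rough sketch of what such a check could look like. The function name check_training_data and the min_per_class parameter are hypothetical and not part of the plda package. It raises ValueError instead of using assert so that the check still runs under python -O.

```python
import numpy as np


def check_training_data(X, labels, min_per_class=2):
    """Hypothetical pre-fit check; not part of the plda package."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_dims), got shape {X.shape}")
    if labels.shape[0] != X.shape[0]:
        raise ValueError("X and labels must have the same number of rows")
    if not np.isfinite(X).all():
        raise ValueError("X contains infs or NaNs")

    # Count how many examples each label has.
    unique, counts = np.unique(labels, return_counts=True)
    too_small = unique[counts < min_per_class]
    if too_small.size > 0:
        raise ValueError(
            f"labels with fewer than {min_per_class} examples: {too_small.tolist()}"
        )


# Example: Gaussian noise with two examples per label passes silently.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 3))
check_training_data(X_demo, [0, 0, 1, 1])
```

Calling it with labels such as [0, 0, 0, 1] raises a ValueError that names label 1 as having too few examples.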
Note: There are several closed issues (#56, #57), but I am opening this one for a new discussion and to keep my (next) findings in a single place. Feel free to close it if you do not like it.