PyProphet / pyprophet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.
http://www.openswath.org
BSD 3-Clause "New" or "Revised" License

SVD does not converge #54

Closed: Leon-Bichmann closed this issue 5 years ago

Leon-Bichmann commented 5 years ago

Hi,

I'm running into a couple of problems when I run pyprophet on my data.

My target/decoy distribution is quite uneven, which may be the source of the problem:

Info: Data set contains 262 decoy and 2601 target groups.

When running pyprophet, I often hit an SVD convergence error:

  File "/anaconda2/lib/python2.7/site-packages/sklearn/discriminant_analysis.py", line 384, in _solve_svd
    U, S, V = linalg.svd(X, full_matrices=False)
  File "/anaconda2/lib/python2.7/site-packages/scipy/linalg/decomp_svd.py", line 132, in svd
    raise LinAlgError("SVD did not converge")
numpy.linalg.linalg.LinAlgError: SVD did not converge

or a segmentation fault:

Info: Start learning on 10 folds using 1 processes.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Segmentation fault: 11

However, after a couple of retries it sometimes works.

grosenberger commented 5 years ago

I have never seen this issue before. Can you provide a minimal reproducible example? With so few decoys, there is probably a problem in one of the folds, e.g. too many identical data points.
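To make the fold-size concern concrete, here is a minimal sketch assuming the 262 decoy groups are divided as evenly as possible across the 10 cross-validation folds shown in the log (pyprophet's actual splitting may differ). It also flags features whose values are identical across every row of a fold, which is one way to look for the "identical data points" mentioned above. The helpers `decoys_per_fold` and `constant_columns` and the toy matrix are hypothetical, not part of pyprophet.

```python
import numpy as np

def decoys_per_fold(n_decoys, n_folds):
    """Decoy groups per fold, assuming they are split as evenly as possible."""
    return [len(chunk) for chunk in np.array_split(np.arange(n_decoys), n_folds)]

def constant_columns(X):
    """Indices of feature columns whose values are identical in every row."""
    X = np.asarray(X, dtype=float)
    return np.flatnonzero(np.all(X == X[0], axis=0))

# 262 decoy groups over 10 folds leaves only about 26 decoys per fold
print(decoys_per_fold(262, 10))  # [27, 27, 26, 26, 26, 26, 26, 26, 26, 26]

# Toy decoy rows for one hypothetical fold: the second score never changes
fold_decoys = np.array([[0.1, 5.0, 2.0],
                        [0.4, 5.0, 1.5],
                        [0.2, 5.0, 2.5]])
print(constant_columns(fold_decoys))  # [1]
```

With roughly ten times more target than decoy groups, each fold's decoy class rests on a few dozen rows, so a handful of duplicated or degenerate decoys weighs heavily on that fold.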

Leon-Bichmann commented 5 years ago

I see, so you mean I should reduce the number of targets in the library to a similar range and test again?

What could be the reason for so few decoy hits?

grosenberger commented 5 years ago

What kind of data are you using? Is this DIA, SRM or DDA? Are the scores generated by OpenSWATH or a different tool?

How big is the library?

Leon-Bichmann commented 5 years ago

I am analysing DIA data with the OpenSWATH tools in OpenMS.

The library contains 6800 transitions, 50% of them decoys, with exactly 6 transitions per assay (min and max both set to 6).

uweschmitt commented 5 years ago

You probably have NaN, inf or similar values in the matrix that pyprophet passes to the LDA implementation from scikit-learn.

You could use a debugger and set a breakpoint at https://github.com/PyProphet/pyprophet/blob/master/pyprophet/classifiers.py#L70 to inspect the features and labels passed to the classifier.
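Once execution stops at that breakpoint, the NaN/inf hypothesis can be checked directly on the feature matrix. This is a minimal NumPy-only sketch: `find_nonfinite`, the score names and the toy matrix are hypothetical. In practice `X` would be the feature matrix handed to the classifier, and the names would be the score columns in your data.

```python
import numpy as np

def find_nonfinite(X, feature_names):
    """Map each feature name to its number of NaN/inf entries (only if > 0)."""
    X = np.asarray(X, dtype=float)
    bad = ~np.isfinite(X)          # True where the value is NaN, +inf or -inf
    counts = bad.sum(axis=0)       # per-column count of offending values
    return {name: int(n) for name, n in zip(feature_names, counts) if n > 0}

# Hypothetical feature matrix with one NaN and one inf
X = np.array([[0.9, 1.2, np.nan],
              [0.7, np.inf, 3.1],
              [0.8, 1.1, 2.9]])
names = ["score_a", "score_b", "score_c"]
print(find_nonfinite(X, names))  # {'score_b': 1, 'score_c': 1}

# Rows that contain only finite values, e.g. to see how many groups survive
clean_rows = np.isfinite(X).all(axis=1)
print(X[clean_rows].shape)  # (1, 3)
```

If the dictionary comes back empty for every fold, non-finite input is unlikely to be the cause, and the environment becomes the next suspect.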

Leon-Bichmann commented 5 years ago

OK, running it inside the pyprophet/master Docker container worked fine, so there may be a problem with my local installation.
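Because the same data works in Docker but fails locally, one reasonable next step is to compare the numerical stack in both environments. A segmentation fault usually originates in compiled extension code. The BLAS/LAPACK libraries behind NumPy and SciPy are plausible suspects, but that is an assumption this thread does not confirm. A minimal sketch, assuming it is run with the same interpreter that runs pyprophet in each environment. `stack_versions` is a hypothetical helper, and it is written to work on both Python 2.7 and 3.

```python
import importlib
import platform

def stack_versions(packages=("numpy", "scipy", "sklearn", "pandas")):
    """Return {name: version} for Python and the given packages; None if missing."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # package is not installed in this environment
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions

if __name__ == "__main__":
    for name, version in sorted(stack_versions().items()):
        print("%s: %s" % (name, version))
    try:
        import numpy
        numpy.show_config()  # lists the BLAS/LAPACK libraries NumPy was built against
    except ImportError:
        pass
```

Differences in the printed versions, or in the BLAS/LAPACK section of `numpy.show_config()`, narrow down which local package to reinstall.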