PyProphet / pyprophet

PyProphet: Semi-supervised learning and scoring of OpenSWATH results.
http://www.openswath.org
BSD 3-Clause "New" or "Revised" License
29 stars 21 forks source link

Change random to np.random #8

Closed bretttully closed 6 years ago

bretttully commented 7 years ago

In order to ensure that setting the random seed filters through to sklearn, we need to make sure that np.random is used everywhere. This now means that if you run pyprophet on the same input file, and set the random seed, you will get the same output. Before this change, that was not the case.

uweschmitt commented 7 years ago

Thanks for your countribution !

What part of scikit-learn we are using is randomized ? AFAIK we only use LDA which is deterministic and should always provide the same result for fixed input data.

bretttully commented 7 years ago

Hi @uweschmitt -- that is likely my mistake then. I was running OpenSWATH+PyProphet multiple times on the same input file and continually getting slightly different answers even when setting the random seed. I assumed there must be some stochastic element to the PyProphet implementation, and had assumed (probably incorrectly) that this was an interaction between np.random and random through sklearn.

One of the other sources of randomness I found was the ordering of the results from OpenSWATH. If you run this in threaded mode, the output has the same content from run to run, but is in a different order (I assume there are race conditions between the threads). So you have to sort the OpenSWATH file before you pass it in to PyProphet as well. Perhaps this last step is sufficient and this pull request isn't needed?

uweschmitt commented 7 years ago

@bretttully But your fix removed the randomness ? If yes we have to investigate this further.

@hroest @grosenberger can you comment on the ordering issue please ?

bretttully commented 7 years ago

The combination of the two fixes did remove the inconsistent results; I didn't test the fixes independently -- I can look in more detail next week.

grosenberger commented 6 years ago

Yes, if OpenSWATH is run with multiple threads, the results are ordered differently. @bretttully did you have a chance to check whether the patch is needed when PyProphet is applied to the same OpenSWATH report twice in debug mode?

grosenberger commented 6 years ago

@bretttully Did you have a chance to look again into this issue? Could you resolve the problem without the fix or is still required?

bretttully commented 6 years ago

From what I can tell, so long as the results are sorted before running pyprophet, its output is consistent without this change.

grosenberger commented 6 years ago

Thanks for the information. I will then close the PR for the moment.

bretttully commented 6 years ago

@grosenberger with the new sqlite format, will that be inserted in a consistently sorted order? Or will it also suffer from the race condition of different threads?

grosenberger commented 6 years ago

The data is still inserted in a random order by OpenSWATH, but I will implement sorted querying by PyProphet.