compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

SearchGUI parallelisation #294

Closed bernt-matthias closed 3 years ago

bernt-matthias commented 3 years ago

I have a huge data set (53 samples and a >3GB sequence DB) which takes a long time to process.

So I was keen to try the -threads parameter. It appears that at least reading the inputs is not parallelised, and that step alone takes >7h. I guess our admins would kill such a job because it would leave the additional cores idle for a long time.

So my question is: how does SearchGUI (SG) parallelise? Does it run several searches in parallel?

hbarsnes commented 3 years ago

Regarding "reading the inputs", do you mean the reading and conversion of the spectrum file input that happens in SearchGUI itself, or the processing of the input in the individual search engines? The former is indeed currently not multithreaded, while the latter is outside our control. We could consider parallelization for our internal processing, but it's not something we have the resources to look into at the moment.

As for the threads parameter, it is simply forwarded to the individual search engines (when they have such an option). It is then up to each search engine's implementation how the available threads are used. The searches themselves are still run one at a time. I'm not sure what gains, if any, there would be in running more than one search engine at the same time, as that would basically spread the same resources across multiple search engines. There might be parts of the searches where the provided resources are not fully taken advantage of, but figuring out when that occurs and trying to balance that sounds very complicated.
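To make the behaviour described above concrete, here is a minimal Python sketch of the idea, not SearchGUI's actual code (which is Java). The names `Engine`, `thread_flag`, `build_commands` and `run_sequentially`, and the engine names and flags in the usage note, are hypothetical and chosen only for illustration. It shows the thread count being passed on only to engines that expose such an option, with the engines then run one after the other.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Engine:
    """Hypothetical description of a search engine wrapper."""
    name: str
    # Name of the engine's own thread option, or None if it has none.
    thread_flag: Optional[str]


def build_commands(engines: List[Engine], threads: int) -> List[List[str]]:
    """Forward the thread count to every engine that supports it."""
    commands = []
    for engine in engines:
        cmd = [engine.name]
        if engine.thread_flag is not None:
            cmd += [engine.thread_flag, str(threads)]
        commands.append(cmd)
    return commands


def run_sequentially(commands: List[List[str]],
                     runner: Callable[[List[str]], int]) -> List[int]:
    """Run the engines one at a time; each call blocks until it returns."""
    return [runner(cmd) for cmd in commands]
```

For example, `build_commands([Engine("engine_a", "--threads"), Engine("engine_b", None)], 8)` gives one command with the thread option and one without. Whether the extra cores are actually used is then decided inside each engine, which matches the point that this part is outside SearchGUI's control.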

bernt-matthias commented 3 years ago

Thanks for the fast reply:

Regarding "reading the inputs", do you mean the reading and conversion of the spectrum file input that happens in SearchGUI itself, or the processing of the input in the individual search engines?

I guess it's the reading in SearchGUI:

Wed Mar 24 08:21:32 CET 2021 Importing spectrum files.
...
Wed Mar 24 14:53:47 CET 2021 Importing spectrum files completed (6 hours 32 minutes 14.0 seconds).

There might be parts of the searches where the provided resources are not fully taken advantage of, but figuring out when that occurs and trying to balance that sounds very complicated.

Indeed. Thanks for the answer.

hbarsnes commented 3 years ago

I guess it's the reading in SearchGUI

The good news is that you will only have to do this once if you keep the indexes, i.e. the cms files with the same name as the spectrum files. But we'll look into whether this can be sped up as soon as we get the time.
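Before resubmitting a long job, it may help to check which spectrum files already have an index. The following is a rough Python sketch under stated assumptions: that each index sits in the same folder as its spectrum file, shares its base name, and has the extension `.cms`, and that the spectrum files end in `.mgf` or `.mzml`. The function name `missing_indexes` is hypothetical and not part of SearchGUI.

```python
from pathlib import Path
from typing import List, Tuple


def missing_indexes(spectrum_dir: str,
                    spectrum_suffixes: Tuple[str, ...] = (".mgf", ".mzml")) -> List[str]:
    """Return the spectrum files that have no same-named .cms file beside them.

    Assumption: indexes are stored next to the spectrum files with a .cms
    extension; adjust the suffixes to match your own setup.
    """
    folder = Path(spectrum_dir)
    missing = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in spectrum_suffixes:
            # Same base name, .cms extension, same folder.
            if not path.with_suffix(".cms").exists():
                missing.append(path.name)
    return missing
```

Any file this returns would presumably have to be imported and indexed again, so an empty list suggests the slow import step can be skipped on the next run.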

hbarsnes commented 3 years ago

In SearchGUI v4.0.28 we have greatly increased the speed of the importing/indexing of the spectrum files, basically by replacing the compression library. Please give it a go. If it does not improve the speed, please let us know and we'll reopen the issue.