MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

MSGF+ always run using at most 400% CPUs. #52

Open BioWu opened 6 years ago

BioWu commented 6 years ago

I ran MSGF+ (MS-GF+ Release (v2017.01.13) (13 Jan 2017)) in OpenMS with 20 threads, but its CPU usage never exceeds 450%. The server has 20 CPUs and 64 GB of RAM. Are some of my parameters wrong? The command line is:

 java -Xmx50000m -jar ./MGSF_Plus/MSGFPlus.jar -s Merger_Dounce.mzML -o /tmp/20181112_100816_debug01_16693_1/msgfplus_output.mzid -d ./database/Mixed.decoy.fasta -t 10ppm -ti 0,1 -tda 0 -m 3 -inst 3 -e 1 -protocol 0 -ntt 2 -minLength 7 -maxLength 30 -minCharge 2 -maxCharge 4 -n 1 -addFeatures 1 -thread 20 -mod /tmp/20181112_100816_debug01_16693_1/msgfplus_mods.txt

alchemistmatt commented 5 years ago

Be sure to check the logs for status output. You can try -thread 30 to see whether CPU usage increases. I believe, however, that MS-GF+ limits the number of threads it will use based on the number of spectra it needs to process. Instead of raising -thread, we run two or even three copies of MS-GF+ on the same system, each with -thread 8 or -thread 12. Can you run two instances of OpenMS at the same time, each processing a different set of data?
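As a rough sketch of the "several copies, fewer threads each" approach: the script below builds one MS-GF+ command per input file and runs them side by side. The jar and database paths are taken from the command in the original post; the input file names, the 8 threads per job, and the 20000m heap are placeholders picked for illustration (two copies at the original -Xmx50000m would not fit in 64 GB). The remaining search options (-t, -ti, -mod and so on) are left out for brevity and would need to be added back. DRY_RUN defaults to 1, so the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: run one MS-GF+ instance per input file, each with a modest -thread value.
# THREADS_PER_JOB, HEAP and the sample file names are assumptions, not values from this thread.

JAR=./MGSF_Plus/MSGFPlus.jar        # jar path as used in the original command
DB=./database/Mixed.decoy.fasta     # database path as used in the original command
THREADS_PER_JOB=8                   # placeholder; 8 or 12 were the values suggested above
HEAP=20000m                         # placeholder; all heaps together must fit in RAM
DRY_RUN=${DRY_RUN:-1}               # 1 = print the command instead of running it

run_one() {
  # $1 = input mzML file; the output name is derived from it
  in="$1"
  out="${in%.mzML}.mzid"
  if [ "$DRY_RUN" = "1" ]; then
    echo "java -Xmx$HEAP -jar $JAR -s $in -o $out -d $DB -thread $THREADS_PER_JOB"
  else
    java "-Xmx$HEAP" -jar "$JAR" -s "$in" -o "$out" -d "$DB" -thread "$THREADS_PER_JOB"
  fi
}

run_all() {
  # Start every job in the background, then wait for all of them to finish.
  for f in "$@"; do
    run_one "$f" &
  done
  wait
}

run_all sample_A.mzML sample_B.mzML
```

Setting DRY_RUN=0 launches the searches for real. Whether this is practical from inside OpenMS depends on being able to start two workflows at once, as asked above.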

FarmGeek4Life commented 5 years ago

Yes, MS-GF+ does limit the number of threads based on the number of spectra it needs to process. The console output printed before the search starts includes the requested number of threads, and it also reports when MS-GF+ reduces the thread count because there are too few spectra. It does this because each thread it creates carries a fixed cost in processing time; if the number of spectra per thread is too low, the search takes longer than it would with fewer threads.

We have also seen MS-GF+ performance issues on NUMA systems (primarily multi-socket systems, but this also applies to certain AMD processors) when the number of threads given to MS-GF+ is larger than the number of threads in a single NUMA node.

The MS-GF+ console output and server CPU model would tell us more and allow us to verify that the above is the cause of your issue.
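One way to check the NUMA point on a Linux server, sketched below: `lscpu` reports the NUMA layout, and `numactl --cpunodebind` / `--membind` can confine a process to one node. The node number, the thread count of 10, and the 20000m heap are placeholders; the thread count should not exceed the number of logical CPUs that `lscpu` lists for the chosen node. Whether pinning helps in this particular case is untested, and the script only prints the pinned command rather than running it.

```shell
#!/bin/sh
# Sketch only: inspect the NUMA layout and build an MS-GF+ command pinned to one node.
# NODE, THREADS and the heap size are assumptions chosen for illustration.

NODE=0        # placeholder NUMA node
THREADS=10    # placeholder; keep at or below the CPU count of that node

show_numa_layout() {
  # Prints the "NUMA node(s):" and "NUMA nodeN CPU(s):" lines when lscpu provides them.
  if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -i 'numa' || echo "no NUMA information reported"
  else
    echo "lscpu not found"
  fi
}

pinned_cmd() {
  # $1 = input mzML file; prints the command instead of executing it
  echo "numactl --cpunodebind=$NODE --membind=$NODE java -Xmx20000m -jar ./MGSF_Plus/MSGFPlus.jar -s $1 -o ${1%.mzML}.mzid -d ./database/Mixed.decoy.fasta -thread $THREADS"
}

show_numa_layout
pinned_cmd Merger_Dounce.mzML
```

If `lscpu` shows a single NUMA node, this explanation does not apply and the spectra-per-thread limit is the more likely cause.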