PNNL-Comp-Mass-Spec / Informed-Proteomics

Top down / bottom up, MS/MS analysis tool for DDA and DIA mass spectrometry data

Why is MSPathFinder running so slowly? #16

Closed: sunyusui closed this issue 5 years ago

sunyusui commented 5 years ago

I read the paper "Informed-Proteomics: open-source software package for top-down proteomics", which describes MSPathFinder as a fast proteoform identification tool. However, my identifications are very slow, and I can't tell which step I am doing wrong. First, I used the msconvert tool from the ProteoWizard package to convert the original spectrum file to .mzML format. I then ran MSPathFinderT.exe with the following parameters, and the total run time was 8d 8h 12m:

    SpecFile 2DLC_H3_1.pbf
    DatabaseFile human_proteome_database.fasta
    FeatureFile 2DLC_H3_1.ms1ft
    InternalCleavageMode SingleInternalCleavage
    Tag-based search True
    Tda Target+Decoy
    PrecursorIonTolerancePpm 10
    ProductIonTolerancePpm 10
    MinSequenceLength 21
    MaxSequenceLength 300
    MinPrecursorIonCharge 2
    MaxPrecursorIonCharge 30
    MinProductIonCharge 1
    MaxProductIonCharge 20
    MinSequenceMass 3000
    MaxSequenceMass 50000
    ActivationMethod Unknown
    MaxDynamicModificationsPerSequence 0

When I used the default parameters below and added a modifications file, the search became even slower: after 0d 22h 35.02m it had completed only 0.4%. Where is the problem?

    SpecFile 2DLC_H3_1.pbf
    DatabaseFile human_proteome_database.fasta
    FeatureFile 2DLC_H3_1.ms1ft
    InternalCleavageMode SingleInternalCleavage
    Tag-based search True
    Tda Target+Decoy
    PrecursorIonTolerancePpm 10
    ProductIonTolerancePpm 10
    MinSequenceLength 21
    MaxSequenceLength 500
    MinPrecursorIonCharge 2
    MaxPrecursorIonCharge 50
    MinProductIonCharge 1
    MaxProductIonCharge 20
    MinSequenceMass 3000
    MaxSequenceMass 50000
    ActivationMethod Unknown
    MaxDynamicModificationsPerSequence 4
    Modification C(2) H(2) N(0) O(1) S(0),R,opt,Everywhere,Acetyl
    Modification C(2) H(2) N(0) O(1) S(0),K,opt,Everywhere,Acetyl
    Modification C(1) H(2) N(0) O(0) S(0),R,opt,Everywhere,Methyl
    Modification C(1) H(2) N(0) O(0) S(0),K,opt,Everywhere,Methyl
    Modification C(2) H(4) N(0) O(0) S(0),R,opt,Everywhere,Dimethyl
    Modification C(2) H(4) N(0) O(0) S(0),K,opt,Everywhere,Dimethyl
    Modification C(3) H(6) N(0) O(0) S(0),R,opt,Everywhere,Trimethyl
    Modification C(0) H(1) N(0) O(3) S(0) P(1),S,opt,Everywhere,Phospho
    Modification C(0) H(1) N(0) O(3) S(0) P(1),T,opt,Everywhere,Phospho
    Modification C(0) H(1) N(0) O(3) S(0) P(1),Y,opt,Everywhere,Phospho
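To get a feel for why adding these modifications slows the search so much, here is a rough Python sketch (a hypothetical helper, not MSPathFinder code). It counts how many modified forms of one sequence are possible when each listed residue can carry at most one of the optional modifications above, with at most MaxDynamicModificationsPerSequence modifications in total. The per-residue option counts come from the Modification lines above; treating every matching residue as independently modifiable is a simplifying assumption, and MSPathFinder's own search may merge forms with identical mass, so its real numbers will differ.

```python
# Hypothetical helper, not part of MSPathFinder.
# Number of optional modifications per residue, taken from the Modification
# lines above (assumption: any matching residue may carry any one of them).
MOD_OPTIONS = {
    "R": 4,  # Acetyl, Methyl, Dimethyl, Trimethyl
    "K": 3,  # Acetyl, Methyl, Dimethyl
    "S": 1,  # Phospho
    "T": 1,  # Phospho
    "Y": 1,  # Phospho
}


def count_variants(sequence: str, max_mods: int) -> int:
    """Count modified forms of `sequence` with at most `max_mods`
    modifications, including the unmodified form."""
    # ways[j] = number of ways to place exactly j modifications so far
    ways = [1] + [0] * max_mods
    for residue in sequence:
        options = MOD_OPTIONS.get(residue, 0)
        if options == 0:
            continue
        # Walk j downwards so each residue carries at most one modification
        for j in range(max_mods, 0, -1):
            ways[j] += ways[j - 1] * options
    return sum(ways)
```

For example, count_variants("RKS", 4) is 5 × 4 × 2 = 40 forms, while count_variants("RKS", 0) is 1. On a K/R-rich histone sequence the count grows much faster, and each extra form adds work for every candidate sequence.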

    Processing, 93499 proteins done, 0.4% complete, 80266.5 sec elapsed
    Total Progress: 42.58%, 0d 22h 20.02m elapsed, Current Task: Searching the target database
    Processing, 93950 proteins done, 0.4% complete, 80566.8 sec elapsed
    Total Progress: 42.58%, 0d 22h 25.02m elapsed, Current Task: Searching the target database
    Processing, 94352 proteins done, 0.4% complete, 80881.6 sec elapsed
    Total Progress: 42.58%, 0d 22h 30.02m elapsed, Current Task: Searching the target database
    Processing, 94955 proteins done, 0.4% complete, 81189.5 sec elapsed
    Total Progress: 42.58%, 0d 22h 35.02m elapsed, Current Task: Searching the target database

Another problem: the .fasta file I used contains 20410 entries, so why does the search report 94352 proteins done?
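One way to see how far away the end of that search is: a minimal Python sketch (my own hypothetical helper, not an MSPathFinder feature) that takes the last "Processing, ... % complete, ... sec elapsed" line of the log and extrapolates the total run time of the current task, assuming the processing rate stays constant.

```python
import re

# Matches the per-task progress lines printed in the log above.
PROGRESS_RE = re.compile(
    r"Processing, (\d+) proteins done, ([\d.]+)% complete, ([\d.]+) sec elapsed"
)


def estimate_task_days(log_text: str) -> float:
    """Extrapolate the total run time (in days) of the current task from the
    last progress line, assuming a constant processing rate."""
    matches = PROGRESS_RE.findall(log_text)
    if not matches:
        raise ValueError("no progress lines found")
    _, percent, elapsed = matches[-1]
    fraction = float(percent) / 100.0
    if fraction <= 0:
        raise ValueError("progress is still 0%")
    return float(elapsed) / fraction / 86400.0


log = ("Processing, 94955 proteins done, 0.4% complete, 81189.5 sec elapsed "
       "Total Progress: 42.58%, 0d 22h 35.02m elapsed")
print(f"{estimate_task_days(log):.0f} days")  # prints: 235 days
```

This covers only the current task (searching the target database). The log shows the percentage with a single decimal place, so 0.4% may be anywhere from 0.35% to 0.45%; treat the result as an order-of-magnitude estimate.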

FarmGeek4Life commented 5 years ago

What version of MSPathFinder are you running? How many spectra are in the input file? Release 1.0 (if I remember correctly) reports "proteins", but what was really being counted was peptides.
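To illustrate why the number of sequences processed can be far larger than the number of FASTA entries, here is a simplified Python model of SingleInternalCleavage. It is an assumption for illustration, not MSPathFinder's actual enumeration: each protein contributes the sequences that keep either its N-terminus or its C-terminus, with a length between MinSequenceLength and MaxSequenceLength. It ignores the mass limits, N-terminal methionine removal, and decoys.

```python
def count_candidates(protein_length: int, min_len: int, max_len: int) -> int:
    """Count truncated sequences of one protein under a simplified
    single-internal-cleavage model: keep either the N-terminus (prefixes)
    or the C-terminus (suffixes), with length in [min_len, max_len]."""
    upper = min(max_len, protein_length)
    per_end = max(0, upper - min_len + 1)
    total = 2 * per_end  # prefixes + suffixes
    # The intact protein is both a prefix and a suffix; count it only once.
    if min_len <= protein_length <= max_len:
        total -= 1
    return total
```

With MinSequenceLength 21 and MaxSequenceLength 300, a single 500-residue protein already gives count_candidates(500, 21, 300) = 560 sequences under this model, so 20410 proteins can easily produce millions of candidates.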

sunyusui commented 5 years ago

The input file contains 3460 spectra, and the version I am using is 1.0.6510.1956

alchemistmatt commented 5 years ago

Please use the latest release since it has numerous bug fixes compared to the preview release that you are currently using. See: https://github.com/PNNL-Comp-Mass-Spec/Informed-Proteomics/releases/tag/v1.0.6619

sunyusui commented 5 years ago

Thank you all very much for your help. I will download the latest version and rerun my experiment.