MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Unexpectedly low sensitivity for Asp-N digested samples #137

Open JB91451 opened 2 years ago

JB91451 commented 2 years ago

Describe the question or problem

Is there anything known about sensitivity issues for HCD / Asp-N workflows?

Details

Dear MSGF+ developers,

I am currently analysing a batch of samples digested with either Lys-C or Asp-N. All samples were measured on a Q Exactive and are searched against a database derived from a six-frame genome translation, containing peptides generated by the corresponding enzyme. Because the sample files are searched with Comet, MSFragger and MSGF+, the post-processing uses a PeptideProphet and iProphet pipeline, which requires converting mzIdentML to pepXML (using CLevel=2).

However, while for Lys-C MSGF+ consistently identifies between 10 and 15% more spectra at 0.1% FDR than Comet (the MSFragger searches have not run yet, but the range should be the same), MSGF+'s sensitivity drops sharply for the Asp-N digested samples: ~3000 vs. 700 identified spectra (Comet vs. MSGF+); 15000 vs. 4200; 12000 vs. 2600. The samples are different fractions, not replicates, so the differences between them are expected.

The only differences in the parameter files between the Asp-N and Lys-C searches are the FASTA file and the enzyme selection. I did not choose no-cleavage, so that the limit on the number of missed cleavage sites still applies.

In the 2014 publication I saw that the HCD model for a standard workflow was trained on tryptic peptides using the Freeze-2011 dataset (blue line in figure 1), while the models for non-tryptic peptides were trained directly on CID and ETD data only (red lines in figure 1). Could this be the reason?

Best regards, Juergen


alchemistmatt commented 2 years ago

This is an interesting observation, and I agree with your theory that the training data is likely the source of the differences in identification rates. MS-GF+ is not under active development, so you'll just have to work with the results that it produces for your Asp-N searches. This just goes to show that:

a) MS/MS peptide identification is not easy (thus a plethora of identification tool options)

b) Different MS/MS identification tools have their strengths and weaknesses

sangtaekim commented 2 years ago

MS-GF+ includes two parameter files for AspN, both trained from iontrap data ("Low-res"). A quick fix is to use "InstrumentID=0" to force MS-GF+ to use the AspN param set. If you have enough spectra (e.g. >50K), a better solution is to run a search with "InstrumentID=0" and create a new param set using https://msgfplus.github.io/msgfplus/ScoringParamGen.html.
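For readers who run MS-GF+ from the command line rather than from a parameter file, the quick fix above might look roughly like the sketch below. It only assembles the command; it does not run it. The file names and the memory setting are hypothetical, and it assumes the usual MS-GF+ command-line switches (`-inst` as the counterpart of `InstrumentID`, with 0 meaning low-res, and `-e 7` for Asp-N). Please check these against the MS-GF+ documentation for your version.

```python
import shlex

# Assumed IDs; verify against the MS-GF+ documentation for your version.
LOW_RES_INSTRUMENT_ID = 0  # InstrumentID=0 (low-res), as suggested above
ASP_N_ENZYME_ID = 7        # assumed enzyme ID for Asp-N


def build_search_command(spectrum_file, fasta_file, output_file,
                         jar="MSGFPlus.jar", memory_mb=4000):
    """Assemble (but do not run) an MS-GF+ search command as an argument list."""
    return [
        "java", f"-Xmx{memory_mb}M", "-jar", jar,
        "-s", str(spectrum_file),             # input spectra
        "-d", str(fasta_file),                # protein sequence database
        "-o", str(output_file),               # mzid output
        "-inst", str(LOW_RES_INSTRUMENT_ID),  # force the low-res scoring params
        "-e", str(ASP_N_ENZYME_ID),           # Asp-N digestion
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_search_command("fraction_01.mzML", "sixframe_aspn.fasta",
                               "fraction_01.mzid")
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.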

JB91451 commented 2 years ago

Thank you both for your answers. I will try to generate a new param set. Does it matter for this purpose whether I use the very same files that I want to analyse? Or should I look for some unrelated projects, e.g. from PRIDE?

sangtaekim commented 2 years ago

@JB91451 It will be fine to use the same files.
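A rough sketch of the training step on those same files, for anyone following along. It assumes the ScoringParamGen usage described on the page linked earlier in this thread (class `edu.ucsd.msjava.ui.ScoringParamGen`, with `-i` for the results, `-d` for the spectrum directory, and `-m`, `-inst`, `-e` describing the new param set). The directory names are hypothetical, and the IDs chosen here (HCD = 3, Q-Exactive = 3, Asp-N = 7) are assumptions to be checked against that page.

```python
import shlex

# Assumed IDs describing the param set to be created; verify against the docs.
HCD_FRAGMENTATION_ID = 3
Q_EXACTIVE_INSTRUMENT_ID = 3
ASP_N_ENZYME_ID = 7


def build_param_gen_command(result_path, spectrum_dir,
                            jar="MSGFPlus.jar", memory_mb=4000):
    """Assemble (but do not run) a ScoringParamGen command as an argument list."""
    return [
        "java", f"-Xmx{memory_mb}M", "-cp", jar,
        "edu.ucsd.msjava.ui.ScoringParamGen",
        "-i", str(result_path),   # mzid results from the InstrumentID=0 search
        "-d", str(spectrum_dir),  # directory holding the searched spectrum files
        "-m", str(HCD_FRAGMENTATION_ID),
        "-inst", str(Q_EXACTIVE_INSTRUMENT_ID),
        "-e", str(ASP_N_ENZYME_ID),
    ]


if __name__ == "__main__":
    # Hypothetical directories, for illustration only.
    print(shlex.join(build_param_gen_command("results_inst0", "spectra")))
```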