Unexpected low sensitivity for asp-n digested samples

JB91451 commented 2 years ago

Describe the question or problem: Is there anything known about sensitivity issues for HCD / Asp-N workflows?

Details: Dear MSGF+ developers,

I am currently analysing a batch of samples digested with either Lys-C or Asp-N. All samples were measured on a Q Exactive and are searched against a database derived from a six-frame genome translation, containing peptides generated by the corresponding enzyme. Because the sample files are searched with Comet, MSFragger and MSGF+, the post-processing involves a PeptideProphet and iProphet pipeline and thus the conversion of mzIdentML to pepXML (using CLevel=2).

However, while for Lys-C MSGF+ consistently identifies between 10 and 15% more spectra at 0.1% FDR than Comet (the MSFragger searches have not run yet, but the range should be the same), there is an extreme drop in MSGF+'s sensitivity for the Asp-N digested samples: ~3000 vs. 700 identified spectra; 15000 vs. 4200; 12000 vs. 2600. The samples are different fractions, not replicates, so the differences between them are expected.
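To put the three pairs on a common scale, here is a small sketch that turns each pair into a retained fraction. It assumes that the first number of each pair is the reference search and the second is the MSGF+ result; the post does not label the numbers explicitly, so treat that reading as an assumption.

```python
# Pairs copied from the post above. Which engine each number belongs to
# is an assumption: (reference search, MSGF+).
pairs = [(3000, 700), (15000, 4200), (12000, 2600)]


def retained_fraction(reference, observed):
    """Fraction of the reference identifications that the other search retained."""
    return observed / reference


fractions = [retained_fraction(ref, obs) for ref, obs in pairs]

for (ref, obs), frac in zip(pairs, fractions):
    print(f"{ref} vs. {obs}: {frac:.0%} retained, {1 - frac:.0%} fewer")
```

Under that reading, the second search retains roughly 22 to 28% of the identifications in every fraction, so the drop is of similar size across all three samples.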

The only differences in the parameter files between the Asp-N and Lys-C searches are the FASTA file and the enzyme selection. I did not choose no-cleavage, in order to keep the number of missed cleavage sites.

In the 2014 publication I saw that the HCD model for a standard workflow was trained for tryptic peptides using the Freeze-2011 dataset (blue line in figure 1), while the non-tryptic peptides were trained directly on CID and ETD data (red lines in figure 1) only. Could this be the reason?

Best regards, Juergen

Useful extras

Parameter files used to run MS-GF+: MSGFPlus_Params_QE1_AspN.txt, MSGFPlus_Params_QE1_LysC.txt

alchemistmatt commented 2 years ago

This is an interesting observation, and I agree with your theory that the training data is likely the source of the differences in identification rates. MS-GF+ is not under active development, so you'll just have to work with the results that it produces for your Asp-N searches. This just goes to show that:

a) MS/MS peptide identification is not easy (thus a plethora of identification tool options)

b) Different MS/MS identification tools have their strengths and weaknesses

sangtaekim commented 2 years ago

MS-GF+ includes two parameter files for AspN, both trained from ion trap data ("Low-res"). A quick fix is to use "InstrumentID=0" to force MS-GF+ to use the AspN param set. If you have enough spectra (e.g. >50K), a better solution is to run a search with "InstrumentID=0" and create a new param set using https://msgfplus.github.io/msgfplus/ScoringParamGen.html.
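For anyone applying the quick fix to several parameter files, a minimal sketch of the edit follows. It assumes the parameter file holds one Key=Value setting per line, with comment lines starting with "#". The helper name set_param and the example excerpt (including the InstrumentID=3 and EnzymeID=7 values) are illustrative only and are not taken from the attached files.

```python
def set_param(text, key, value):
    """Return parameter-file text with `key` set to `value`.

    Assumes one Key=Value setting per line; lines starting with '#' are
    treated as comments and left alone. A trailing comment on the
    replaced line is dropped.
    """
    out, found = [], False
    for line in text.splitlines():
        stripped = line.strip()
        is_setting = "=" in stripped and not stripped.startswith("#")
        if is_setting and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        # Key was absent: append it so the setting still takes effect.
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical excerpt of a parameter file, for illustration only.
example = "# hypothetical excerpt\nInstrumentID=3\nEnzymeID=7\n"
patched = set_param(example, "InstrumentID", 0)
print(patched)
```

Only the InstrumentID line changes; the enzyme selection and everything else stay as they were, so the two searches remain comparable apart from the scoring param set.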

JB91451 commented 2 years ago

Thank you both for your answers. I will try to generate a new param set. Does it matter for this purpose whether I use the very same files that I want to analyse? Or should I look for some unrelated projects, e.g. from PRIDE?

sangtaekim commented 2 years ago

@JB91451 It will be fine to use the same files.


Unexpected low sensitivity for asp-n digested samples #137