Closed lydiayliu closed 3 years ago
Thank you for clearly describing the issue. Yes, you are using the correct settings for peptidomics (EnzymeID=9 and NTT=2).
I just ran some tests, and I got the same results for both of these:
EnzymeID=0 and NTT=2
EnzymeID=9 and NTT=2
You could try EnzymeID=0 and see if it makes a difference, but I doubt it will. In my tests, all of the results for EnzymeID=9 and NTT=2 were a full "peptide" match (starting and ending with -), though, yes, some had M+15.995.
MS-GF+ is in maintenance-mode status, so I cannot update it at present to exclude the M+15.995 results. Thus, you're just going to have to post-process the results to exclude the peptides you don't want. Be sure to use the Mzid to TSV Converter to convert the results, e.g.
MzidToTsvConverter.exe Results.mzid -sd -unroll
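As a rough illustration of that post-processing step, here is a minimal Python sketch that keeps only rows where the match spans the whole database entry, i.e. both flanking residues are reported as `-`. It assumes the converted TSV has a `Protein` column whose unrolled entries look like `peptide_1(pre=M,post=R)`, as in the result row quoted in this thread; the function names and file paths are hypothetical, so check the column header against your own output before relying on it.

```python
import csv
import re

# Matches entries such as "peptide_1(pre=M,post=R)" (assumed Protein column format).
PROTEIN_RE = re.compile(r"^(?P<name>.+)\(pre=(?P<pre>.),post=(?P<post>.)\)$")


def is_exact_match(protein_field):
    """Return True when both flanking residues are '-' (whole-entry match)."""
    m = PROTEIN_RE.match(protein_field.strip())
    return bool(m) and m.group("pre") == "-" and m.group("post") == "-"


def filter_exact(rows, protein_col="Protein"):
    """Keep only the rows (dicts) whose protein annotation is a whole-entry match."""
    return [row for row in rows if is_exact_match(row[protein_col])]


def filter_tsv(in_path, out_path):
    """Read an unrolled TSV, drop partial matches, and write the rest."""
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        rows = filter_exact(reader)
        fieldnames = reader.fieldnames or []
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `filter_tsv("Results.tsv", "Results_exact.tsv")` would then write only the whole-entry matches. Because `-unroll` puts one protein on each row, the check is applied per protein rather than per PSM.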
As for re-computing QValue, these Excel files demonstrate how to do that manually:
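For a scripted alternative to the spreadsheets, the following is a minimal sketch of a target-decoy q-value calculation in Python. It is not necessarily the exact procedure MS-GF+ or the Excel files use: it assumes decoy proteins carry the prefix `XXX_`, estimates FDR as decoys divided by targets at each score threshold, and ranks PSMs by SpecEValue (lower is better). The function name and the input layout are hypothetical.

```python
def compute_qvalues(psms, decoy_prefix="XXX_"):
    """Compute q-values for PSMs given as (spec_evalue, protein_name) tuples.

    Returns a list of q-values in the same order as the input.
    The decoy prefix is an assumption; adjust it to match your search.
    """
    # Rank PSMs from best (lowest SpecEValue) to worst.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0])

    # Running FDR estimate at each rank: decoys / targets.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1].startswith(decoy_prefix):
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)

    # q-value = lowest FDR at this rank or any worse rank (capped at 1.0).
    qvalues = [0.0] * len(psms)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues
```

If results are filtered first (for example, to drop peptides you do not want), the q-values should be recomputed on the filtered list, since removing targets or decoys changes the ratio. The sketch does not handle ties or reduce the list to one PSM per spectrum.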
Correction to my previous post; I was confusing M+15.995 and auto-removal of the N-Terminal M residue. M+15.995 is a dynamic Met Ox, and that's allowed; it's the auto-removal of M that causes concern. Given how MS-GF+ uses dynamic programming to search for matches, I'm not certain that it would be straightforward to prevent the auto M-removal. Additionally, given how FASTA files are indexed, the multi-protein reporting is probably something you'll just have to work around.
One thought would be to convert your FASTA file to tab-delimited text, then sort by peptide, then remove duplicates to keep only the first occurrence of each peptide. Next, convert from tab-delimited text back to FASTA. For this, use the Protein Digestion Simulator.
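If a scripted route is preferred over the round trip through tab-delimited text, the same de-duplication can be sketched in a few lines of Python. This is an illustrative alternative, not what the Protein Digestion Simulator does internally: it skips the sort and simply keeps the first entry seen for each sequence, in file order. All function names here are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def dedupe_by_sequence(records):
    """Keep only the first record for each distinct sequence."""
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


def write_fasta(records, handle):
    """Write (header, sequence) pairs as single-line FASTA entries."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")
```

Typical use would be to open the input file, pass it to `read_fasta`, run `dedupe_by_sequence` on the result, and write the output with `write_fasta`. Note that the names of the dropped duplicate entries are lost, so keep the original FASTA if you need to map peptides back to every source entry.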
I did some more digging. The existing code already has the option to disable considering M cleavage at the N-terminus. Update your parameter file to have this:
# Control N-terminal methionine cleavage
# 0 means to consider protein N-term Met cleavage (Default)
# 1 means to ignore protein N-term Met cleavage
IgnoreMetCleavage=1
I will update the program to show the option's value when it displays parameters at run-time. I'll also update the documentation and the example parameter files.
Also, these are not the same:
EnzymeID=0 and NTT=2
EnzymeID=9 and NTT=2
Here's a better description of EnzymeID:
# Enzyme ID
# 0 means unspecific cleavage (cleave after any residue)
# 1 means Trypsin (Default); use this along with NTT=0 for a no-enzyme-specificity search of a tryptically digested sample
# 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: Glu-C, 6: Arg-C, 7: Asp-N, 8: alphaLP, 9: No Cleavage (for peptidomics)
EnzymeID=1
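To apply the settings discussed in this thread (EnzymeID=9, NTT=2, IgnoreMetCleavage=1) to an existing parameter file without editing it by hand, a small helper along these lines could be used. It is only a sketch: it assumes the parameter file uses plain `Key=Value` lines with `#` comments, as in the excerpts above, and the function name is hypothetical.

```python
def set_params(lines, updates):
    """Return parameter-file lines with the given Key=Value settings applied.

    Existing non-comment lines for a key are rewritten in place;
    keys not found in the file are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in lines:
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not stripped.startswith("#"):
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line.rstrip("\n"))
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return out
```

For example, `set_params(open("params.txt").readlines(), {"EnzymeID": 9, "NTT": 2, "IgnoreMetCleavage": 1})` returns the updated lines, which can then be written to a new file (`params.txt` is a placeholder name).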
Release 2021.03.22 includes an updated .jar file that shows the value of IgnoreMetCleavage at runtime. It also includes updated example parameter files.
Hi Matthew,
Thank you for all the detailed investigations and comments! The IgnoreMetCleavage option is certainly going to be very useful, and it's great to know that I'm using the correct settings (EnzymeID=9 and NTT=2) for my purpose. Thanks also for the clarification on what EnzymeID=0 does; it is confusing that 0 and 9 both say "no enzyme".
Thanks again!!
Describe the question or problem
Hi there, I wish to conduct a search using MSGF+ where the algorithm only considers EXACT matches to the peptides provided in the fasta database (with a static and a dynamic modification).
For example, there are two peptide entries in the fasta file:
My samples were digested with trypsin, so in my database there are only tryptic peptides (with some miscleavages that I have already included).
I am using the following settings; these are the only ones I can think of that are relevant:
MSGF+ would return this result:
sample.mzML controllerType=0 controllerNumber=1 scan=65059 65059 HCD 537.5329 2 1.9908882 4 DFYAM+15.995IHAFWLIAVLYR peptide_1(pre=M,post=R);peptide_2(pre=M,post=-) 101 42 2.5559657E-9 0.006229213
My problem with this result is 2-fold:
Do you have any suggestions on how I could modify the params file to get cleaner results? Thanks.