Open glormph opened 1 year ago
Interestingly, although the NTT is 0 in the XML, the stdout SearchParams show it as 2:
PrecursorMassTolerance: 10.0 ppm
IsotopeError: -1,2
TargetDecoyAnalysis: false
FragmentationMethod: As written in the spectrum or CID if no info
Instrument: QExactive (Q-Exactive)
Enzyme: NoCleavage
Protocol: TMT
NumTolerableTermini: 2
IgnoreMetCleavage: true
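The parameter block above is plain "Name: value" text, so it can be read programmatically and compared with what ends up in the mzid. This is a minimal sketch, assuming the lines look exactly as pasted here; the function name `parse_search_params` is hypothetical and not part of MS-GF+.

```python
def parse_search_params(stdout_text):
    """Turn 'Name: value' lines from the pasted stdout block into a dict."""
    params = {}
    for line in stdout_text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            params[name.strip()] = value.strip()
    return params


# A few of the lines quoted in this thread.
example = """\
PrecursorMassTolerance: 10.0 ppm
IsotopeError: -1,2
Enzyme: NoCleavage
NumTolerableTermini: 2
"""

params = parse_search_params(example)
print(params["Enzyme"], params["NumTolerableTermini"])
```

All values are kept as strings; convert `NumTolerableTermini` with `int()` before comparing it with the number reported in the mzid.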
NumTolerableTermini in the mzid results is changed here, forcing '0' if using enzyme settings 'Unspecific Cleavage' or 'No Cleavage': https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/mzid/AnalysisProtocolCollectionGen.java#L100
A non-zero '-ntt' is technically invalid for the enzymes 'No Cleavage' and 'Unspecific Cleavage': for 'No Cleavage' there is no cleavage residue to verify or enforce, and for 'Unspecific Cleavage' there is no specific cleavage residue because every residue is a possible cleavage point.
You might want to use '-ntt 0' for 'No Cleavage'; it appears that if a non-zero value is supplied for '-ntt' you still get some enzyme-search-specific behavior, and if you are using a predigested peptide DB, you probably don't want that. This also might be a change that we enforce in the code.
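The rule described above (report NTT as 0 whenever the enzyme has no cleavage residue) can be sketched as follows. This is an illustration of the described behavior only, not the code in AnalysisProtocolCollectionGen.java; the function name and the enzyme strings "UnspecificCleavage" and "Tryp" are assumptions, and only "NoCleavage" is taken from the stdout quoted in this thread.

```python
# Enzymes for which there is no specific cleavage residue to enforce.
# "NoCleavage" appears in the stdout above; "UnspecificCleavage" is an assumed name.
ENZYMES_WITHOUT_CLEAVAGE_RESIDUE = {"NoCleavage", "UnspecificCleavage"}


def effective_ntt(enzyme, requested_ntt):
    """Return the NTT that is meaningful for the given enzyme (hypothetical helper)."""
    if requested_ntt not in (0, 1, 2):
        raise ValueError("ntt must be 0, 1 or 2")
    if enzyme in ENZYMES_WITHOUT_CLEAVAGE_RESIDUE:
        # Nothing to verify at either terminus, so the value is forced to 0.
        return 0
    return requested_ntt


print(effective_ntt("NoCleavage", 2))  # forced to 0
print(effective_ntt("Tryp", 2))        # unchanged ("Tryp" is an assumed name)
```

Applying the same normalization before the search starts, rather than only when writing the mzid, would be one way to make the stdout and the mzid agree.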
We did have to fix the behavior for 'No Cleavage' in 2018 so that it is not treated the same as 'Unspecific Cleavage'. Other changes may also be needed to get the correct behavior everywhere, but as mentioned before, MS-GF+ is in maintenance mode: we don't have the time or funding to put significant effort into improvements, though we will accept reasonable pull requests.
Yes, forcing ntt to 0 for 'no cleavage' makes sense. I'll try -ntt 0 and see if that makes a difference for the scoring.
I have understood that it is in maintenance mode, bad luck for me, but sounds reasonable!
So, with ntt set to 0 and no cleavage, the search becomes very slow (24 hours, where an ntt 2 search takes maybe 3h30), and the matched peptides seem to be the result of unspecific cleavage. It also did not improve the scoring for the above-mentioned PSM.
I tried to go through the code a bit to understand why, but my understanding is not great here. Maybe the number of peptides to consider becomes too big in this line? https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java#L280
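To get a feel for how much the candidate list can grow if database entries are treated as unspecifically cleavable instead of as whole peptides, here is a toy count. It does not reproduce what DBScanner.java does; the function names and the length limits of 6 and 40 are assumptions chosen for illustration.

```python
def count_whole_entries(entries, min_len=6, max_len=40):
    """Candidates if each pre-digested entry is only matched as a whole."""
    return sum(1 for e in entries if min_len <= len(e) <= max_len)


def count_all_substrings(entries, min_len=6, max_len=40):
    """Candidates if every substring in the length range is considered."""
    total = 0
    for e in entries:
        n = len(e)
        for length in range(min_len, min(max_len, n) + 1):
            # Number of start positions for a substring of this length.
            total += n - length + 1
    return total


entries = ["MKDTDNEEEIR"]  # the peptide discussed in this thread
print(count_whole_entries(entries))   # 1
print(count_all_substrings(entries))  # 21
```

Even for a single 11-residue entry the candidate count goes from 1 to 21, and the ratio grows with entry length, which would be consistent with a much longer run time.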
I am a bit in over my head here, and I haven't yet solved my actual question of why a tryptic peptide gets a lower score when searching with no cleavage. Also it is the weekend :)
Hi, I have a peptidomics run with a DB of pre-digested peptides, including those with one missed cleavage. I recently discovered '-ignoreMetCleavage 1 -enzyme 9', which I have started using. After that, a previously included peptide, MKDTDNEEEIR, disappeared from my results; it still matched, but at a lower score (RawScore ~60 vs previously ~160).
I played with the parameters and found that:
- With '-ntt 2' and '-e 9', the NTT in the mzIdentML is reported as 0.
- Changing '-e 9' to '-e 1' and keeping '-ignoreMetCleavage 1', the peptide above has more peaks matched and a higher RawScore.
The diff in the <AnalysisProtocolCollection> between the trypsin and no_enzyme searches is only this:
The most eye-catching to me is the difference in RawScore and matched ions. The diff between the SpectrumIdentificationResults for the two scans is:
So I'm wondering if the lower scoring somehow has to do with the termini, or that the enzyme has something to do with this?
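For reference, this is what "termini" means in the NTT sense: how many of a peptide's two ends are consistent with the enzyme. The sketch below uses the common convention that trypsin cleaves C-terminal to K or R and ignores any proline rule; it is not MS-GF+'s implementation, and the function name and the "-" marker for an entry boundary are assumptions.

```python
TRYPSIN_RESIDUES = set("KR")


def count_enzymatic_termini(prev_residue, peptide, next_residue,
                            residues=TRYPSIN_RESIDUES, terminus="-"):
    """Count how many ends of `peptide` are enzyme-consistent (0, 1 or 2).

    prev_residue / next_residue are the flanking residues in the database
    entry, or `terminus` if the peptide starts / ends the entry.
    """
    # N-terminus: entry start, or the preceding residue is a cleavage residue.
    n_ok = prev_residue == terminus or prev_residue in residues
    # C-terminus: entry end, or the peptide itself ends in a cleavage residue.
    c_ok = next_residue == terminus or peptide[-1] in residues
    return int(n_ok) + int(c_ok)


print(count_enzymatic_termini("-", "MKDTDNEEEIR", "-"))  # 2
print(count_enzymatic_termini("M", "KDTDNEEEIR", "-"))   # 1
print(count_enzymatic_termini("T", "DNEEEI", "R"))       # 0
```

With an enzyme that has no cleavage residue there is nothing for such a check to test, which is why a non-zero NTT carries no meaning there.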
This question may be related to #120