MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Using trypsin vs no enzyme changes score on same PSM? #148

Open glormph opened 8 months ago

glormph commented 8 months ago

Hi, I have a peptidomics run with a DB of pre-digested peptides, including those with one missed cleavage. I recently discovered -ignoreMetCleavage 1 -enzyme 9 and have started using them. After that, a previously included peptide, MKDTDNEEEIR, disappeared from my results: it still matched, but at a much lower score (RawScore ~60 vs previously ~160).

I played with the parameters and found that:

The diff in the <AnalysisProtocolCollection> between the trypsin and no_enzyme runs is only this:

$ diff tryp.analysis notryp.analysis 
15c15
<       <userParam name="NumTolerableTermini" value="2"/>
---
>       <userParam name="NumTolerableTermini" value="0"/>
42c42
<       <Enzyme semiSpecific="false" missedCleavages="-1" id="Tryp">
---
>       <Enzyme semiSpecific="true" missedCleavages="-1" id="NoCleavage">
44c44
<           <cvParam cvRef="PSI-MS" accession="MS:1001251" name="Trypsin"/>
---
>           <cvParam cvRef="PSI-MS" accession="MS:1001955" name="no cleavage"/>
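For anyone who wants to check this on their own output, here is a minimal sketch that pulls the NumTolerableTermini userParam and the Enzyme attributes out of an AnalysisProtocolCollection fragment and reports which ones differ. The function names and the two reduced fragments are made up for illustration; only the attribute values are taken from the diff above. Real mzid files carry an XML namespace, which the sketch strips from tag names.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def protocol_summary(xml_text):
    """Collect NumTolerableTermini and the Enzyme attributes from a fragment."""
    root = ET.fromstring(xml_text)
    summary = {}
    for elem in root.iter():
        name = local(elem.tag)
        if name == "userParam" and elem.get("name") == "NumTolerableTermini":
            summary["NumTolerableTermini"] = elem.get("value")
        elif name == "Enzyme":
            summary["enzyme_id"] = elem.get("id")
            summary["semiSpecific"] = elem.get("semiSpecific")
            summary["missedCleavages"] = elem.get("missedCleavages")
    return summary


def changed(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Reduced, hypothetical fragments; attribute values copied from the diff above.
TRYP = """<AnalysisProtocolCollection>
  <userParam name="NumTolerableTermini" value="2"/>
  <Enzyme semiSpecific="false" missedCleavages="-1" id="Tryp"/>
</AnalysisProtocolCollection>"""

NOTRYP = """<AnalysisProtocolCollection>
  <userParam name="NumTolerableTermini" value="0"/>
  <Enzyme semiSpecific="true" missedCleavages="-1" id="NoCleavage"/>
</AnalysisProtocolCollection>"""

if __name__ == "__main__":
    for key, (tryp, notryp) in changed(protocol_summary(TRYP),
                                       protocol_summary(NOTRYP)).items():
        print(f"{key}: {tryp} -> {notryp}")
```

The same two functions should work on a full mzid file if you read it into a string first, since the namespace is ignored and unrelated elements are skipped.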

The most eye-catching difference to me is in RawScore and matched ions. The diff between the SpectrumIdentificationResults for the two scans is:

$ diff tryp.MKDTDNEEEIR notryp.MKDTDNEEEIR 
3,7c3,7
<           <PeptideEvidenceRef peptideEvidence_ref="PepEv_20762373_MKDTDNEEEIR_1"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="161"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="162"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002052" name="MS-GF:SpecEValue" value="7.9780874E-16"/>
<           <cvParam cvRef="PSI-MS" accession="MS:1002053" name="MS-GF:EValue" value="6.539515E-8"/>
---
>           <PeptideEvidenceRef peptideEvidence_ref="PepEv_13257102_MKDTDNEEEIR_1"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="66"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="124"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002052" name="MS-GF:SpecEValue" value="7.1621056E-9"/>
>           <cvParam cvRef="PSI-MS" accession="MS:1002053" name="MS-GF:EValue" value="0.49847928"/>
10,22c10,22
<           <userParam name="ExplainedIonCurrentRatio" value="0.28976208"/>
<           <userParam name="NTermIonCurrentRatio" value="0.18838717"/>
<           <userParam name="CTermIonCurrentRatio" value="0.101374924"/>
<           <userParam name="MS2IonCurrent" value="5932224.0"/>
<           <userParam name="NumMatchedMainIons" value="19"/>
<           <userParam name="MeanErrorAll" value="2.4083107"/>
<           <userParam name="StdevErrorAll" value="3.6545222"/>
<           <userParam name="MeanErrorTop7" value="0.85264057"/>
<           <userParam name="StdevErrorTop7" value="0.54933465"/>
<           <userParam name="MeanRelErrorAll" value="-1.0829566"/>
<           <userParam name="StdevRelErrorAll" value="4.240601"/>
<           <userParam name="MeanRelErrorTop7" value="-0.6560206"/>
<           <userParam name="StdevRelErrorTop7" value="0.77356416"/>
---
>           <userParam name="ExplainedIonCurrentRatio" value="0.12744826"/>
>           <userParam name="NTermIonCurrentRatio" value="0.028266326"/>
>           <userParam name="CTermIonCurrentRatio" value="0.09918193"/>
>           <userParam name="MS2IonCurrent" value="6465587.5"/>
>           <userParam name="NumMatchedMainIons" value="12"/>
>           <userParam name="MeanErrorAll" value="1.9757134"/>
>           <userParam name="StdevErrorAll" value="1.3534566"/>
>           <userParam name="MeanErrorTop7" value="2.0432508"/>
>           <userParam name="StdevErrorTop7" value="1.2680426"/>
>           <userParam name="MeanRelErrorAll" value="-0.782619"/>
>           <userParam name="StdevRelErrorAll" value="2.2633593"/>
>           <userParam name="MeanRelErrorTop7" value="-0.86563396"/>
>           <userParam name="StdevRelErrorTop7" value="2.2435427"/>
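To put numbers on the diff above, here is a small sketch that reads the cvParam and userParam values for one PSM from each run and prints the change per parameter. The helper names are hypothetical and the fragments are trimmed to four parameters, with values copied from the diff; it assumes every value parses as a number, which holds for the parameters shown here.

```python
import xml.etree.ElementTree as ET


def param_values(fragment):
    """Map each cvParam/userParam name to its numeric value."""
    # The fragment is a list of sibling elements, so wrap it in a dummy root.
    root = ET.fromstring(f"<item>{fragment}</item>")
    values = {}
    for elem in root:
        if elem.tag in ("cvParam", "userParam"):
            values[elem.get("name")] = float(elem.get("value"))
    return values


def deltas(before, after):
    """Return {name: (before, after, after - before)} for shared names."""
    return {
        name: (before[name], after[name], after[name] - before[name])
        for name in before
        if name in after
    }


# Trimmed fragments; values copied from the trypsin and no-enzyme results above.
TRYP = """
<cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="161"/>
<cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="162"/>
<cvParam cvRef="PSI-MS" accession="MS:1002052" name="MS-GF:SpecEValue" value="7.9780874E-16"/>
<userParam name="NumMatchedMainIons" value="19"/>
"""

NOTRYP = """
<cvParam cvRef="PSI-MS" accession="MS:1002049" name="MS-GF:RawScore" value="66"/>
<cvParam cvRef="PSI-MS" accession="MS:1002050" name="MS-GF:DeNovoScore" value="124"/>
<cvParam cvRef="PSI-MS" accession="MS:1002052" name="MS-GF:SpecEValue" value="7.1621056E-9"/>
<userParam name="NumMatchedMainIons" value="12"/>
"""

if __name__ == "__main__":
    table = deltas(param_values(TRYP), param_values(NOTRYP))
    for name, (tryp, notryp, change) in table.items():
        print(f"{name}: {tryp:g} -> {notryp:g} (change {change:+g})")
```

Note that DeNovoScore and MS2IonCurrent also differ between the two runs, not only the RawScore of the peptide match.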

So I'm wondering whether the lower score has something to do with the termini, or with the enzyme setting?

This question may be related to #120

glormph commented 8 months ago

Interestingly although the NTT is 0 in the XML, the stdout SearchParams shows it is 2:

        PrecursorMassTolerance: 10.0 ppm
        IsotopeError: -1,2
        TargetDecoyAnalysis: false
        FragmentationMethod: As written in the spectrum or CID if no info
        Instrument: QExactive (Q-Exactive)
        Enzyme: NoCleavage
        Protocol: TMT
        NumTolerableTermini: 2
        IgnoreMetCleavage: true

FarmGeek4Life commented 8 months ago

NumTolerableTermini in the mzid results is changed here, forcing '0' if using enzyme settings 'Unspecific Cleavage' or 'No Cleavage': https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/mzid/AnalysisProtocolCollectionGen.java#L100
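As a rough illustration of that override, and of why the stdout SearchParams can show 2 while the mzid shows 0, here is a sketch in Python. It is a paraphrase of the behavior described above, not a translation of the Java in AnalysisProtocolCollectionGen.java; the function name and the "UnspecificCleavage" identifier are assumptions, while "NoCleavage" and "Tryp" appear in the output earlier in this thread.

```python
# Enzyme identifiers for which the reported NTT is forced to 0.
# "NoCleavage" is taken from the thread; "UnspecificCleavage" is an assumed name.
NON_SPECIFIC_ENZYMES = {"UnspecificCleavage", "NoCleavage"}


def reported_ntt(enzyme_id, requested_ntt):
    """Return the NumTolerableTermini value that ends up in the mzid output.

    The value used during the search is whatever was requested; only the
    value written to the results file is overridden.
    """
    if enzyme_id in NON_SPECIFIC_ENZYMES:
        return 0
    return requested_ntt


if __name__ == "__main__":
    for enzyme_id in ("Tryp", "NoCleavage"):
        print(f"{enzyme_id}: requested 2, reported {reported_ntt(enzyme_id, 2)}")
```

The point of the sketch is that the override applies only to what is written out, so a search run with 'No Cleavage' and the default ntt can still behave like an ntt 2 search internally.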

A non-zero '-ntt' is technically invalid for the enzymes 'No Cleavage' and 'Unspecific Cleavage': for 'No Cleavage' there is no cleavage residue to verify or enforce, and for 'Unspecific Cleavage' there is no specific cleavage residue because every residue is a possible cleavage point.

You might want to use '-ntt 0' for 'No Cleavage'; it appears that if a non-zero value is supplied for '-ntt' you still get some enzyme-search-specific behavior, and if you are using a predigested peptide DB, you probably don't want that. This also might be a change that we enforce in the code.

We did have to fix behavior for 'No Cleavage' in 2018 to have it not be treated the same as 'Unspecific Cleavage'; there is the possibility that other changes are also needed to have the correct behavior everywhere, but as mentioned before, MS-GF+ is in maintenance mode; we don't have the time or funding to put significant effort into improvements, but we will accept reasonable pull requests.

glormph commented 8 months ago

Yes, it makes sense forcing ntt to 0 for 'no cleavage', I'll try to use -ntt 0 and see if that makes a difference for the scoring.

I have understood that it is in maintenance mode, bad luck for me, but sounds reasonable!

glormph commented 8 months ago

So with ntt set to 0 and no cleavage, the search becomes very slow (24 hours, where an ntt 2 search takes maybe 3h30), and the matched peptides seem to be the result of unspecific cleavage. It also did not improve the score for the above-mentioned PSM.

I tried to go through the code a bit to understand why, but my understanding is not great here. Maybe the number of peptides to consider becomes too big in this line? https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java#L280

I am a bit in over my head here, and I haven't yet solved my actual question of why a tryptic peptide gets a lower score when searching with no cleavage. Also it is the weekend :)