bigbio / quantms

Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.
https://quantms.org
MIT License

The min/max charge and missed cleavages settings are sometimes not working #364

Open ypriverol opened 7 months ago

ypriverol commented 7 months ago

Description of the bug

@jpfeuffer @timosachsenberg @daichengxin I found a dataset that we searched with MS-GF+ using the following command:

#!/bin/bash -euo pipefail
MSGFPlusAdapter \
    -protocol automatic \
    -in 01086_C01_P010738_S00_N03_R1.mzML \
    -out 01086_C01_P010738_S00_N03_R1_msgf.idXML \
    -executable $(find /usr/local/share/msgf_plus-*/MSGFPlus.jar -maxdepth 0) \
    -threads 6 \
    -java_memory 30720 \
    -database "GRCh38r110_GCA97s_coding_proteins_19Jul23-decoy.fa" \
    -instrument high_res \
    -matches_per_spec 1 \
    -min_precursor_charge 2 \
    -max_precursor_charge 4 \
    -min_peptide_length 6 \
    -max_peptide_length 40 \
    -max_missed_cleavages 2 \
    -isotope_error_range 0,1 \
    -enzyme "Trypsin/P" \
    -tryptic fully \
    -precursor_mass_tolerance 40.0 \
    -precursor_error_units ppm \
    -fixed_modifications 'Carbamidomethyl (C)' \
    -variable_modifications 'Acetyl (Protein N-term)' 'Deamidated (N)' 'Deamidated (Q)' 'Oxidation (M)' \
    -max_mods 3 \
    -PeptideIndexing:IL_equivalent \
    -PeptideIndexing:unmatched_action warn \
    -debug 0 \
     \
    2>&1 | tee 01086_C01_P010738_S00_N03_R1_msgf.log

However, in the output file I found the following identification, reported with charge="6":

        <PeptideIdentification score_type="SpecEValue" higher_score_better="false" significance_threshold="0.0" MZ="664.68194580078125" RT="3378.397500000000036" spectrum_reference="controllerType=0 controllerNumber=1 scan=24975" >
            <PeptideHit score="1.4043417e-21" sequence="INNAHTIGC(Carbamidomethyl)NAVSWAPAVVPGSLIDHPSGQKPNYIKR" charge="6" aa_before="K K K K K K K K K K K K K K K" aa_after="F F F F F F F F F F F F F F F" start="130 147 144 130 190 130 147 144 130 190 130 147 144 130 190" end="166 183 180 166 226 166 183 180 166 226 166 183 180 166 226" protein_refs="PH_14293 PH_14294 PH_14295 PH_14296 PH_14297 PH_44721 PH_44722 PH_44723 PH_44724 PH_44725 PH_112619 PH_112620 PH_112621 PH_112622 PH_112623" >
                <UserParam type="float" name="MS:1002049" value="103.0"/>
                <UserParam type="float" name="MS:1002050" value="165.0"/>
                <UserParam type="float" name="MS:1002052" value="1.4043417e-21"/>
                <UserParam type="float" name="MS:1002053" value="6.614773000000001e-14"/>
                <UserParam type="string" name="AssumedDissociationMethod" value="HCD"/>
                <UserParam type="string" name="CTermIonCurrentRatio" value="0.3437819"/>
                <UserParam type="string" name="ExplainedIonCurrentRatio" value="0.39947474"/>
                <UserParam type="string" name="MS2IonCurrent" value="2429519.8"/>
                <UserParam type="string" name="MeanErrorAll" value="4.888304"/>
                <UserParam type="string" name="MeanErrorTop7" value="2.5796666"/>
                <UserParam type="string" name="MeanRelErrorAll" value="-0.8928608"/>
                <UserParam type="string" name="MeanRelErrorTop7" value="2.5497687"/>
                <UserParam type="string" name="NTermIonCurrentRatio" value="0.055692848"/>
                <UserParam type="string" name="NumMatchedMainIons" value="23"/>
                <UserParam type="string" name="StdevErrorAll" value="4.698519"/>
                <UserParam type="string" name="StdevErrorTop7" value="1.8443376"/>
                <UserParam type="string" name="StdevRelErrorAll" value="6.7211905"/>
                <UserParam type="string" name="StdevRelErrorTop7" value="1.885455"/>
                <UserParam type="float" name="calcMZ" value="664.51446533203125"/>
                <UserParam type="int" name="pass_threshold" value="1"/>
                <UserParam type="int" name="start" value="191"/>
                <UserParam type="int" name="end" value="227"/>
                <UserParam type="string" name="target_decoy" value="target"/>
                <UserParam type="string" name="isotope_error" value="1"/>
                <UserParam type="string" name="protein_references" value="non-unique"/>
            </PeptideHit>
            <UserParam type="string" name="MS:1001115" value="24975"/>
        </PeptideIdentification>
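As a sanity check, the reported hit can be tested against the settings from the command above. The Python sketch below uses hypothetical helpers (not part of quantms or OpenMS) and two stated assumptions: a simplified Trypsin/P rule in which every internal K or R counts as a missed cleavage (including before P), and a precursor m/z corrected by the reported isotope_error. Under these assumptions only the charge violates the settings: the peptide has 37 residues and 2 missed cleavages, and the isotope-corrected precursor error is below 1 ppm.

```python
import re
from typing import List

# Settings copied from the MSGFPlusAdapter command in this issue.
MIN_CHARGE, MAX_CHARGE = 2, 4
MIN_LEN, MAX_LEN = 6, 40
MAX_MISSED_CLEAVAGES = 2
PRECURSOR_TOL_PPM = 40.0
C13_DELTA = 1.0033548378  # 13C - 12C mass difference in Da


def strip_mods(sequence: str) -> str:
    """Remove OpenMS-style annotations such as '(Carbamidomethyl)'."""
    return re.sub(r"\([^)]*\)", "", sequence)


def missed_cleavages_trypsin_p(peptide: str) -> int:
    """Assumed simplified Trypsin/P rule: each internal K/R is a missed site."""
    return sum(1 for aa in peptide[:-1] if aa in "KR")


def ppm_error(observed_mz: float, calc_mz: float, charge: int, isotope_error: int = 0) -> float:
    """Precursor error in ppm after shifting the observed m/z by the isotope error."""
    corrected = observed_mz - isotope_error * C13_DELTA / charge
    return (corrected - calc_mz) / calc_mz * 1e6


def check_hit(sequence: str, charge: int, observed_mz: float, calc_mz: float,
              isotope_error: int) -> List[str]:
    """Return a list of search settings that the hit violates (empty if none)."""
    peptide = strip_mods(sequence)
    problems = []
    if not MIN_CHARGE <= charge <= MAX_CHARGE:
        problems.append(f"charge {charge} outside [{MIN_CHARGE}, {MAX_CHARGE}]")
    if not MIN_LEN <= len(peptide) <= MAX_LEN:
        problems.append(f"length {len(peptide)} outside [{MIN_LEN}, {MAX_LEN}]")
    mc = missed_cleavages_trypsin_p(peptide)
    if mc > MAX_MISSED_CLEAVAGES:
        problems.append(f"{mc} missed cleavages > {MAX_MISSED_CLEAVAGES}")
    err = ppm_error(observed_mz, calc_mz, charge, isotope_error)
    if abs(err) > PRECURSOR_TOL_PPM:
        problems.append(f"precursor error {err:.1f} ppm > {PRECURSOR_TOL_PPM}")
    return problems


print(check_hit("INNAHTIGC(Carbamidomethyl)NAVSWAPAVVPGSLIDHPSGQKPNYIKR",
                6, 664.68194580078125, 664.51446533203125, 1))
# prints ['charge 6 outside [2, 4]']
```

With strict Trypsin (no cleavage before P) the internal "KP" would not count, giving 1 missed cleavage instead of 2; either way the hit stays within -max_missed_cleavages 2.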

What could be the problem? The same happens with Comet.

Command used and terminal output

No response

Relevant files

No response

System information

No response

timosachsenberg commented 7 months ago

According to this comment in our adapter, https://github.com/OpenMS/OpenMS/blob/079143800f7ed036a7c68ea6e124fe4f5cfc9569/src/topp/MSGFPlusAdapter.cpp#L166, the charge range is only used if no charge is annotated in the mzML.
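If that is the behaviour, scan 24975 presumably carries an annotated charge of 6 in the mzML. The Python sketch below (hypothetical helpers, standard library only) reads the PSI-MS "charge state" term (MS:1000041) from each spectrum that has a precursor and models the fallback described in the adapter comment: an annotated charge is searched as-is, otherwise every charge in [min, max] is tried. It illustrates the described behaviour; it is not the adapter's or MS-GF+'s code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

CHARGE_STATE = "MS:1000041"  # PSI-MS CV accession for "charge state"


def _local(tag: str) -> str:
    """Drop the namespace: '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def precursor_charges(mzml_text: str) -> Dict[str, Optional[int]]:
    """Map each spectrum with a precursor to its annotated charge (None if absent)."""
    root = ET.fromstring(mzml_text)
    charges: Dict[str, Optional[int]] = {}
    for spec in root.iter():
        if _local(spec.tag) != "spectrum":
            continue
        elements = list(spec.iter())
        if not any(_local(el.tag) == "precursor" for el in elements):
            continue  # MS1 spectra have no precursor and are not searched
        charge = None
        for el in elements:
            if _local(el.tag) == "cvParam" and el.get("accession") == CHARGE_STATE:
                charge = int(el.get("value"))
                break
        charges[spec.get("id")] = charge
    return charges


def charges_searched(annotated: Optional[int], min_z: int, max_z: int) -> List[int]:
    """Assumed fallback: use the annotated charge, else try every charge in range."""
    if annotated is not None:
        return [annotated]
    return list(range(min_z, max_z + 1))
```

For example, `charges_searched(6, 2, 4)` returns `[6]`, while a spectrum without a charge annotation yields `[2, 3, 4]`. For full-size mzML files, `xml.etree.ElementTree.iterparse` would avoid loading the whole document into memory.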

ypriverol commented 7 months ago

@jpfeuffer @timosachsenberg would it make sense to add a parameter that filters the PSMs to that charge range?

timosachsenberg commented 7 months ago

Good question. I think these high-charge peptides are potentially interesting, so one could argue that we want them to be reported. On the other hand, you get more well-defined and consistent results with filtering. I would probably keep them by default, but I could add an optional filter if we decide that we want to filter them.
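An opt-in post-search filter along those lines could look like the Python sketch below. It is hypothetical (not an existing quantms or OpenMS parameter), uses only the standard library, and assumes idXML elements carry no default XML namespace. With no range given the document is returned untouched, so high-charge hits are kept by default; with a range, PeptideHits outside it are dropped, and PeptideIdentifications left without hits are removed.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def filter_idxml_by_charge(
    idxml_text: str, charge_range: Optional[Tuple[int, int]] = None
) -> Tuple[str, int]:
    """Return (filtered idXML text, number of PeptideHits removed).

    charge_range=None keeps every hit, i.e. filtering is opt-in.
    """
    if charge_range is None:
        return idxml_text, 0
    min_z, max_z = charge_range
    root = ET.fromstring(idxml_text)
    removed = 0
    # Materialise the runs first so the tree is not modified during iteration.
    for run in list(root.iter("IdentificationRun")):
        for pep_id in run.findall("PeptideIdentification"):
            for hit in pep_id.findall("PeptideHit"):
                if not min_z <= int(hit.get("charge")) <= max_z:
                    pep_id.remove(hit)
                    removed += 1
            # A spectrum without any remaining hit is dropped entirely.
            if pep_id.find("PeptideHit") is None:
                run.remove(pep_id)
    return ET.tostring(root, encoding="unicode"), removed
```

Making the range an optional argument mirrors the "keep by default" preference: callers that want results consistent with -min_precursor_charge/-max_precursor_charge pass the same bounds, for example `(2, 4)`.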