compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

Modification and flanking amino acid information absent from final output (pout) file #92

Closed Hassan-1991 closed 8 months ago

Hassan-1991 commented 10 months ago

Dear Concern,

I'm running MS-GF+ followed by ms2rescore, and as the title suggests, the final .pout file has neither flanking amino acids nor modification information in the peptide column.

This is the config file I'm using:

    {
    "$schema": "./config_schema.json",
    "general":{
        "pipeline":"infer",
        "feature_sets":[["searchengine", "ms2pip", "rt"]],
        "run_percolator":true,
        "id_decoy_pattern": "XXX_",
        "num_cpu":12,
        "config_file":null,
        "tmp_path":"path/to/tmp/",
        "mgf_path":null,
        "output_filename":"path/to/output",
        "log_level": "info",
        "plotting": false
    },
    "ms2pip": {
        "model": "HCD",
        "frag_error": 0.02,
        "modifications": [
            {"name":"Acetyl", "unimod_accession":1, "mass_shift":42.010565, "amino_acid":null, "n_term":true, "c_term": false},
            {"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.021464, "amino_acid":"C", "n_term":false, "c_term": false},
            {"name":"Deamidated", "unimod_accession":7, "mass_shift":0.984016, "amino_acid":"N", "n_term":false, "c_term": false},
            {"name":"PhosphoS", "unimod_accession":21, "mass_shift":79.966331, "amino_acid":"S", "n_term":false, "c_term": false},
            {"name":"PhosphoT", "unimod_accession":21, "mass_shift":79.966331, "amino_acid":"T", "n_term":false, "c_term": false},
            {"name":"PhosphoY", "unimod_accession":21, "mass_shift":79.966331, "amino_acid":"Y", "n_term":false, "c_term": false},
            {"name":"Pyro-carbamidomethyl", "unimod_accession":26, "mass_shift":39.994915, "amino_acid":"C", "n_term":false, "c_term": false},
            {"name":"Glu->pyro-Glu", "unimod_accession":27, "mass_shift":-18.010565, "amino_acid":"E", "n_term":true, "c_term": false},
            {"name":"Gln->pyro-Glu", "unimod_accession":28, "mass_shift":-17.026549, "amino_acid":"Q", "n_term":true, "c_term": false},
            {"name":"Oxidation", "unimod_accession":35, "mass_shift":15.994915, "amino_acid":"M", "n_term":false, "c_term": false},
            {"name":"iTRAQ", "unimod_accession":214, "mass_shift":144.102063, "amino_acid":null, "n_term":true, "c_term": false},
            {"name":"Ammonia-loss", "unimod_accession":385, "mass_shift":-17.026549, "amino_acid":"C", "n_term":true, "c_term": false},
            {"name":"TMT6plexN", "unimod_accession":737, "mass_shift":229.162932, "amino_acid":"N", "n_term":false, "c_term": false},
            {"name":"TMT6plex", "unimod_accession":737, "mass_shift":229.162932, "amino_acid":null, "n_term":true, "c_term": false},
            {"name":"Amidated", "unimod_accession": 2, "mass_shift": -0.984016, "amino_acid":null, "n_term": false, "c_term": true}
        ]
    },
    "maxquant_to_rescore": {
        "mgf_title_pattern": "TITLE=.*scan=([0-9]+).*$",
        "modification_mapping":{
            "ox":"Oxidation",
            "cm":"Carbamidomethyl"
        },
        "fixed_modifications":{
            "C":"Carbamidomethyl"
        }
    },
    "percolator": {}
    }

I'm running this on a pair of mzid and mgf files, as described in the documentation. When I run MS-GF+, I specify carbamidomethylation of C as a fixed modification and oxidation of M as a variable modification.
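One quick sanity check, sketched here in Python, is to confirm that the modifications used in the MS-GF+ search are declared under `ms2pip.modifications` in the config. This is only a sketch: the helper name `missing_modifications` and the trimmed inline config are hypothetical, and passing the check does not by itself explain or fix the missing annotations in the .pout file.

```python
import json


def missing_modifications(config, required_names):
    """Return the required modification names that the config does not declare."""
    declared = {
        mod["name"]
        for mod in config.get("ms2pip", {}).get("modifications", [])
    }
    return [name for name in required_names if name not in declared]


# Trimmed-down stand-in for the config above (hypothetical example data).
example_config = json.loads("""
{
    "ms2pip": {
        "modifications": [
            {"name": "Carbamidomethyl", "unimod_accession": 4, "mass_shift": 57.021464,
             "amino_acid": "C", "n_term": false, "c_term": false},
            {"name": "Oxidation", "unimod_accession": 35, "mass_shift": 15.994915,
             "amino_acid": "M", "n_term": false, "c_term": false}
        ]
    }
}
""")

# Both search modifications are declared, so nothing is missing.
print(missing_modifications(example_config, ["Carbamidomethyl", "Oxidation"]))  # []
# "Phospho" is not a declared name in this example config.
print(missing_modifications(example_config, ["Carbamidomethyl", "Phospho"]))  # ['Phospho']
```

For a real run, the dictionary would come from `json.load` on the config file rather than from the inline string.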

I'd be grateful for your help in this regard. Even short of an actual fix, if you can suggest a way to extract this information from the generated files, that would be great.

Thanks again and very best regards, Hassan

RalfG commented 8 months ago

Hi @Hassan-1991,

I'm happy to announce that MS²Rescore 3.0 now supports modifications in the .pout output (which is written when Percolator is selected as the rescoring engine, or when no rescoring engine is specified). Leading and trailing amino acids are not included, but the peptides are correctly written in the notation with leading and trailing dots (e.g., .ACDEK.).

Best, Ralf
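
For downstream processing, the peptide column can be split into its flanking positions and core sequence. The following Python sketch assumes the `X.PEPTIDE.Y` layout mentioned above, with empty flanks when only the dots are written. The function name `split_flanks` is hypothetical, and the bracketed modification label in the example is an assumption about the notation, not confirmed MS²Rescore output.

```python
import re

# Optional single flanking residue (or "-"), a dot, the core peptide,
# a dot, and an optional single flanking residue (or "-").
_FLANKED = re.compile(r"([A-Z-]?)\.(.+)\.([A-Z-]?)")


def split_flanks(peptide):
    """Split a Percolator-style peptide string into (leading, core, trailing).

    Strings without the dotted layout are returned unchanged as the core.
    """
    match = _FLANKED.fullmatch(peptide)
    if match is None:
        return "", peptide, ""
    return match.group(1), match.group(2), match.group(3)


print(split_flanks(".ACDEK."))                # ('', 'ACDEK', '')
print(split_flanks("K.ACDEK.L"))              # ('K', 'ACDEK', 'L')
print(split_flanks(".ACM[Oxidation]DEK."))    # ('', 'ACM[Oxidation]DEK', '')
```

Because the core group is matched greedily between the first and last dot, modification labels that themselves contain a dot (for example a mass shift) stay inside the core sequence.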