compomics / ms2pip

MS²PIP: Fast and accurate peptide spectrum prediction for multiple fragmentation methods, instruments, and labeling techniques.
https://ms2pip.readthedocs.io
Apache License 2.0

Need help on learning to run MS2PIP #89

Closed: nattzy94 closed this issue 4 years ago

nattzy94 commented 4 years ago

Hi,

I am new to mass spec analysis and would like to use MS2PIP to improve protein predictions. My main goal is to identify small proteins in a mass spec dataset.

So far, I have searched my MGF files against a database of UniProt-annotated proteins (H. sapiens). I then searched the remaining unmatched spectra against a database of small proteins, which produced a number of small-protein predictions. All of the searches were performed in PeptideShaker. Because the MS experiment was not optimised for small proteins, the small-peptide predictions are, naturally, of low or doubtful confidence. I would therefore like to see whether MS2PIP can improve the prediction quality.

I am a little confused about where to start, however. I understand that MS2PIP requires a PEPREC file to run. I generated a PEPREC file for the small proteins I am interested in (~40,000 small proteins) using the fasta2PEPREC.py script in the conversion_tools folder. I am not sure I did this correctly, as the resulting PEPREC file does not contain any amino acid modifications (e.g. oxidation of M, carbamidomethylation of C). How do I generate a PEPREC file that properly contains amino acid modifications?
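One way to add a fixed modification to an existing PEPREC file is a small post-processing step. The sketch below is not part of MS²PIP: `add_fixed_mod` and `convert_peprec_lines` are hypothetical helpers. It assumes the space-separated PEPREC layout (`spec_id modifications peptide charge`), modifications written as `location|name` pairs joined by `|` with 1-based residue locations and `-` for unmodified peptides, and that the name `Carbamidomethyl` matches a modification defined in your MS²PIP configuration.

```python
# Hypothetical post-processing sketch (not an MS2PIP tool).
# Assumes space-separated PEPREC rows: spec_id modifications peptide charge [...]
# and that "Carbamidomethyl" is a modification name known to your MS2PIP config.


def add_fixed_mod(modifications, peptide, residue="C", name="Carbamidomethyl"):
    """Return a PEPREC modifications string with `name` on every `residue`."""
    # "-" marks an unmodified peptide; otherwise "loc|name|loc|name|..."
    parts = [] if modifications == "-" else modifications.split("|")
    pairs = list(zip(parts[0::2], parts[1::2]))
    # Residue locations are counted from 1 for the first amino acid
    for position, amino_acid in enumerate(peptide, start=1):
        if amino_acid == residue:
            pairs.append((str(position), name))
    if not pairs:
        return "-"
    return "|".join(f"{loc}|{mod}" for loc, mod in pairs)


def convert_peprec_lines(lines):
    """Rewrite the modifications column of each PEPREC row, keeping other columns."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "spec_id":
            out.append(line)  # keep the header row unchanged
            continue
        fields[1] = add_fixed_mod(fields[1], fields[2])
        out.append(" ".join(fields))
    return out


if __name__ == "__main__":
    rows = [
        "spec_id modifications peptide charge",
        "pep1 - ACDMCK 2",
        "pep2 - PEPTIDE 2",
    ]
    for row in convert_peprec_lines(rows):
        print(row)
```

Variable modifications such as oxidation of M work differently: each combination of modified and unmodified sites is a distinct peptide, so they would require enumerating additional PEPREC rows rather than rewriting an existing one.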

Having generated the PEPREC file, I then ran ms2pip, which outputs an HCD_predictions.csv file. I am stuck here, as I don't know how to proceed to get improved protein predictions. Am I using the right workflow, i.e., should I start from the protein database in the first place, or from the output predictions of PeptideShaker?

RalfG commented 4 years ago

Hi! You've come to the right place! MS2 spectrum predictions can give a boost in sensitivity to challenging identification workflows (see https://doi.org/10.1093/bioinformatics/btz383, and https://doi.org/10.1002/pmic.201900351). The easiest and most versatile way to make use of MS²PIP to improve your identification workflow is with MS²ReScore. I noticed your issue over there (https://github.com/compomics/ms2rescore/issues/11), so I'll help you out in that issue thread.
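The gain in sensitivity comes from comparing each candidate PSM's observed spectrum with the MS²PIP prediction for its peptide and using that agreement as extra evidence during rescoring. MS²ReScore derives many such features and combines them with search-engine features; the sketch below only illustrates the basic idea with one simplified feature, the Pearson correlation between predicted and observed fragment intensities. The data layout (dicts keyed by `(ion_type, ion_number)`) and the function names are hypothetical, and both intensity sets are assumed to already be on the same scale.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spectrum_similarity(predicted, observed):
    """Correlate predicted and observed intensities per fragment ion.

    predicted / observed: hypothetical dicts mapping (ion_type, ion_number)
    to intensity. Predicted fragments absent from the observed spectrum are
    counted as intensity 0.
    """
    keys = sorted(predicted)
    pred = [predicted[k] for k in keys]
    obs = [observed.get(k, 0.0) for k in keys]
    return pearson(pred, obs)


if __name__ == "__main__":
    predicted = {("b", 1): 1.0, ("b", 2): 2.0, ("y", 1): 3.0}
    observed = {("b", 1): 2.0, ("b", 2): 4.0, ("y", 1): 6.0}
    print(spectrum_similarity(predicted, observed))
```

A PSM whose observed fragment pattern closely matches the prediction scores near 1, which a rescoring step can use to separate likely correct identifications from doubtful ones.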

To clarify the use cases for our MS²PIP-related tools: