compomics / peptide-shaker

Interpretation of proteomics identification results
http://compomics.github.io/projects/peptide-shaker.html

All Leucine residues replaced by Isoleucines in MZID file #388

Closed: MarcIsak closed this issue 4 years ago.

MarcIsak commented 4 years ago

Hi,

After opening a *.mzid file exported from PeptideShaker-1.16.42 in RStudio, I noticed that all Leucine residues in the peptide sequences had been replaced by Isoleucine. What is the reason for this?

Is there a way to disable this conversion? At least, I did not find any setting in SearchGUI 3.3.16 where I could change it. Or is there something that can be done in PeptideShaker to prevent it?

Best,

Marc

hbarsnes commented 4 years ago

Hi Marc,

Are you sure this is not something that happens in RStudio's processing of the mzid file? I cannot reproduce it on my end: I get plenty of Leucine residues when exporting the PeptideShaker example dataset to mzid.
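The same check can be run without R by reading the `<PeptideSequence>` elements directly from the exported file and counting residues. Below is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. The helper names (`local_name`, `peptide_sequences`, `residue_count`) and the two toy sequences in the trimmed stand-in document are hypothetical, and the namespace URI is an assumption that the code sidesteps by matching local tag names only.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def peptide_sequences(source):
    """Return the text of every <PeptideSequence> element.

    `source` is a file path or file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    return [
        (elem.text or "").strip()
        for elem in root.iter()
        if local_name(elem.tag) == "PeptideSequence"
    ]


def residue_count(sequences, residue):
    """Total number of occurrences of a one-letter residue code."""
    return sum(seq.count(residue) for seq in sequences)


# Hypothetical, heavily trimmed stand-in for an exported file.
# The namespace URI is assumed; matching on local names ignores it anyway.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="pep_1"><PeptideSequence>ELVISLIVESK</PeptideSequence></Peptide>
    <Peptide id="pep_2"><PeptideSequence>PEPTIDE</PeptideSequence></Peptide>
  </SequenceCollection>
</MzIdentML>"""

seqs = peptide_sequences(io.StringIO(EXAMPLE))
print(seqs)                      # ['ELVISLIVESK', 'PEPTIDE']
print(residue_count(seqs, "L"))  # 2
print(residue_count(seqs, "I"))  # 3
```

For a real export, pass the file path instead, e.g. `peptide_sequences("export.mzid")` (file name hypothetical). `ET.parse` loads the whole tree into memory, which should be fine for a quick check but may be slow for very large files.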

Best regards, Harald

MarcIsak commented 4 years ago

Thanks for the quick response, Harald!

Just to make sure that my RStudio isn't doing something weird, could you take a look at this mzid file? I have double-checked, and there is no "L" in any peptide sequence in this file after importing it into RStudio (I use the mzID package from Bioconductor). What do you use to look at MZID?

Here is a Dropbox link:

https://www.dropbox.com/s/wgu2fk7w1gvdo9d/Mistr_Dorsal_Ventral_Exo_180629.zip?dl=0

Best,

Marc

hbarsnes commented 4 years ago

Hi Marc,

Here's an example from your file of a peptide with both "I" and "L":

<Peptide id="QHQIIHSSQSFCR_-17.026549101009998-ATAA-1">
    <PeptideSequence>QHQILHSSQSFCR</PeptideSequence>
    <Modification monoisotopicMassDelta="-17.026549" residues="Q" location="0" >
        <cvParam cvRef="UNIMOD" accession="UNIMOD:28" name="Gln-&gt;pyro-Glu"/>
    </Modification>
    <Modification monoisotopicMassDelta="57.021464" residues="C" location="12" >
        <cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>
    </Modification>
</Peptide>

Perhaps you are looking at the peptide IDs and not the actual peptide sequences?
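To make the distinction concrete, here is a minimal Python sketch (standard library only) that reads each `<Peptide>` element's `id` attribute and its `<PeptideSequence>` text side by side, using the peptide quoted above. The helper names are hypothetical, and the namespace URI in the trimmed document is an assumption. Splitting the ID at the first underscore is done here only to expose the mismatch in this one example; it does not reflect any documented ID format.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def ids_and_sequences(source):
    """Return (Peptide id attribute, PeptideSequence text) pairs."""
    root = ET.parse(source).getroot()
    pairs = []
    for pep in root.iter():
        if local_name(pep.tag) != "Peptide":
            continue
        seq = ""
        for child in pep:
            if local_name(child.tag) == "PeptideSequence":
                seq = (child.text or "").strip()
        pairs.append((pep.get("id"), seq))
    return pairs


def mismatches(a, b):
    """Positions where two strings differ, as (index, char_in_a, char_in_b)."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The peptide from the exported file above, trimmed to the relevant parts
# (namespace URI assumed).
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="QHQIIHSSQSFCR_-17.026549101009998-ATAA-1">
      <PeptideSequence>QHQILHSSQSFCR</PeptideSequence>
    </Peptide>
  </SequenceCollection>
</MzIdentML>"""

for pep_id, seq in ids_and_sequences(io.StringIO(EXAMPLE)):
    # Illustration only: treat the text before the first "_" as sequence-like.
    id_part = pep_id.split("_", 1)[0]
    print(seq, id_part, mismatches(id_part, seq))
    # QHQILHSSQSFCR QHQIIHSSQSFCR [(4, 'I', 'L')]
```

The L is only present in `<PeptideSequence>`, so that element, not the `id` attribute, is the one to read.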

As for what I use to look at mzid files: to inspect large text files I use Vim (https://www.vim.org), but there are probably many other, possibly better, alternatives.

Best regards, Harald

MarcIsak commented 4 years ago

Hi Harald, I use this Bioconductor package in R to look at *.mzid files:

http://bioconductor.org/packages/release/bioc/html/mzID.html

Hmm, it is strange that the peptide IDs have no L residues while the peptide sequences do... I will see if I can extract the peptide sequences with the L residues somehow.

Best,

Marc

MarcIsak commented 4 years ago

Hi again Harald,

I opened the file again with the mzID package and managed to find the original sequences, L residues included, together with their respective peptide IDs without L residues.

[Screenshot: peptide IDs and their peptide sequences (peptID_peptSeq_ex)]

Thanks for all the help!

Best,

Marc

hbarsnes commented 4 years ago

Hi Marc,

Great! Note that the ID is just an internal identifier within the mzid file and does not even have to contain the peptide sequence, so it is best to work with the actual sequences. :)
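Within an mzIdentML file the peptide ID is what other elements point to; for example, a `<SpectrumIdentificationItem>` refers to its peptide through its `peptide_ref` attribute. A minimal Python sketch of working with the actual sequences is therefore to build an ID-to-sequence lookup once and resolve every reference through it, never parsing the ID string itself. The element and attribute names follow the mzIdentML schema as I understand it; the trimmed example document, the PSM id `SII_1`, the namespace URI, and the helper names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def psm_sequences(source):
    """Map each SpectrumIdentificationItem id to its peptide sequence.

    The sequence is found via peptide_ref -> Peptide id -> PeptideSequence;
    the Peptide id is used only as an opaque key and is never parsed.
    """
    root = ET.parse(source).getroot()

    # Pass 1: opaque Peptide id -> actual sequence text.
    sequence_by_id = {}
    for elem in root.iter():
        if local_name(elem.tag) == "Peptide":
            for child in elem:
                if local_name(child.tag) == "PeptideSequence":
                    sequence_by_id[elem.get("id")] = (child.text or "").strip()

    # Pass 2: resolve each PSM's peptide_ref through the lookup table
    # (None if the reference cannot be resolved).
    result = {}
    for elem in root.iter():
        if local_name(elem.tag) == "SpectrumIdentificationItem":
            result[elem.get("id")] = sequence_by_id.get(elem.get("peptide_ref"))
    return result


# Hypothetical, heavily trimmed document (namespace URI assumed).
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="QHQIIHSSQSFCR_-17.026549101009998-ATAA-1">
      <PeptideSequence>QHQILHSSQSFCR</PeptideSequence>
    </Peptide>
  </SequenceCollection>
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1">
          <SpectrumIdentificationItem id="SII_1"
              peptide_ref="QHQIIHSSQSFCR_-17.026549101009998-ATAA-1"/>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""

print(psm_sequences(io.StringIO(EXAMPLE)))
# {'SII_1': 'QHQILHSSQSFCR'}
```

Because the lookup treats the ID as an opaque key, the result keeps the L residues regardless of how the IDs happen to be spelled.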

Best regards, Harald