compomics / DeepLC

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
https://iomics.ugent.be/deeplc
Apache License 2.0

How can I get the PSI-MS format for the 'modifications' column? #67

Closed: ec-ho-ra-mos closed this issue 9 months ago

ec-ho-ra-mos commented 9 months ago

[Image: screenshot showing the 'ModifiedPeptides' column]

Hello!

I have a list of modified peptides in the 'ModifiedPeptides' column above. I would like to use DeepLC to predict their retention times. Is there an efficient way to reformat this column into the 'modifications' column input for DeepLC?

RobbinBouwmeester commented 9 months ago

Hi,

This should work if your file is in a format supported by psm_utils. Simply provide the file to DeepLC and make sure the file extension matches the format.

Let me know if that works!

Kind regards,

Robbin

ec-ho-ra-mos commented 9 months ago

I tried running DeepLC with the "ModifiedPeptide" column as the "modifications" column for DeepLC's input. However, I've noticed that DeepLC's output in the "Sequence proforma" column does not include the modification within the sequence.

The "ModifiedPeptide" column is an output of DIA-NN, although I'm not exactly sure about its format.

RobbinBouwmeester commented 9 months ago

I will get back to this ASAP. In the meantime, it would be useful if you could check the input formats supported by psm_utils and see whether your DIA-NN format matches any of them.

ec-ho-ra-mos commented 9 months ago

Thank you very much for your response to my inquiry. I truly appreciate it.

I checked psm_utils, and I don't see any format resembling the one used by DIA-NN. I will continue checking and provide an update if I find something.

In the meantime, I have attached the files I am using for your reference: 231108_DeepLC_input-calibration-file.csv and 231108_DeepLC_input-peptides.csv

RobbinBouwmeester commented 9 months ago

Thanks, I will have a look. I may also open an issue on psm_utils for DIA-NN support.

RobbinBouwmeester commented 9 months ago

Hi,

I have had a look, and it was fairly easy to make the sequences more compliant with ProForma and read them in with psm_utils:

from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList
from psm_utils.io import write_file

import pandas as pd

For the sequences you want to make predictions for:

infile = pd.read_csv("231108_DeepLC_input-peptides.csv")
psm_list = []

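for idx, row in infile.iterrows():
    # Swap parentheses for the square brackets that ProForma expects
    seq = row["modifications"].replace("(", "[").replace(")", "]")

    # A leading modification is N-terminal: separate it with a hyphen
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm+1] + "-" + seq[idx_nterm+1:]

    psm_list.append(PSM(peptidoform=seq, spectrum_id=idx))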

psm_list = PSMList(psm_list=psm_list)

For the calibration file:

infile = pd.read_csv("231108_DeepLC_input-calibration-file.csv")
psm_list_calib = []

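for idx, row in infile.iterrows():
    # Same conversion as above, applied to the "seq" column
    seq = row["seq"].replace("(", "[").replace(")", "]")

    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm+1] + "-" + seq[idx_nterm+1:]

    # Calibration PSMs also carry the observed retention time from the "tr" column
    psm_list_calib.append(PSM(peptidoform=seq, retention_time=row["tr"], spectrum_id=idx))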

psm_list_calib = PSMList(psm_list=psm_list_calib)
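
As a minimal sketch, the conversion step shared by both loops can be pulled out into a small helper and checked on its own before building any PSMs. The function name `diann_to_proforma` is hypothetical (it is not part of psm_utils or DeepLC), the example sequences are made up for illustration, and the sketch assumes modifications are written in plain, non-nested parentheses.

```python
def diann_to_proforma(seq):
    """Convert a parenthesised modified sequence to a bracketed one.

    Hypothetical helper mirroring the loops above; assumes modifications
    are written in non-nested parentheses.
    """
    # Swap parentheses for square brackets
    seq = seq.replace("(", "[").replace(")", "]")

    # A leading bracket marks an N-terminal modification, which is
    # separated from the rest of the sequence with a hyphen
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]

    return seq


# Made-up example sequences, for illustration only
print(diann_to_proforma("AAC(UniMod:4)K"))
print(diann_to_proforma("(UniMod:1)AAK"))
```

Note that the conversion is more than a bracket swap: a modification at the very start of the sequence also gets a hyphen after it, so that it is read as N-terminal.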

You can pass the PSM lists directly to DeepLC with the psm_list parameter:

    def make_preds(self,
                   psm_list=None,
                   infile="",
                   calibrate=True,
                   seq_df=None,
                   mod_name=None):

ec-ho-ra-mos commented 9 months ago

Thank you very much for your reply. From what I understand, you only change the parentheses to brackets, right?

I did the same thing and ran it on DeepLC. However, in the output CSV file, the “sequence proforma” column does not show any modifications for the sequences. Is this correct?

RobbinBouwmeester commented 9 months ago

How exactly are you calling DeepLC? An easy check is to see whether peptides with the same sequence but different modifications get different predicted retention times.
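
A rough sketch of that check, assuming the predictions are available as pairs of bracketed peptidoform and predicted retention time. The function names and the example values are hypothetical and only illustrate the idea; they are not DeepLC output.

```python
import re
from collections import defaultdict


def strip_modifications(peptidoform):
    """Remove bracketed modification tags and terminal hyphens."""
    return re.sub(r"\[[^\]]*\]", "", peptidoform).replace("-", "")


def find_suspicious_groups(predictions, tol=1e-6):
    """Return bare sequences whose differently modified forms all share
    the same predicted retention time (within tol)."""
    groups = defaultdict(dict)
    for peptidoform, rt in predictions:
        groups[strip_modifications(peptidoform)][peptidoform] = rt

    suspicious = []
    for bare, forms in groups.items():
        rts = list(forms.values())
        # More than one peptidoform but no spread in predictions
        if len(forms) > 1 and max(rts) - min(rts) <= tol:
            suspicious.append(bare)
    return suspicious


# Made-up values, for illustration only
example = [
    ("AAC[UniMod:4]K", 12.5),
    ("AACK", 12.5),
    ("[UniMod:1]-PEPTM[UniMod:35]K", 30.1),
    ("PEPTMK", 27.8),
]
print(find_suspicious_groups(example))
```

If many sequences show up in the result, the modifications are probably not being parsed from the input.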

ec-ho-ra-mos commented 9 months ago

I am using the DeepLC GUI.

RobbinBouwmeester commented 9 months ago

That might be problematic, as the GUI tries to infer the file type. It would infer peprec, which is not the actual format here. Are you able to run it via the Python library directly?

ec-ho-ra-mos commented 9 months ago

Hello! I ran DeepLC from Python and it's working as intended now. Thank you very much!

RobbinBouwmeester commented 9 months ago

Great to hear! However... I will see if I can do something in the future so that people do not run into the same error...