Closed by ec-ho-ra-mos 9 months ago.
Hi,
Actually, this should work as long as your file is in a format supported by psm_utils. Simply provide the file to DeepLC and make sure the file extension matches the format.
Let me know if that works!
Kind regards,
Robbin
I tried running DeepLC with the "ModifiedPeptide" column as the "modifications" column of DeepLC's input. However, I've noticed that the "Sequence proforma" column in DeepLC's output does not include the modifications in the sequence.
The "ModifiedPeptide" column comes from DIA-NN output, although I'm not sure exactly which format it uses.
I will get back to this ASAP. In the meantime, it would be useful if you could check the input formats supported by psm_utils and whether your DIA-NN format matches any of them.
Thank you very much for your response to my inquiry. I truly appreciate it.
I checked psm_utils, and I don't see any format resembling the one used by DIA-NN. I will keep looking and let you know if I find something.
In the meantime, I have attached the files I am using for reference: 231108_DeepLC_input-calibration-file.csv and 231108_DeepLC_input-peptides.csv
Thanks, I will have a look, and I may also open an issue on psm_utils for DIA-NN support.
Hi,
I had a look, and it was fairly easy to make the sequences ProForma-compliant and read them in with psm_utils:
```python
from psm_utils.psm import PSM
from psm_utils.psm_list import PSMList
from psm_utils.io import write_file
import pandas as pd
```
For the sequences you want to make predictions for:
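```python
infile = pd.read_csv("231108_DeepLC_input-peptides.csv")
psm_list = []
for idx, row in infile.iterrows():
    # DIA-NN writes modifications as "(UniMod:N)"; ProForma uses "[UniMod:N]"
    seq = row["modifications"].replace("(", "[").replace(")", "]")
    # An N-terminal modification must be followed by "-" in ProForma
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]
    # spectrum_id is a string field in psm_utils
    psm_list.append(PSM(peptidoform=seq, spectrum_id=str(idx)))
psm_list = PSMList(psm_list=psm_list)
```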
For the calibration file:
```python
infile = pd.read_csv("231108_DeepLC_input-calibration-file.csv")
psm_list_calib = []
for idx, row in infile.iterrows():
    # Same conversion as above, applied to the "seq" column
    seq = row["seq"].replace("(", "[").replace(")", "]")
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]
    # Calibration PSMs also carry the observed retention time
    psm_list_calib.append(
        PSM(peptidoform=seq, retention_time=row["tr"], spectrum_id=str(idx))
    )
psm_list_calib = PSMList(psm_list=psm_list_calib)
```
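Both loops above repeat the same two string operations. As a minimal sketch, they could be factored into one plain-Python helper. `diann_to_proforma` is a hypothetical name, not a psm_utils or DeepLC function. The helper also assumes that every parenthesis in the column delimits a modification, as in DIA-NN's `(UniMod:N)` notation.

```python
def diann_to_proforma(modified_peptide):
    """Convert DIA-NN '(UniMod:N)' tags to ProForma '[UniMod:N]' notation.

    Hypothetical helper: assumes parentheses only ever delimit modifications.
    """
    seq = modified_peptide.replace("(", "[").replace(")", "]")
    # ProForma separates an N-terminal modification from the sequence with '-'
    if seq.startswith("["):
        idx_nterm = seq.index("]")
        seq = seq[:idx_nterm + 1] + "-" + seq[idx_nterm + 1:]
    return seq


print(diann_to_proforma("(UniMod:1)AC(UniMod:4)K"))  # [UniMod:1]-AC[UniMod:4]K
```

Each loop body then reduces to `seq = diann_to_proforma(row["seq"])`, or `row["modifications"]` for the prediction file.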
You can pass the PSM lists directly to DeepLC with the psm_list parameter:
```python
def make_preds(self,
               psm_list=None,
               infile="",
               calibrate=True,
               seq_df=None,
               mod_name=None):
```
Thank you very much for your reply. From what I understand, you only change the parentheses to brackets, right?
I did the same thing and ran it through DeepLC. However, in the output CSV file, the "Sequence proforma" column does not show any modifications for the sequences. Is this expected?
How exactly are you calling DeepLC? Also, an easy check: for peptides with the same sequence but different modifications, see whether the predicted retention times differ.
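That check can be scripted. This sketch assumes you have paired each input peptidoform (ProForma, in the bracket notation produced by the conversion above) with its predicted retention time, in input order. `strip_mods` and `rt_spread_by_sequence` are hypothetical helpers, and the numbers below are placeholders, not DeepLC output. A spread of 0 for a sequence seen in several modification states would suggest that the modifications were ignored.

```python
import re
from collections import defaultdict


def strip_mods(peptidoform):
    """Remove bracketed modifications (and the '-' after an N-terminal one)."""
    return re.sub(r"\[[^\]]*\]-?", "", peptidoform)


def rt_spread_by_sequence(predictions):
    """Map each bare sequence seen in more than one modification state to the
    range (max - min) of its predicted retention times."""
    groups = defaultdict(dict)
    for peptidoform, rt in predictions:
        groups[strip_mods(peptidoform)][peptidoform] = rt
    return {
        seq: max(rts.values()) - min(rts.values())
        for seq, rts in groups.items()
        if len(rts) > 1
    }


# Placeholder values for illustration only
example = [("PEMK", 20.0), ("PEM[UniMod:35]K", 18.5), ("ACK", 10.0)]
print(rt_spread_by_sequence(example))  # {'PEMK': 1.5}
```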
I am using the DeepLC GUI.
That might be the problem: the GUI tries to infer the file type, which here would be PEPREC, but your file is not in PEPREC format. Are you able to run it via the Python library directly?
Hello! I ran DeepLC from Python, and it's working as intended now. Thank you very much!
Great to hear! I will see whether I can change something in the future so that others do not run into the same problem.
Hello!
I have a list of modified peptides in the 'ModifiedPeptides' column above, and I would like to use DeepLC to predict their retention times. Is there an efficient way to reformat this column into the 'modifications' column that DeepLC expects as input?
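If you prefer DeepLC's own `seq`/`modifications` columns over ProForma, the conversion can also be done per row. This sketch makes several assumptions. It follows the PEPREC-style convention of `position|name` pairs joined by `|`, with position 0 for the N-terminus and 1-based positions for residues. It only recognises `(UniMod:N)` tags and does not distinguish C-terminal modifications. It also relies on a small hypothetical `UNIMOD_NAMES` table that you would need to extend for your own modifications.

```python
import re

# Hypothetical lookup table: extend with every UniMod accession in your data.
UNIMOD_NAMES = {1: "Acetyl", 4: "Carbamidomethyl", 21: "Phospho", 35: "Oxidation"}
MOD_TAG = re.compile(r"\(UniMod:(\d+)\)")


def diann_to_peprec(modified_peptide):
    """Split '(UniMod:1)AC(UniMod:4)K' into ('ACK', '0|Acetyl|2|Carbamidomethyl')."""
    residues = []
    mods = []
    # The capturing group keeps the tags themselves in the split output
    for part in re.split(r"(\(UniMod:\d+\))", modified_peptide):
        match = MOD_TAG.fullmatch(part)
        if match:
            unimod_id = int(match.group(1))
            if unimod_id not in UNIMOD_NAMES:
                raise KeyError(f"no name mapped for UniMod:{unimod_id}")
            # A tag modifies the residue before it; 0 means the N-terminus
            mods.append(f"{len(residues)}|{UNIMOD_NAMES[unimod_id]}")
        else:
            residues.extend(part)
    return "".join(residues), "|".join(mods)


print(diann_to_peprec("(UniMod:1)AC(UniMod:4)K"))  # ('ACK', '0|Acetyl|2|Carbamidomethyl')
```

Applied to each value of the column, for example with pandas `Series.map`, this yields the sequence and modification strings side by side.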