compomics / DeepLC

DeepLC: Retention time prediction for (modified) peptides using Deep Learning.
https://iomics.ugent.be/deeplc
Apache License 2.0

How to convert peptide modifications in a MaxQuant output file to MS2PIP-style #44

Closed lbwfff closed 2 years ago

lbwfff commented 2 years ago

Hi,

I want to use DeepLC to predict retention times for some peptides. Since I am using MaxQuant for the analysis, I don't know how to convert the peptide modifications in its output file to the MS2PIP-style notation that DeepLC needs. The MaxQuant output looks like this:

Sequence | Length | Modifications | Modified sequence
-- | -- | -- | --
ASMGTLAFDEYGRPFLIIK | 19 | Acetyl (Protein N-term),Oxidation (M) | _(Acetyl (Protein N-term))ASM(Oxidation (M))GTLAFDEYGRPFLIIK_
DDDIAALVVDNGSGMCK | 17 | Acetyl (Protein N-term),Oxidation (M) | _(Acetyl (Protein N-term))DDDIAALVVDNGSGM(Oxidation (M))CK_
AMEALATAEQACK | 13 | Oxidation (M) | _AM(Oxidation (M))EALATAEQACK_
MQQQLDEYQELLDIK | 15 | Oxidation (M) | _M(Oxidation (M))QQQLDEYQELLDIK_

Has anyone tried to convert this kind of modification information into the input that DeepLC requires? Is there any software or function that can do it?

Thanks, LeeLee

RobbinBouwmeester commented 2 years ago

Dear LeeLee,

Together with @RalfG and @ArthurDeclercq we are actually working on this right now. I hope to have integrated this by the end of next week.

Kind regards,

Robbin

RobbinBouwmeester commented 2 years ago

Dear LeeLee,

This is taking a bit longer than anticipated.

@ArthurDeclercq noted that you can run MS2Rescore (https://github.com/compomics/ms2rescore) on MaxQuant (MQ) output. The pin file (and, I believe, the MS2Rescore output file as well) should contain DeepLC predictions. As a plus, you get rescored results :).

The parser is still coming in a new version.

Kind regards,

Robbin

lbwfff commented 2 years ago

Hi, Robbin

In fact, I did this conversion with some R code, similar to the following:

trans_max2ms2 <- function(mod_seq) {
  # Only 'Acetyl (Protein N-term)' and 'Oxidation (M)' are handled
  if (length(grep('Acetyl', mod_seq)) > 0 || length(grep('Oxidation', mod_seq)) > 0) {
    # One row per character; flag the parentheses that delimit modifications
    test2 <- as.data.frame(unlist(strsplit(mod_seq, '')))
    colnames(test2)[1] <- 'AA'
    test2$state <- NA
    test2$state[which(test2$AA == '(')] <- 'MOD_START'
    test2$state[which(test2$AA == ')')] <- 'MOD_END'

    # Each modification has two '(' and two ')', hence the division by 4
    num <- nrow(test2[!is.na(test2$state), ]) / 4 # Only two modifications exist in my file

    # Label every character that belongs to the j-th modification
    test2$MOD <- NA
    for (j in 1:num) {
      start <- which(test2$state == 'MOD_START')[2 * j - 1]
      end <- which(test2$state == 'MOD_END')[2 * j]
      test2$MOD[start:end] <- paste0('MOD_', j)
    }
    test2[is.na(test2)] <- 0

    # Record the text and position of each modification, then remove it
    # so that the positions of later modifications refer to residues only
    mod_inf <- as.data.frame(array(NA, c(num, 2)))
    for (j in 1:num) {
      mod_inf$V1[j] <- paste0(test2$AA[which(test2$MOD == paste0('MOD_', j))], collapse = '')
      mod_inf$V2[j] <- which(test2$state == 'MOD_START')[1]
      test2 <- test2[test2$MOD != paste0('MOD_', j), ]
    }

    # Subtract the leading '_' and the '(' itself: 0 = N-terminus, n = n-th residue
    mod_inf$V2 <- mod_inf$V2 - 2
    mod_inf$adj_mod <- ifelse(mod_inf$V1 == '(Acetyl (Protein N-term))', 'Acetyl', 'Oxidation')

    # Join as 'position|name|position|name'
    desc <- paste(mod_inf$V2, mod_inf$adj_mod, sep = '|', collapse = '|')
    return(desc)
  } else {
    # No supported modification found
    return('')
  }
}

test <- trans_max2ms2('_(Acetyl (Protein N-term))AEM(Oxidation (M))KEKYEAIVEENK_') # a test

This already covers my needs, since I only have to deal with two modifications. I am posting it just for reference.
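For readers working in Python, the same conversion can be sketched without a data frame. This is a minimal sketch, not part of DeepLC or psm_utils: the function name `maxquant_to_ms2pip` and the `NAME_MAP` table are hypothetical, and it assumes the MaxQuant notation shown in this thread, with only N-terminal and residue-level modifications (position 0 for the N-terminus, otherwise the 1-based residue index).

```python
# Hypothetical mapping from MaxQuant labels to MS2PIP-style names;
# extend it for any other modification present in your own files.
NAME_MAP = {
    "Acetyl (Protein N-term)": "Acetyl",
    "Oxidation (M)": "Oxidation",
}


def maxquant_to_ms2pip(modified_sequence, name_map=NAME_MAP):
    """Return (plain_sequence, 'pos|name|pos|name' string)."""
    residues = []  # amino acids seen so far
    mods = []      # (position, name) pairs in order of appearance
    label = []     # characters of the modification currently being read
    depth = 0      # parenthesis nesting level

    for char in modified_sequence.strip().strip("_"):
        if char == "(":
            if depth > 0:
                label.append(char)  # inner parenthesis belongs to the label
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth > 0:
                label.append(char)
            else:
                # Collapse repeated spaces before looking the label up
                raw = " ".join("".join(label).split())
                if raw not in name_map:
                    raise ValueError(f"unknown modification: {raw!r}")
                # len(residues) is 0 at the N-terminus, else the index of
                # the residue that immediately precedes the label
                mods.append((len(residues), name_map[raw]))
                label = []
        elif depth > 0:
            label.append(char)
        else:
            residues.append(char)

    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return "".join(residues), "|".join(f"{pos}|{name}" for pos, name in mods)


seq, mods = maxquant_to_ms2pip(
    "_(Acetyl (Protein N-term))AEM(Oxidation (M))KEKYEAIVEENK_"
)
print(seq, mods)
```

Raising an error on an unknown label, rather than silently defaulting to Oxidation, makes it obvious when a file contains a modification that the mapping does not yet cover.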

Thanks, LeeLee

RobbinBouwmeester commented 2 years ago

Dear LeeLee,

That is great! I will close this issue for now, but feel free to reopen it.

Kind regards,

Robbin

RalfG commented 1 year ago

For future reference, our new package psm_utils can handle these types of conversions. In addition to the Python package, we also provide a user-friendly web server at https://psm-utils.streamlitapp.com/. You can find more about psm_utils in our new publication: https://doi.org/10.1021/acs.jproteome.2c00609
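Whichever converter is used, the converted peptides still have to be written to a peptide file for DeepLC. The following is a minimal sketch using only the Python standard library. It assumes the `seq` and `modifications` column names described in the DeepLC README (check the README of your installed version), the helper `write_deeplc_csv` is hypothetical, and the rows are the four peptides from the table in the original question, converted by hand.

```python
import csv
import io


def write_deeplc_csv(rows, handle):
    """Write (sequence, ms2pip_modifications) pairs as a peptide CSV."""
    writer = csv.writer(handle, lineterminator="\n")
    # Column names assumed from the DeepLC README
    writer.writerow(["seq", "modifications"])
    for seq, mods in rows:
        writer.writerow([seq, mods])


# Peptides from the question, with modifications in MS2PIP-style notation
rows = [
    ("ASMGTLAFDEYGRPFLIIK", "0|Acetyl|3|Oxidation"),
    ("DDDIAALVVDNGSGMCK", "0|Acetyl|15|Oxidation"),
    ("AMEALATAEQACK", "2|Oxidation"),
    ("MQQQLDEYQELLDIK", "1|Oxidation"),
]

buffer = io.StringIO()  # swap for open("peptides.csv", "w", newline="") to write a file
write_deeplc_csv(rows, buffer)
print(buffer.getvalue())
```

Note that MaxQuant does not write fixed modifications, such as carbamidomethylation of cysteine, into the modified sequence, so those would need to be added to the `modifications` column separately if they apply to your search.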