bartongroup / Proteus

R package for analysing proteomics data
MIT License

Error in the extraction commands while retrieving ids #32

Closed gurpreet-bioinfo closed 4 years ago

gurpreet-bioinfo commented 4 years ago

Hi there,

I followed section 4.5 for protein annotations. Looking at the data frame ids, it seems that the code does not extract all the values for the uniprot column from prot.MQ.med$proteins. This leads to NA under uniprot, gene and protein names in the res data frame for proteins that do have a UniProt annotation, as is evident from the UniProt ID in the protein column. I also verified this by manually searching UniProt for a few proteins.

This seems to be caused by an error in the following extraction commands:

luni <- lapply(as.character(prot.MQ.med$proteins), function(prot) {
 if(grepl("sp\\|", prot)) {
   uniprot <- unlist(strsplit(prot, "|", fixed=TRUE))[2]
   c(prot, uniprot)
 }
})

ids <- as.data.frame(do.call(rbind, luni))

Would it be possible to correct this as it would be helpful to get the latest annotations?

Thanks. Gurpreet

MarekGierlinski commented 4 years ago

This piece of code is only an example of how Uniprot identifiers can be extracted. It works only for a specific format of input data, with identifiers like sp|P00546|CDK1_YEAST. Clearly, your IDs are in a different format. This is really outside of the scope of Proteus and needs some basic R programming in order to convert IDs to a required format.
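To illustrate the format dependence described above, here is a minimal sketch using made-up example strings rather than real data. Only the first identifier comes from this thread; the `tr|` entry and the bare accession are hypothetical inputs chosen to show what the vignette-style test does with other formats.

```r
# Example identifiers; only the first is in the format the example code expects.
example_ids <- c(
  "sp|P00546|CDK1_YEAST",   # Swiss-Prot style ID quoted in the comment above
  "tr|X00000|X00000_TEST",  # hypothetical "tr|" style ID (made up)
  "P00546"                  # hypothetical bare accession with no prefix
)

# Same test as in the example code: does the string contain "sp|"?
has_sp <- grepl("sp\\|", example_ids)

# Second field after splitting on the literal "|" character
# (NA when the string contains no "|" at all).
second_field <- vapply(
  strsplit(example_ids, "|", fixed = TRUE),
  function(x) x[2],
  character(1)
)

print(data.frame(id = example_ids, has_sp = has_sp, second_field = second_field))
```

Under these assumptions, only the first string passes the `sp|` test, so the other two would be skipped by the example code even though the second one does carry an accession in its second field.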

gurpreet-bioinfo commented 4 years ago

@MarekGierlinski Thanks for letting me know. I will do that.

gurpreet-bioinfo commented 4 years ago

Just for information, the following gives me the correct ids:

luni <- lapply(as.character(prot.MQ.med$proteins), function(prot) {
 if(grepl("(sp|tr)\\|", prot)) {                                 # match "sp|" or "tr|"; grouping stops a bare "sp" from matching
   uniprot <- unlist(strsplit(prot, "|", fixed=TRUE))[2]
   c(prot, uniprot)
 }
})

ids <- as.data.frame(do.call(rbind, luni))
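A vectorised variant of the snippet above might look like the sketch below. It assumes identifiers containing `sp|ACCESSION` or `tr|ACCESSION` and takes the first such accession in each string. The function name `extract_uniprot` and the example strings are hypothetical and not part of Proteus.

```r
# Hypothetical helper: pull the accession out of "sp|ACC|NAME" or "tr|ACC|NAME".
# Entries without such a pattern get NA instead of being dropped,
# so the result always has one row per input protein.
extract_uniprot <- function(proteins) {
  proteins <- as.character(proteins)
  # Capture group 1: database prefix; capture group 2: the accession.
  hits <- regmatches(proteins, regexec("(sp|tr)\\|([^|]+)", proteins))
  uniprot <- vapply(
    hits,
    function(h) if (length(h) > 0) h[3] else NA_character_,
    character(1)
  )
  data.frame(protein = proteins, uniprot = uniprot, stringsAsFactors = FALSE)
}

# Made-up example input; the last entry has no recognisable prefix.
res_example <- extract_uniprot(c(
  "sp|P00546|CDK1_YEAST",
  "tr|X00000|X00000_TEST",
  "P00546"
))
print(res_example)
```

With real data this would be called as `ids <- extract_uniprot(prot.MQ.med$proteins)`. Keeping the unmatched entries as NA makes it easier to spot which identifiers are in an unexpected format.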