Closed muecker closed 2 years ago
Hi,
Thank you for your interest in our package. I believe the error is caused by the recent API changes on the UniProt side. We will fix this in the next update of UniprotR. Until then, here is a new version of the GetProteinAnnontate function that you can use.
library(curl)      # has_internet()
library(httr)      # GET(), timeout()
library(UniprotR)  # HandleBadRequests()
GetProteinAnnontate <-
function (ProteinAccList, columns)
{
if (!has_internet()) {
message("Please connect to the internet, as the package requires an internet connection.")
return()
}
baseUrl <- "https://rest.uniprot.org/uniprotkb/"
ProteinInfoParsed_total_col = data.frame(x = "x")
for (filed in columns) {
ProteinInfoParsed_total <- data.frame()
for (ProteinAcc in ProteinAccList) {
Request <- tryCatch({
GET(paste0(baseUrl, ProteinAcc, ".xml"), timeout(10))
}, error = function(cond) {
message("Internet connection problem occurs and the function will return the original error")
message(cond)
})
ProteinName_url <- paste0("/search?query=accession:", ProteinAcc,
"&format=tsv&fields=", filed)
RequestUrl <- paste0(baseUrl, ProteinName_url)
if (length(Request) == 0) {
message("Internet connection problem occurs")
return()
}
if (Request$status_code == 200) {
parse_true <- function() {
ProteinInfoParsed <- as.data.frame(read.csv(RequestUrl,
sep = "\t", header = TRUE), row.names = ProteinAcc)
return(ProteinInfoParsed)
}
parse_false <- function() {
ProteinInfoParsed <- read.csv(RequestUrl,
sep = "\t", header = TRUE)
names <- names(ProteinInfoParsed)
ProteinInfoParsed <- data.frame(name_col = "NA",
row.names = ProteinAcc)
colnames(ProteinInfoParsed) <- names
return(ProteinInfoParsed)
}
ProteinInfoParsed <- tryCatch(parse_true(),
error = function(e) parse_false())
ProteinInfoParsed_total <- rbind(ProteinInfoParsed_total,
ProteinInfoParsed)
}
else {
HandleBadRequests(Request$status_code)
}
}
ProteinInfoParsed_total_col <- cbind(ProteinInfoParsed_total_col,
ProteinInfoParsed_total)
remove(ProteinInfoParsed_total)
}
ProteinInfoParsed_total_col <- ProteinInfoParsed_total_col[,
!(names(ProteinInfoParsed_total_col) %in% c("x"))]
return(ProteinInfoParsed_total_col)
}
Run this function and then you are good to go:
ProteinAcc<-"P42293"
GetProteinAnnontate(ProteinAcc, c("gene_names", "protein_name"))
Note that you have to use the field names from the "Returned Field" column on this page: https://www.uniprot.org/help/return_fields
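To make the URL format used by the function above easier to check, here is a minimal sketch of how such a request URL can be assembled. The helper name build_uniprot_url is hypothetical and not part of UniprotR. It also assumes that several return fields can be requested at once as a comma-separated list, whereas the function above sends one request per field.

```r
# Hypothetical helper (not part of UniprotR): builds a search URL for the
# rest.uniprot.org endpoint used in the function above.
build_uniprot_url <- function(accession, fields) {
  base_url <- "https://rest.uniprot.org/uniprotkb/search"
  # Percent-encode the query and the field list so that characters such as
  # spaces, ':' or ',' cannot produce a malformed URL.
  query <- utils::URLencode(paste0("accession:", accession), reserved = TRUE)
  # Assumption: all fields are sent in one comma-separated list.
  field_list <- utils::URLencode(paste(fields, collapse = ","), reserved = TRUE)
  paste0(base_url, "?query=", query, "&format=tsv&fields=", field_list)
}

url <- build_uniprot_url("P42293", c("gene_names", "protein_name"))
print(url)
```

The resulting string can be passed to read.csv(url, sep = "\t", header = TRUE), as in the function above. An unsupported field name is rejected by the server, so the field names still have to come from the "Returned Field" column.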
@MohmedSoudy
Could you please consider including this modified version of GetProteinAnnontate in the next update on CRAN?
@AliYoussef96 Sure.
@AliYoussef96 Thank you for the new function - works like a charm! :)
Hi, great package to retrieve information from Uniprot, I find it very useful - thank you for this!
Unfortunately, the GetProteinAnnontate function does not work for me (GetProteinFunction, GetNamesTaxa, GetProteinGOInfo and all others I tested do work).
Here is my code:
ProteinAcc<-"P42293"
GetProteinAnnontate(ProteinAcc,c("entry name", "protein names"))
It produces the following warnings and errors:
Warning: URL 'http://www.uniprot.org/uniprot/?query=accession:P42293&format=tab&columns=entry name': status was 'URL using bad/illegal format or missing URL'
Warning: URL 'http://www.uniprot.org/uniprot/?query=accession:P42293&format=tab&columns=entry name': status was 'URL using bad/illegal format or missing URL'
Error in file(file, "rt"):
cannot open the connection to 'http://www.uniprot.org/uniprot/?query=accession:P42293&format=tab&columns=entry name'
I tried different formats of column names (both "Legacy Returned Field" and "Returned Field" from this page: https://www.uniprot.org/help/return_fields) but this did not change anything.
I am using UniprotR Version 2.2.1 with R Version 4.2.1 (2022-06-23).
Do you have any advice on how to solve this?
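A note on the warning text: the request URL in the message contains an unencoded space ("columns=entry name"), and that is one plausible reason for the 'URL using bad/illegal format' status. This is an assumption based only on the message shown, not a confirmed diagnosis. The sketch below shows how base R's utils::URLencode percent-encodes such a URL.

```r
# The legacy URL exactly as it appears in the warning above.
legacy_url <- paste0(
  "http://www.uniprot.org/uniprot/",
  "?query=accession:P42293&format=tab&columns=entry name"
)

# URLencode() with the default reserved = FALSE keeps ':', '/', '?', '&'
# and '=' intact and only escapes unsafe characters such as the space.
encoded_url <- utils::URLencode(legacy_url)
print(encoded_url)
```

Encoding alone would probably not be enough here, because the reply in this thread attributes the failure to UniProt's API changes and its replacement function targets a different endpoint (rest.uniprot.org).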