Proteomicslab57357 / UniprotR

Retrieving Information of Proteins from Uniprot
GNU General Public License v3.0

Cannot download Uniprot FASTA file #6

Closed richelbilderbeek closed 4 years ago

richelbilderbeek commented 4 years ago

Dear UniprotR maintainer,

When I try to download a reference proteome:

UniprotR:::GetProteomeFasta("UP000464024")

I get this error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  error:14094410:SSL routines:ssl3_read_bytes:sslv3 alert handshake failure
Calls: <Anonymous> ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted

This error is, as far as I understand, caused by Uniprot using an outdated SSL system.

I suggest that UniprotR either work around this or mention the problem in its documentation (preferably with a fix, of course :+1:).

This is my session info:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.3

richelbilderbeek commented 4 years ago

Note that this call sometimes fails as well:

UniprotR:::GetSequences("UP000464024")

In that case you get this error instead:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [www.uniprot.org] Resolving timed out after 10000 milliseconds

They are related, so I just put it here :+1:
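Since the timeout only shows up some of the time, one possible stopgap is to retry the call a few times before giving up. The following is a minimal sketch in base R, not something UniprotR provides: `with_retries` and its arguments are hypothetical names, and it assumes the failing call signals an ordinary R error.

```r
# Hypothetical helper (not part of UniprotR): call `fetch()` up to `times`
# times, sleeping `pause_seconds` between failed attempts.
with_retries <- function(fetch, times = 3, pause_seconds = 1) {
  stopifnot(is.function(fetch), times >= 1)
  last_error <- NULL
  for (attempt in seq_len(times)) {
    # Turn an error into a return value so the loop can inspect it
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    if (attempt < times) {
      Sys.sleep(pause_seconds)
    }
  }
  stop("All ", times, " attempts failed; last error: ",
       conditionMessage(last_error))
}
```

It could then be used as `with_retries(function() UniprotR:::GetSequences("UP000464024"))`. This only helps with the intermittent timeout; it would not fix the SSL handshake failure.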

richelbilderbeek commented 4 years ago

What worked for me is to use RCurl:

GetProteomeFasta <- function(ProteomeID, directorypath = NULL)
{
  # Build the query URL that returns the proteome in FASTA format
  baseUrl <- "https://www.uniprot.org/uniprot/?query=proteome:"
  fullUrl <- paste0(baseUrl, ProteomeID, "&format=fasta")
  # Download with RCurl
  text <- RCurl::getURL(fullUrl)
  filename <- paste0(ProteomeID, ".fasta")
  if (!is.null(directorypath)) {
    # Save as <directorypath>/<ProteomeID>.fasta
    filename <- file.path(directorypath, filename)
  }
  writeLines(text, filename)
}

I did not submit a Pull Request, because there are no tests, nor are these results consistent.
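On the lack of tests: the parts of the function above that do not touch the network could be checked on their own. This is only a sketch; `build_proteome_url` and `build_fasta_path` are hypothetical helper names, not UniprotR functions, and it assumes the intended output is a file named `<ProteomeID>.fasta` inside `directorypath`.

```r
# Hypothetical helper: the query URL used in the workaround above
build_proteome_url <- function(proteome_id) {
  paste0(
    "https://www.uniprot.org/uniprot/?query=proteome:",
    proteome_id,
    "&format=fasta"
  )
}

# Hypothetical helper: where the FASTA file would be written
build_fasta_path <- function(proteome_id, directorypath = NULL) {
  filename <- paste0(proteome_id, ".fasta")
  if (is.null(directorypath)) {
    return(filename)
  }
  file.path(directorypath, filename)
}

# Offline checks; no download is attempted
stopifnot(
  build_fasta_path("UP000464024") == "UP000464024.fasta",
  build_fasta_path("UP000464024", "out") ==
    file.path("out", "UP000464024.fasta"),
  endsWith(build_proteome_url("UP000464024"), "UP000464024&format=fasta")
)
```

The download itself would still need a test that can tolerate the server being unreachable.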

richelbilderbeek commented 4 years ago

(note to self: one can do it from the command line using

wget --no-check-certificate --auth-no-challenge --output-document=UP000464024.fasta "https://www.uniprot.org/uniprot/?query=proteome:UP000464024&format=fasta"

)

MohmedSoudy commented 4 years ago

Thanks for your great efforts. We are going to change it in the next version.

richelbilderbeek commented 4 years ago

Thanks for accepting :+1: