Bioconductor / AnnotationHub

Client for the Bioconductor AnnotationHub web resource

curl::has_internet and proxies #4

Closed wresch closed 6 years ago

wresch commented 6 years ago

We use a proxy for internet access and set environment variables http_proxy and https_proxy. That works for curl and httr to fetch data without any errors (e.g. httr::GET('https://www.google.com') returns a 200 OK). However, curl::has_internet() here

https://github.com/Bioconductor/AnnotationHub/blob/c5c464e21b5203eb0eed9ceb5252cd948eae8fe5/R/AnnotationHub-class.R#L20

still returns FALSE because it uses nslookup to resolve a random address, which won't work behind the proxy. If I understand this correctly, this means that at the moment it's not possible to use AnnotationHub behind a proxy.

If I manually step around the call to curl::has_internet with

ah <- AnnotationHub::.Hub("AnnotationHub", getAnnotationHubOption("URL"), 
  getAnnotationHubOption("CACHE"), use_proxy(Sys.getenv("http_proxy")),
  FALSE)

the resulting AnnotationHub object works as expected. Maybe AnnotationHub could skip the curl::has_internet() check if a proxy is specified and instead functionally test the ability to fetch data?
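
A minimal sketch of what such a functional test might look like, assuming httr is installed. The helper name can_reach_hub and its fetch and timeout_sec arguments are hypothetical and not part of AnnotationHub; the URL to probe is left to the caller.

```r
# Hypothetical helper, not part of AnnotationHub: rather than relying on DNS
# resolution, try to fetch something through the configured proxy.
can_reach_hub <- function(url, proxy = NULL, fetch = httr::HEAD, timeout_sec = 5) {
  tryCatch({
    # Pass the proxy config only when one was supplied.
    resp <- if (is.null(proxy)) {
      fetch(url, httr::timeout(timeout_sec))
    } else {
      fetch(url, proxy, httr::timeout(timeout_sec))
    }
    # Treat any status below 400 as "reachable".
    httr::status_code(resp) < 400
  }, error = function(e) FALSE)  # connection errors and timeouts count as unreachable
}
```

It could be called as can_reach_hub(getAnnotationHubOption("URL"), httr::use_proxy(Sys.getenv("http_proxy"))). The fetch argument only exists so the request function can be swapped out, for example when trying the helper without network access.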

lshep commented 6 years ago

Thank you for bringing this to our attention. I'll look into this.

wresch commented 6 years ago

4 minutes? Fastest response ever! Thanks a bunch for looking into it. Nice package, btw. Thanks for writing/maintaining it.

lshep commented 6 years ago

If a proxy is specified, we will skip the check and print out a message. I will follow up with the curl folks and open an issue about this, as there should be some workaround.
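
Roughly, the behaviour described above could be sketched like this. This is an illustration only, not the actual AnnotationHub code: the function name check_connection and the has_internet argument are hypothetical, and returning NA for "unknown" is an assumption.

```r
# Hypothetical sketch; the real AnnotationHub implementation may differ.
check_connection <- function(proxy = NULL, has_internet = curl::has_internet) {
  if (!is.null(proxy)) {
    # A proxy is configured: the DNS-based check is unreliable here,
    # so skip it and tell the user the state is unknown.
    message("Cannot determine internet connection.\n",
            " If you experience connection issues consider using 'localHub=TRUE'")
    return(NA)
  }
  # No proxy: fall back to the usual check.
  has_internet()
}
```

With a proxy supplied, the DNS lookup is never attempted; without one, the result of curl::has_internet() is returned unchanged.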

lshep commented 6 years ago

@wresch If you don't mind, could you keep an eye on the open GitHub issue and post your findings? I don't have a proxy environment set up, so you would be able to give the most accurate feedback, and it would be great to have this remedied:
https://github.com/jeroen/curl/issues/153

wresch commented 6 years ago

Installed master - works! Thank you very much.

> library(pacman)
> p_install_gh("Bioconductor/AnnotationHub")
...
> p_version(AnnotationHub)
[1] ‘2.13.1’
> ah <- AnnotationHub(proxy=httr::use_proxy(Sys.getenv("http_proxy")))
Cannot determine internet connection.
 If you experience connection issues consider using 'localHub=TRUE'
updating metadata: retrieving 1 resource
  |======================================================================| 100%

snapshotDate(): 2018-04-23
> query(ah, "OrgDb")
AnnotationHub with 1691 records
# snapshotDate(): 2018-04-23
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Escherichia coli, 'Caballeronia concitans', 'Chlorella vulgaris'...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH61768"]]'

            title
  AH61768 | org.Ag.eg.db.sqlite
  AH61769 | org.At.tair.db.sqlite
  AH61770 | org.Bt.eg.db.sqlite
  AH61771 | org.Cf.eg.db.sqlite
  AH61772 | org.Gg.eg.db.sqlite
  ...       ...
  AH63468 | org.Salmonella_typhimurium_LT2.eg.sqlite
  AH63469 | org.Acinetobacter_baumannii.eg.sqlite
  AH63470 | org.Acinetobacter_genomosp._2.eg.sqlite
  AH63471 | org.Acinetobacter_genomospecies_2.eg.sqlite
  AH63472 | org.Bacterium_anitratum.eg.sqlite