Bioconductor / AnnotationHub

Client for the Bioconductor AnnotationHub web resource

AnnotationHub(): Server access test is too conservative (with suggestions) #20

Closed HenrikBengtsson closed 3 years ago

HenrikBengtsson commented 4 years ago

Summary

We have HPC hosts with working HTTP/HTTPS proxies but no support for nslookup. This causes AnnotationHub() to incorrectly conclude that it cannot reach the server. If the internal nslookup test were skipped, the connection would have worked.

Issue

$ R --vanilla
> hub <- AnnotationHub::AnnotationHub()
Cannot connect to AnnotationHub server, using 'localHub=TRUE' instead
/wynton/home/cbi/hb/.cache/AnnotationHub
  does not exist, create directory? (yes/no): no

Troubleshooting

This is because AnnotationHub::AnnotationHub uses:

> curl::nslookup("annotationhub.bioconductor.org")
Error in curl::nslookup("annotationhub.bioconductor.org") : 
  Unable to resolve host: annotationhub.bioconductor.org

to test whether it can connect to that server. However, a non-nslookup connection test shows that it works:

> readLines(curl::curl("https://annotationhub.bioconductor.org"), n = 5L)
[1] "<html>"                               
[2] "<head>"                               
[3] "    <title>BiocHub Server API</title>"
[4] "</head>"                              
[5] "<body>"

Further evidence that curl::nslookup() is too conservative a test comes from overriding its result, e.g.:

trace(AnnotationHub::AnnotationHub, at = 3L, tracer = quote(connect <- TRUE))
# Tracing function "AnnotationHub" in package "AnnotationHub"
# [1] "AnnotationHub"
hub <- AnnotationHub::AnnotationHub()
# Tracing AnnotationHub::AnnotationHub() step 3 
# Testing for internet connectivity via https_proxy... success!
# snapshotDate(): 2020-04-27

Workaround

Calling the following first works around the current AnnotationHub() limitation:

AnnotationHub::setAnnotationHubOption("PROXY", Sys.getenv("https_proxy"))

Suggestion

Since it's not "unheard of" for access to nslookup to be restricted in some compute environments, I'd like to suggest using another approach, e.g. the curl::curl() approach above, or something that works like curl --head ... and checks the return status. The latter could even be a fallback to the current curl::nslookup() test.
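A minimal sketch of what such a fallback could look like, assuming the curl package is available. The helper names (dns_ok, http_head_ok, can_reach_hub) are hypothetical and are not part of AnnotationHub. The idea is to try the existing DNS test first and, only if it fails, send a HEAD-style request and treat any status below 400 as reachable:

```r
# Hypothetical helpers for illustration; not AnnotationHub's actual code.

# Current-style test: does the host name resolve?
dns_ok <- function(host) {
  !is.null(curl::nslookup(host, error = FALSE))
}

# Fallback test: request headers only (like `curl --head`) and check the status.
http_head_ok <- function(url, timeout = 10L) {
  handle <- curl::new_handle(nobody = TRUE, timeout = timeout)
  res <- tryCatch(
    curl::curl_fetch_memory(url, handle = handle),
    error = function(e) NULL
  )
  !is.null(res) && res$status_code < 400L
}

# Try DNS first; fall back to the HTTP check only if DNS fails.
# The probes are arguments so they can be swapped out or stubbed.
can_reach_hub <- function(url = "https://annotationhub.bioconductor.org",
                          dns_probe = dns_ok,
                          http_probe = http_head_ok) {
  host <- sub("^https?://([^/:]+).*$", "\\1", url)
  if (isTRUE(dns_probe(host))) {
    return(TRUE)
  }
  isTRUE(http_probe(url))
}
```

With something along these lines, connect <- can_reach_hub() could stand in for the nslookup-only test. Whether a redirect or a 4xx status should count as "reachable" is a design choice left open here.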

Even without changing the current approach, it would be neat if one could skip the test and just let it try. One natural approach would be to support:

hub <- AnnotationHub::AnnotationHub(proxy=TRUE)

by updating the code to:

    if (is.null(proxy)) {
        connect <- !is.null(curl::nslookup("annotationhub.bioconductor.org", 
            error = FALSE))
    }
    else if (isTRUE(proxy)) {
        connect <- TRUE
        proxy <- NULL
    }
    else {
        connect <- TRUE
        message("Assuming valid proxy connection through '", 
            ifelse(is(proxy, "request"), paste(unlist(proxy), 
                collapse = ":"), proxy), "'", "\n If you experience connection issues consider ", 
            "using 'localHub=TRUE'")
    }

Session info

> packageVersion("AnnotationHub")
[1] ‘2.20.2’

HenrikBengtsson commented 3 years ago

Thanks for fixing this. I can confirm that this now works with the Bioc 3.12 release (Oct 2020) on the same system that previously failed:

> BiocManager::version()
[1] ‘3.12’
> packageVersion("AnnotationHub")
[1] ‘2.22.0’
> hub <- AnnotationHub::AnnotationHub()
  |======================================================================| 100%

Testing for internet connectivity via https_proxy... success!
snapshotDate(): 2020-10-27

cc/ @reliscu