DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

Documenting workaround when curl::has_internet returns FALSE #585

Closed mps9506 closed 2 years ago

mps9506 commented 2 years ago

Hi,

I recently started running into an issue with readWQPdata() and related functions returning "No internet connection.", which I traced back to: https://github.com/USGS-R/dataRetrieval/blob/b60c41a2b5a36e60a5d08bfeabc974dfc4bf83a0/R/getWebServiceData.R#L25-L28 In my case, curl::has_internet() returns FALSE even though I have internet access (interestingly, readWQPdata() works in a fresh R session but not in a targets project).

I have zero understanding of the inner workings of curl or networking, but this is clearly related to some combination of curl and our organization's network configs. This StackOverflow thread has two possible user solutions.

I don't think this issue requires changes to the package, but it might be worth documenting the workaround somewhere, assuming others might run into the problem. Either of the two snippets below can be run in the current session so that curl::has_internet() returns TRUE. Obviously this doesn't help if there actually isn't an internet connection.

assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))

or

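# Replace curl::has_internet() with a stub that always returns TRUE.
# This only affects the current R session.
remove_has_internet <- function() {
  # Namespace bindings are locked, so unlock before overwriting
  unlockBinding(sym = "has_internet", asNamespace("curl"))
  assign("has_internet", function() TRUE, envir = asNamespace("curl"))
  # Re-lock the binding once the stub is in place
  lockBinding(sym = "has_internet", asNamespace("curl"))
}
remove_has_internet()
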
ldecicco-USGS commented 2 years ago

Ugh, that sounds frustrating! Looks like you have a workaround for now. I could take this check out: if it fails in pipelines or behind proxies, it will quickly become a problem for others. CRAN has some very strict rules on "failing gracefully" if there's no internet, so this was one way that was recommended. I've got other safeguards in place too, so this check is redundant.

mps9506 commented 2 years ago

I'm revisiting this because I just got dinged by CRAN for not failing gracefully in one of my own packages. Looking deeper at curl::has_internet(), it should work, but I don't know the specifics of our institution's IT setup. However, the desired behavior is pretty easy to mimic with curl::nslookup(), which lets you specify any hostname: https://github.com/jeroen/curl/pull/221#issuecomment-615192255

A little helper function:

library(curl)
has_internet_2 <- function(host) {
  !is.null(nslookup(host, error = FALSE))
}

## this should return TRUE if user is online
has_internet_2(host = "waterdata.usgs.gov")

[1] TRUE

As mentioned, I don't have much (any) knowledge of these connectivity functions or network engineering, so I'm not sure of the side effects of this approach. The nslookup() help file indicates this should work on all platforms.
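
For anyone after the "fail gracefully" behavior CRAN asks for, here is a rough sketch of how a DNS check like the one above could guard a request. This is not what dataRetrieval does internally; host_resolves() and fetch_gracefully() are hypothetical names, and the lookup argument is only there so the logic can be exercised without a network. Note that a successful DNS lookup does not guarantee the service itself is reachable, so the request is still wrapped in tryCatch().

```r
# Hypothetical helper: TRUE if `host` resolves via DNS.
# `lookup` defaults to curl::nslookup() but can be swapped for a stub.
host_resolves <- function(host, lookup = curl::nslookup) {
  !is.null(lookup(host, error = FALSE))
}

# Hypothetical wrapper: call `fetch()` only if `host` resolves.
# On any failure, emit a message and return NULL instead of erroring.
fetch_gracefully <- function(host, fetch, lookup = curl::nslookup) {
  if (!host_resolves(host, lookup = lookup)) {
    message("Could not resolve ", host, "; returning NULL.")
    return(invisible(NULL))
  }
  tryCatch(
    fetch(),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      invisible(NULL)
    }
  )
}
```

Usage would look something like fetch_gracefully("waterdata.usgs.gov", function() readWQPdata(...)), with the actual query arguments filled in by the caller.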

ldecicco-USGS commented 2 years ago

I tried implementing the function. From my understanding, it should be OK. Give it a try:

remotes::install_github("USGS-R/dataRetrieval")

If it seems to work for you, let me know and we can close this issue. It will go out in the next round of updates, though I'm not sure when that will be.

mps9506 commented 2 years ago

It works! Thank you for incorporating this!