DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

Couldn't connect to server - proxy question #270

Closed dpphat closed 8 years ago

dpphat commented 8 years ago

This is probably a simple error on my part but I can't seem to debug. I am using R version 3.3.1 and dataRetrieval 2.5.10 on Windows 7 64bit operating system. I am attempting to load sites located within a bounding box:

library(dataRetrieval)
sites <- whatNWISsites(bBox = c(-91.6000677, 31.540344, -91.209270, 31.699114))

and I get an error stating that I could not connect to the server.

Error in curl::curl_fetch_memory(url, handle = handle) : Couldn't connect to server

Is there a key that is required to perform this operation? Thanks so much. Dan Puddephatt

ldecicco-USGS commented 8 years ago

Hmm....that just worked for me. 3 possibilities I think

  1. Your internet was momentarily down.
  2. USGS's servers were momentarily down.
  3. You are using the latest dataRetrieval, but some dependency of dataRetrieval is out-of-date.

For 1-2, I'd just re-try. For 3, I'd update all of my packages. If you are using RStudio, there's a button that will update all packages:

[image: update]

dpphat commented 8 years ago

Thank you so much for your reply. I have retried over the course of several days without success.

Strange. I just updated my packages using a script I have (I don't use R-studio).

setwd("C:\\Program Files\\R")
# PATH TO ALL R INSTALLATIONS

Rvers <- list.dirs(path = ".", full.names = FALSE, recursive = FALSE)[1]
# NAME OF OLDEST R INSTALLATION

libpath <- paste(Rvers, "\\library", sep = "")
libs <- list.dirs(path = libpath, full.names = FALSE, recursive = FALSE)
install.packages(libs)

I checked the dependent packages: XML, readr, httr, curl, reshape2, lubridate, dplyr were all installed, unpackaged, and MD5 sums checked. stats and utils are base packages. I do also have xtable, knitr, and testthat installed.

I then run through the first example from the vignette:

siteNumber <- "01491000"
ChoptankInfo <- readNWISsite(siteNumber)

and I get the same error: couldn't connect to server. I don't work with USGS. Is it possible that there is a key that I would need? Thanks so much for your time.


ldecicco-USGS commented 8 years ago

There's nothing specific non-USGS people do to get data. My guess is that either the httr or readr need to be updated (?)...or some dependency of a dependency (the trickiest problems to figure out).

One thing I noticed is that the line:

Rvers <- list.dirs(path = ".", full.names = FALSE, recursive = FALSE)[1]

only gets the path to the oldest version of R that's been installed on your computer. You could try changing it to:

Rvers <- list.dirs(path = ".", full.names = FALSE, recursive = FALSE)
Rvers <- Rvers[length(Rvers)]

Now, on my computer, that's R-devel which I'm not always using.

With a quick google search, here's another way to possibly update all packages without using RStudio. I have not tried either....

update.packages(checkbuilt = TRUE)

or

lib <- .libPaths()[1]
install.packages( 
    lib  = lib,
    pkgs = as.data.frame(installed.packages(lib), stringsAsFactors=FALSE)$Package,
    type = 'source'
)

(side note: there are certain things that RStudio does great, and updating packages is one of them...you might consider installing it, if you can, just for a few side-tasks).

If you are still having problems, could you paste the output of:

sessionInfo()

ldecicco-USGS commented 8 years ago

I'm also wondering if there's a firewall issue on your end. Can you see this page:

http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000

dpphat commented 8 years ago

I can see the page. Thanks for your follow-up.


dpphat commented 8 years ago

Thanks so much for your assistance. I have attached the session info.

library(dataRetrieval)
Warning message:
package ‘dataRetrieval’ was built under R version 3.3.1
siteNumber <- "01491000"
ChoptankInfo <- readNWISsite(siteNumber)
Error in curl::curl_fetch_memory(url, handle = handle) :
  Couldn't connect to server
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dataRetrieval_2.5.10

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6     lubridate_1.5.6 XML_3.98-1.4    dplyr_0.5.0
 [5] assertthat_0.1  R6_2.1.2        plyr_1.8.4      DBI_0.5
 [9] magrittr_1.5    httr_1.2.1      stringi_1.1.1   curl_1.2
[13] reshape2_1.4.1  tools_3.3.0     stringr_1.0.0   readr_1.0.0
[17] tibble_1.1


dpphat commented 8 years ago

On the other hand...if I try to do it through R (i.e., httr::GET(url = " http://waterservices.usgs.gov/nwis/site/?siteOutput= Expanded&format=rdb&site=01491000")), I get the error.


ldecicco-USGS commented 8 years ago

yeah, that's essentially all that's going on...let me check on a few things...

dpphat commented 8 years ago

Thanks so much Laura.


dpphat commented 8 years ago

readLines returns a tab separated vector of character strings.

readLines(" http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000 ")

[1] "#"

[2] "#"

[3] "# US Geological Survey"

[4] "# retrieved: 2016-08-16 16:16:34 -04:00\t(caas01)"

[5] "#"

[6] "# The Site File stores location and general information about groundwater,"

[7] "# surface water, and meteorological sites"

[8] "# for sites in USA."

[9] "#"

[10] "# File-format description: http://help.waterdata.usgs.gov/faq/about-tab-delimited-output"

[11] "# Automated-retrieval info: http://waterservices.usgs.gov/rest/Site-Service.html"

[12] "#"

[13] "# Contact: gs-w_support_nwisweb@usgs.gov"

[14] "#"

[15] "# The following selected fields are included in this output:"

[16] "#"

[17] "# agency_cd -- Agency"

[18] "# site_no -- Site identification number"

[19] "# station_nm -- Site name"

[20] "# site_tp_cd -- Site type"

[21] "# lat_va -- DMS latitude"

[22] "# long_va -- DMS longitude"

[23] "# dec_lat_va -- Decimal latitude"

[24] "# dec_long_va -- Decimal longitude"

[25] "# coord_meth_cd -- Latitude-longitude method"

[26] "# coord_acy_cd -- Latitude-longitude accuracy"

[27] "# coord_datum_cd -- Latitude-longitude datum"

[28] "# dec_coord_datum_cd -- Decimal Latitude-longitude datum"

[29] "# district_cd -- District code"

[30] "# state_cd -- State code"

[31] "# county_cd -- County code"

[32] "# country_cd -- Country code"

[33] "# land_net_ds -- Land net location description"

[34] "# map_nm -- Name of location map"

[35] "# map_scale_fc -- Scale of location map"

[36] "# alt_va -- Altitude of Gage/land surface"

[37] "# alt_meth_cd -- Method altitude determined"

[38] "# alt_acy_va -- Altitude accuracy"

[39] "# alt_datum_cd -- Altitude datum"

[40] "# huc_cd -- Hydrologic unit code"

[41] "# basin_cd -- Drainage basin code"

[42] "# topo_cd -- Topographic setting code"

[43] "# instruments_cd -- Flags for instruments at site"

[44] "# construction_dt -- Date of first construction"

[45] "# inventory_dt -- Date site established or inventoried"

[46] "# drain_area_va -- Drainage area"

[47] "# contrib_drain_area_va -- Contributing drainage area"

[48] "# tz_cd -- Time Zone abbreviation"

[49] "# local_time_fg -- Site honors Daylight Savings Time"

[50] "# reliability_cd -- Data reliability code"

[51] "# gw_file_cd -- Data-other GW files"

[52] "# nat_aqfr_cd -- National aquifer code"

[53] "# aqfr_cd -- Local aquifer code"

[54] "# aqfr_type_cd -- Local aquifer type code"

[55] "# well_depth_va -- Well depth"

[56] "# hole_depth_va -- Hole depth"

[57] "# depth_src_cd -- Source of depth data"

[58] "# project_no -- Project number"

[59] "#"

[60] "agency_cd\tsite_no\tstation_nm\tsite_tp_cd\tlat_va\tlong_va\tdec_lat_va\tdec_long_va\tcoord_meth_cd\tcoord_acy_cd\tcoord_datum_cd\tdec_coord_datum_cd\tdistrict_cd\tstate_cd\tcounty_cd\tcountry_cd\tland_net_ds\tmap_nm\tmap_scale_fc\talt_va\talt_meth_cd\talt_acy_va\talt_datum_cd\thuc_cd\tbasin_cd\ttopo_cd\tinstruments_cd\tconstruction_dt\tinventory_dt\tdrain_area_va\tcontrib_drain_area_va\ttz_cd\tlocal_time_fg\treliability_cd\tgw_file_cd\tnat_aqfr_cd\taqfr_cd\taqfr_type_cd\twell_depth_va\thole_depth_va\tdepth_src_cd\tproject_no"

[61] "5s\t15s\t50s\t7s\t16s\t16s\t16s\t16s\t1s\t1s\t10s\t10s\t3s\t2s\t3s\t2s\t23s\t20s\t7s\t8s\t1s\t3s\t10s\t16s\t2s\t1s\t30s\t8s\t8s\t8s\t8s\t6s\t1s\t1s\t30s\t10s\t8s\t1s\t8s\t8s\t1s\t12s"

[62] "USGS\t01491000\tCHOPTANK RIVER NEAR GREENSBORO, MD\tST\t385949.9\t0754708.9\t38.99719444\t-75.7858056\tM\tS\tNAD83\tNAD83\t24\t24\t011\tUS\t\t\t\t 2.73\tN\t .1\tNAVD88\t02060005\t\t\tYYNNYNYNNNYNNNNNNNNNNNNNNNNNNN\t\t\t113\t\tEST\tN\t\tNNNNNNNN\t\t\t\t\t\t\t442400300"

Not sure if that helps. It seems like changing the read method would create much more work.


ldecicco-USGS commented 8 years ago

When I paste your httr code, I get the same error, but I think that's because you have some spaces in there:

httr::GET(url = "
http://waterservices.usgs.gov/nwis/site/?siteOutput=
Expanded&format=rdb&site=01491000")
Error in curl::curl_fetch_memory(url, handle = handle) : 
  URL using bad/illegal format or missing URL

If I clean that up a bit, I get:

httr::GET(url = "http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000")
Response [http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000]
  Date: 2016-08-16 20:29
  Status: 200
  Content-Type: text/plain;charset=UTF-8
  Size: 3.16 kB
#
#
# US Geological Survey
# retrieved: 2016-08-16 16:29:31 -04:00 (vaas01)
#
# The Site File stores location and general information about groundw...
# surface water, and meteorological sites
# for sites in USA.
#
# File-format description:  http://help.waterdata.usgs.gov/faq/about-...
...

So it's not an httr issue like I was hoping. But, just to check, if you copy this:

httr::GET(url = "http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000")

do you get an error?

dpphat commented 8 years ago

Yeah. Same error.

httr::GET(url = " http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000 ")
Error in curl::curl_fetch_memory(url, handle = handle) :
  Couldn't connect to server

Since I am able to access it using readLines that suggests it isn't a firewall issue either.
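One way to narrow this down is to run the same URL through both code paths and see which one fails. The sketch below uses a hypothetical helper, `can_fetch` (not part of dataRetrieval or httr), that only reports whether a call errors; the httr line assumes httr is installed. On Windows, base R connections can use the system (Internet Explorer) proxy settings, while curl-based requests may not pick them up automatically, which is one possible explanation for the difference.

```r
# Hypothetical helper: TRUE if fetch(url) completes without an error.
can_fetch <- function(fetch, url) {
  tryCatch({
    fetch(url)
    TRUE
  }, error = function(e) FALSE)
}

site_url <- "http://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=01491000"

# Not run here because both calls need network access:
# can_fetch(function(u) readLines(u, n = 1), site_url)  # base R connection
# can_fetch(function(u) httr::GET(u), site_url)         # curl via httr
```

If the first call succeeds and the second fails, the network path itself is open and the difference lies in how each client is configured.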


lawinslow commented 8 years ago

@dpphat are you on a work network and/or using any sort of proxy server?

dpphat commented 8 years ago

Hi Luke. I am on a work network. It looks like there is a configuration script (over my head) that may set the proxy. [image: Inline image 1]


lawinslow commented 8 years ago

Hmm, image didn't come through. But either way, you'll have to set up the R client to use the proxy. It should hopefully be fairly straightforward. I don't have a proxy to test this against, but...

This should hopefully get you your proxy info. If not, the config script should.

curl::ie_proxy_info()

Then, you need to set up httr to use that proxy. There is a command use_proxy that needs to be fed into set_config.

library(httr)
set_config(use_proxy(url="abc.com",port=8080, username="username", password="password"))

Note: Username and pass may be optional.
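If the proxy information comes back as a single "host:port" string, it has to be split before it can be passed to use_proxy. A minimal sketch, assuming that simple form (the value may instead be empty or in another format when a configuration script is in use); `split_proxy` and the address are hypothetical placeholders:

```r
# Hypothetical helper: split a "host:port" string into its parts.
# Assumes the simple "host:port" form, optionally with an http:// prefix.
split_proxy <- function(proxy) {
  bare <- sub("^https?://", "", proxy)
  parts <- strsplit(bare, ":", fixed = TRUE)[[1]]
  list(host = parts[1], port = as.integer(parts[2]))
}

p <- split_proxy("proxy.example.com:8080")  # placeholder value

# With httr installed, the result could then be passed along:
# httr::set_config(httr::use_proxy(url = p$host, port = p$port))
```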

That should hopefully fix your issue. But keep in mind, it only lasts for the life of your R session. You'd need to re-run it when you restart R, or put it into an .Rprofile file so it runs every time on startup (an .Renviron file only holds environment variables such as http_proxy, not R code).

lawinslow commented 8 years ago

Also, you may want to talk to tech support about your proxy configuration. They may have more info.

dpphat commented 8 years ago

Thanks Luke and Laura. My proxy info was contained within a script. I was able to look into that script to find it and add it to an Renviron.site file in the /etc folder. I was able to restart R and dataRetrieval is working beautifully. Thanks so much for your help today. Dan Puddephatt.
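For anyone landing here with the same problem, an Renviron.site entry of this kind might look like the fragment below. The thread does not show the exact lines that were added, so this is only a sketch: the host and port are placeholders, and it assumes the proxy is set through the http_proxy and https_proxy environment variables that libcurl reads.

```
# R_HOME/etc/Renviron.site (placeholder host and port)
http_proxy=http://proxy.example.com:8080
https_proxy=http://proxy.example.com:8080
```

R reads this file at startup, so the setting applies to every session without re-running any code.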
