DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

Error: '' does not exist in current working directory #616

Closed lindsayplatt closed 2 years ago

lindsayplatt commented 2 years ago

Describe the bug I am doing a big nationwide pull and am using grids across the USA + territories to organize the pull into chunks. I was able to run the pull for each of these grids without issue, except for two of them over in the Aleutian Islands. I imagine it is because there isn't any data, but that is also the case for some of the other grids in those islands, and they just return an empty df. Rather than the current error, I'd love it if dataRetrieval could return either a more helpful error or just an empty dataset.

To Reproduce Steps to reproduce the behavior:

library(dataRetrieval)

# Has data
bbox_6117 <- c(xmin = 172.768914, ymin = 51.398187, 
               xmax = 174.768914, ymax = 53.398187)
bbox_6119 <- c(xmin = 176.768914, ymin = 51.398187, 
               xmax = 178.768914, ymax = 53.398187)

# No data but no failure
bbox_6116 <- c(xmin = 170.768914, ymin = 51.398187, 
               xmax = 172.768914, ymax = 53.398187)
bbox_6118 <- c(xmin = 174.768914, ymin = 51.398187, 
               xmax = 176.768914, ymax = 53.398187)

# Fails
bbox_6120 <- c(xmin = 178.768914, ymin = 51.398187, 
               xmax = 180.768914, ymax = 53.398187)
bbox_5940 <- c(xmin = 178.768914, ymin = 49.398187, 
               xmax = 180.768914, ymax = 51.398187)

empty_query <- dataRetrieval::whatWQPdata(
  sampleMedia = c("Water", "water"), 
  siteType = "Stream", 
  bBox = bbox_6116
)

data_query <- dataRetrieval::whatWQPdata(
  sampleMedia = c("Water", "water"), 
  siteType = "Stream", 
  bBox = bbox_6117
)

problem_query <- dataRetrieval::whatWQPdata(
  sampleMedia = c("Water", "water"), 
  siteType = "Stream", 
  bBox = bbox_6120
)

Expected behavior I would expect no data to be returned, not a failure.

Screenshots

The image below shows the two cells that are failing and how they spatially relate to the others. image

I tested the cells around them and those don't fail (blue = no data but no fail, returned empty df; green = data available) image

My environment after running the code above: image

Console with error message: image

Session Info Please include your session info:

> devtools::session_info()
- Session info  ------------------------------------
 hash: raccoon, sport utility vehicle, middle finger: medium skin tone

 setting  value
 version  R version 4.1.1 (2021-08-10)
 os       Windows 10 x64 (build 19042)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/Chicago
 date     2022-05-17
 rstudio  1.4.1717 Juliet Rose (desktop)
 pandoc   NA

- Packages -----------------------------------------
 package       * version date (UTC) lib source
 assertthat      0.2.1   2019-03-21 [1] CRAN (R 4.1.1)
 cachem          1.0.6   2021-08-19 [1] CRAN (R 4.1.1)
 callr           3.7.0   2021-04-20 [1] CRAN (R 4.1.1)
 class           7.3-19  2021-05-03 [2] CRAN (R 4.1.1)
 classInt        0.4-3   2020-04-07 [1] CRAN (R 4.1.1)
 cli             3.1.0   2021-10-27 [1] CRAN (R 4.1.2)
 crayon          1.4.2   2021-10-29 [1] CRAN (R 4.1.2)
 curl            4.3.2   2021-06-23 [1] CRAN (R 4.1.1)
 dataRetrieval * 2.7.11  2022-02-18 [1] CRAN (R 4.1.3)
 DBI             1.1.1   2021-01-15 [1] CRAN (R 4.1.1)
 desc            1.4.0   2021-09-28 [1] CRAN (R 4.1.2)
 devtools        2.4.3   2021-11-30 [1] CRAN (R 4.1.1)
 dplyr           1.0.7   2021-06-18 [1] CRAN (R 4.1.1)
 e1071           1.7-9   2021-09-16 [1] CRAN (R 4.1.1)
 ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.1.1)
 fansi           0.5.0   2021-05-25 [1] CRAN (R 4.1.1)
 fastmap         1.1.0   2021-01-25 [1] CRAN (R 4.1.1)
 fs              1.5.1   2021-11-30 [1] CRAN (R 4.1.1)
 generics        0.1.1   2021-10-25 [1] CRAN (R 4.1.2)
 glue            1.6.0   2021-12-17 [1] CRAN (R 4.1.2)
 httr            1.4.2   2020-07-20 [1] CRAN (R 4.1.1)
 jsonlite        1.7.2   2020-12-09 [1] CRAN (R 4.1.1)
 KernSmooth      2.23-20 2021-05-03 [2] CRAN (R 4.1.1)
 lifecycle       1.0.1   2021-09-24 [1] CRAN (R 4.1.2)
 magrittr        2.0.1   2020-11-17 [1] CRAN (R 4.1.1)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.1.2)
 pillar          1.6.4   2021-10-18 [1] CRAN (R 4.1.2)
 pkgbuild        1.2.1   2021-11-30 [1] CRAN (R 4.1.1)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.1.1)
 pkgload         1.2.4   2021-11-30 [1] CRAN (R 4.1.1)
 prettyunits     1.1.1   2020-01-24 [1] CRAN (R 4.1.1)
 processx        3.5.2   2021-04-30 [1] CRAN (R 4.1.1)
 proxy           0.4-26  2021-06-07 [1] CRAN (R 4.1.1)
 ps              1.6.0   2021-02-28 [1] CRAN (R 4.1.1)
 purrr           0.3.4   2020-04-17 [1] CRAN (R 4.1.1)
 R6              2.5.1   2021-08-19 [1] CRAN (R 4.1.1)
 Rcpp            1.0.7   2021-07-07 [1] CRAN (R 4.1.1)
 remotes         2.4.2   2021-11-30 [1] CRAN (R 4.1.1)
 rlang           0.4.12  2021-10-18 [1] CRAN (R 4.1.2)
 rprojroot       2.0.2   2020-11-15 [1] CRAN (R 4.1.1)
 rstudioapi      0.13    2020-11-12 [1] CRAN (R 4.1.1)
 sessioninfo     1.2.1   2021-11-02 [1] CRAN (R 4.1.2)
 sf              1.0-4   2021-11-14 [1] CRAN (R 4.1.2)
 testthat        3.1.0   2021-10-04 [1] CRAN (R 4.1.2)
 tibble          3.1.6   2021-11-07 [1] CRAN (R 4.1.2)
 tidyselect      1.1.1   2021-04-30 [1] CRAN (R 4.1.1)
 units           0.7-2   2021-06-08 [1] CRAN (R 4.1.1)
 usethis         2.1.3   2021-10-27 [1] CRAN (R 4.1.2)
 utf8            1.2.2   2021-07-24 [1] CRAN (R 4.1.1)
 vctrs           0.3.8   2021-04-29 [1] CRAN (R 4.1.1)
 withr           2.4.3   2021-11-30 [1] CRAN (R 4.1.1)
 xml2            1.3.3   2021-11-30 [1] CRAN (R 4.1.1)

 [1] C:/Users/lcarr/Documents/R/win-library/4.1
 [2] C:/Program Files/R/R-4.1.1/library

----------------------------------------------------

Additional context I will code in a workaround for now with a tryCatch(), so I am not blocked by this behavior. I did also make sure that I have the most up-to-date version of dataRetrieval before logging this issue.
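
A minimal sketch of the kind of tryCatch() workaround mentioned above. The helper name `safe_wqp_query` and the choice to fall back to an empty data frame are assumptions for illustration; neither is part of dataRetrieval.

```r
# Hypothetical helper (not part of dataRetrieval): call a query function and,
# if it errors, emit a warning and return an empty data frame instead.
safe_wqp_query <- function(query_fun, ...) {
  tryCatch(
    query_fun(...),
    error = function(e) {
      warning("Query failed, returning empty data frame: ",
              conditionMessage(e), call. = FALSE)
      data.frame()
    }
  )
}

# Intended use (requires network access, so it is left commented out):
# problem_query <- safe_wqp_query(
#   dataRetrieval::whatWQPdata,
#   sampleMedia = c("Water", "water"),
#   siteType = "Stream",
#   bBox = bbox_6120
# )
```

Because the error is downgraded to a warning, a chunked pull can keep going while still leaving a record of which grid cells failed.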

ldecicco-USGS commented 2 years ago

The problem queries are coming back with a 400 error from the server. The message that comes back from the server is: "299 WQP \"The value of bBox=178.768914,51.398187,180.768914,53.398187 is not a valid bounding box.\""

The requirements for bbox in WQP are: "Western-most longitude, Southern-most latitude, Eastern-most longitude, and Northern-most longitude [sic, should be latitude] separated by commas, expressed in decimal degrees, WGS84, and longitudes west of Greenwich are negative. (Example: bBox=-92.8,44.2,-88.9,46.0)"

I'm guessing the problem is because the west-east coordinates cross 180. I tried fiddling with coordinates, like:

bbox_6120_neg <- c(xmin = 178.768914, ymin = 51.398187, 
                   xmax = -179.2311, ymax = 53.398187)

but that gave the same error.

I'll get in touch with the WQP developers and see if there are additional restrictions on the WQP bbox argument.

I'll fiddle with our output so that the message from WQP gets displayed to the user. I think it will still need to be an error though - because that particular error will need to be displayed if a user inputs a wrong bbox as defined by the WQP. Right now the error message isn't displayed because it's assuming the same structure as what NWIS uses.
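
As a rough illustration of surfacing the server's text to the user, the sketch below pulls the quoted part out of a string shaped like the message above. It assumes the message is already available as a plain string in the form `299 WQP "..."`. The helper name `extract_wqp_message` is hypothetical, and how dataRetrieval actually obtains the message is not shown in this thread.

```r
# Hypothetical helper: extract the quoted text from a string shaped like
#   299 WQP "some message"
# Input that does not match that shape is returned unchanged.
extract_wqp_message <- function(x) {
  sub('^[0-9]{3} [^ ]+ "(.*)"$', "\\1", x)
}

raw_msg <- paste0(
  '299 WQP "The value of bBox=178.768914,51.398187,180.768914,53.398187 ',
  'is not a valid bounding box."'
)
extract_wqp_message(raw_msg)
```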

jordansread commented 2 years ago

bbox_6120_neg <- c(xmin = -179.2311, ymin = 51.398187, 
                   xmax = -178.768914, ymax = 53.398187)

works (flipping your min max for x)

ldecicco-USGS commented 2 years ago

but I don't think that's the same (? maybe? IDK...) The coordinates we want are:

xmin = 178.768914
xmax = 180.768914

I think the 178 needs to stay positive, and the 180.768 should be changed to 180.768-360 = -179.2311.

What you have doesn't cross 180. But 🤷‍♀️ I would not be surprised to find I'm missing something?

lindsayplatt commented 2 years ago

Hmmmm, the following returns data:

# Returns one site at 179.1987 long
bbox_5940_fix <- 
  c(xmin = 178.768914, ymin = 49.398187,
    xmax = -(180.768914-360), ymax = 51.398187)

dataRetrieval::whatWQPdata(
  sampleMedia = c("Water", "water"), 
  siteType = "Stream", 
  bBox = bbox_5940_fix
)

                       total_type      lat      lon ProviderName OrganizationIdentifier           OrganizationFormalName
1 FeatureCollection Feature Point 51.39481 179.1987         NWIS                USGS-AK USGS Alaska Water Science Center
  MonitoringLocationIdentifier           MonitoringLocationName MonitoringLocationTypeName ResolvedMonitoringLocationTypeName
1         USGS-512347179120570 STREAM (95-53) ON AMCHITKA IS AK                     Stream                             Stream
  HUCEightDigitCode                                                                     siteUrl activityCount resultCount
1          19030103 https://www.waterqualitydata.us/provider/NWIS/USGS-AK/USGS-512347179120570/             1          38
  StateName                 CountyName
1    Alaska Aleutians West Census Area

ldecicco-USGS commented 2 years ago

-(180.768914-360) is (positive) 179.2311... So that query is from 178 to 179, but you wanted 178 to 180.7. So, you got 1 site, but there might be more.

I think you want 180.768914-360 (so, -179.2311)
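
The arithmetic above can be checked with a small helper that wraps any longitude into the [-180, 180) range. The name `wrap_lon` is made up for this sketch and is not a dataRetrieval function.

```r
# Hypothetical helper: wrap longitudes (decimal degrees) into [-180, 180).
# Note that exactly 180 maps to -180 under this convention.
wrap_lon <- function(lon) {
  ((lon + 180) %% 360) - 180
}

wrap_lon(180.768914)  # about -179.2311, the value suggested above
wrap_lon(178.768914)  # already in range, so effectively unchanged
```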

jordansread commented 2 years ago

Am I correct in saying that it doesn't seem like the service supports 1) values that exceed 180 (or are below -180), 2) paired values that include one negative and one positive number?

So I'd say that Lindsay's original box still isn't supported (Laura is correct that it spans more than the one I mocked up and the one that Lindsay landed on, which is 178.768914 to 179.2311). But now we know this and you can set your boxes up to avoid this issue. One way to do that is to start them on the 180° mark (I'm assuming you are using sf::st_make_grid() and can use offset to do that?).

Still think it is worth following up with WQP team to see if this is a little weird bug.

ldecicco-USGS commented 2 years ago

Correct, it appears the query can't cross 180. I've sent a message to ask the WQP devs to confirm, but that seems to be what we're seeing.

So yeah, splitting it into 2 should work.
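
A minimal sketch of that split, assuming a named bbox vector like the ones in the original report, with `xmin` at or below 180 and `xmax` possibly above it. The function name `split_bbox_at_180` is hypothetical, and whether WQP accepts boxes whose edge sits exactly on 180 or -180 is an assumption that has not been confirmed in this thread.

```r
# Hypothetical helper: split a bbox that extends past 180 degrees longitude
# into two boxes, one on each side of the 180 meridian.
# Assumes xmin <= 180 and 180 < xmax < 540 when a split is needed.
split_bbox_at_180 <- function(bbox) {
  if (bbox[["xmax"]] <= 180) {
    return(list(bbox))  # nothing to split
  }
  east_hemi <- bbox
  east_hemi[["xmax"]] <- 180                  # xmin .. 180
  west_hemi <- bbox
  west_hemi[["xmin"]] <- -180                 # -180 .. (xmax - 360)
  west_hemi[["xmax"]] <- bbox[["xmax"]] - 360
  list(east_hemi, west_hemi)
}

bbox_6120 <- c(xmin = 178.768914, ymin = 51.398187,
               xmax = 180.768914, ymax = 53.398187)
parts <- split_bbox_at_180(bbox_6120)
```

Each element of `parts` could then be passed as `bBox` in its own whatWQPdata() call and the results combined afterwards.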

lindsayplatt commented 2 years ago

~I don't think the offset from sf::st_make_grid() will matter because I would just run into the same issue on the other side since the grid cells are all the same size. Since I will have already captured anything >180 by the grid cells on the opposite side, I think I can just cut them off at 180.~ EDIT: Actually, it would work (see below, where grey is original, red is the shifted).

image

ldecicco-USGS commented 2 years ago

Sounds like the official WQP solution is to split it into 2 bounding boxes.

I will work to make the message from a 400 response more user friendly.

ldecicco-USGS commented 2 years ago

I'm closing this issue so I can distill the "improve 400 message" task down to just what needs to be done. We did learn from this issue that a WQP bounding box can't cross the 180° meridian.