DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

readNWISpCode("all") not parsing as expected #591

Closed lindsayplatt closed 2 years ago

lindsayplatt commented 2 years ago

Describe the bug readNWISpCode("all") is no longer working. It no longer returns a nicely parsed dataset.

To Reproduce I am using version 2.7.10 (but colleagues experienced this on 2.7.1). When you run readNWISpCode("all"), you do not get a nicely parsed table, but instead the start of an HTML webpage:

1 <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
2 <head>
3 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
4 <base href="https://help.waterdata.usgs.gov/codes-and-parameters/codes" /><!--[if lt IE 7]></base><![endif]-->
5 <meta content="Search criteria are criteria that you enter to select sites of interest. Codes describe data and aid in its interpretation. This page contains links to a comprehensive set of codes used by this site which can be used an an authoritative reference." name="description" />
6 <link rel="stylesheet" type="text/css" media="screen" href="https://help.waterdata.usgs.gov/portal_css/Sunburst%20Theme/reset-cachekey-8d2e968c3efbf6373cda549130a0c279.css" />

Expected behavior I expect to see a table akin to what is returned from parameterCdFile. I instructed colleagues to use parameterCdFile instead for now to get all the pcode info into a table.

Session Info Using dataRetrieval version 2.7.10.

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Additional context Colleagues first told me about this failing yesterday, but I don't know when it worked last to say that yesterday was the first day. Could it be related to the Drupal 9 switch?

ldecicco-USGS commented 2 years ago

Ugg...yeah, that's new. It also doesn't work when the user asks for a pcode that's not in the parameterCdFile. Usually, that would kick off a call to the service. Now you get a similar html result.
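Editor's sketch: until the service is fixed, a guard like the one below could flag the symptom described above, where an HTML page comes back in place of a parsed table. This is a minimal base-R sketch; `looks_like_html` is a hypothetical helper, not part of dataRetrieval, and the check only inspects the first few cells of the first column.

```r
# Hypothetical helper (not part of dataRetrieval): returns TRUE when a
# "parsed" result looks like the start of an HTML page instead of a table.
looks_like_html <- function(x) {
  if (!is.data.frame(x) || nrow(x) == 0 || ncol(x) == 0) {
    return(FALSE)
  }
  # Inspect only the first few cells of the first column.
  first_cells <- trimws(as.character(head(x[[1]], 5)))
  any(grepl("^<(!doctype|html|head)\\b", first_cells,
            ignore.case = TRUE, perl = TRUE))
}

# Illustrative inputs: one mimics the broken output, one a normal table.
bad <- data.frame(
  V1 = c('<html xmlns="http://www.w3.org/1999/xhtml" lang="en">', "<head>"),
  stringsAsFactors = FALSE
)
good <- data.frame(parameter_cd = c("00060", "00065"),
                   stringsAsFactors = FALSE)

looks_like_html(bad)
looks_like_html(good)
```

A caller could fall back to the packaged parameterCdFile whenever the guard returns TRUE.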

bhuffman-usgs commented 2 years ago

I believe the API (nwis.waterdata.usgs.gov) used for retrieving the parm_cd info has changed. The error comes out of the importRDB1 function (called when you use the 'all' option for the parm_cd). The importRDB1 function also gets called elsewhere, when the parm_cd doesn't exist in the cached/packaged file dataRetrieval is using (parameterCdFile), and I've verified the error persists there as well.

The data you are seeing is the HTML from the help page, because the current fullURL that's used to retrieve all the parm_cd info gets forwarded there. That led me to believe the fullURL isn't going to work anymore, likely because they've changed the way the API works. I ended up replacing the line using importRDB1 in the readNWISpCode function with the following: parameterData <- importRDB1("https://help.waterdata.usgs.gov/code/parameter_cd_query?fmt=rdb&group_cd=%", asDateTime = FALSE)

This is a patchwork solution for myself for only the "all" parm_cd option, just as an FYI. The column names will come back differently than with the previous way of retrieving the parm_cd info, so that will have to be accounted for.
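Editor's sketch: one way to account for the changed column names is to map them back to the legacy names after import. The helper `rename_columns` is hypothetical, and the names in `legacy_names` are assumptions for illustration only; check them against the actual service response and against parameterCdFile before relying on them.

```r
# Hypothetical helper: rename only the columns that appear in `mapping`,
# leaving every other column untouched.
rename_columns <- function(df, mapping) {
  hits <- names(df) %in% names(mapping)
  names(df)[hits] <- unname(mapping[names(df)[hits]])
  df
}

# ASSUMED mapping from new service column names (left) to the legacy
# names (right). Neither side is confirmed by the thread.
legacy_names <- c(
  parm_cd   = "parameter_cd",
  group     = "parameter_group_nm",
  parm_nm   = "parameter_nm",
  CASRN     = "casrn",
  SRSName   = "srsname",
  parm_unit = "parameter_units"
)

# Illustrative stand-in for a table returned by the new endpoint.
new_style <- data.frame(parm_cd = "00060", parm_nm = "Discharge",
                        extra = 1, stringsAsFactors = FALSE)
old_style <- rename_columns(new_style, legacy_names)
names(old_style)
```

Columns without an entry in the mapping keep their new names, so unexpected additions from the service stay visible.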

bhuffman-usgs commented 2 years ago

Hey Laura, I had some free time to get things functioning again. If you change the pkg.env$pCode to "https://help.waterdata.usgs.gov/code" in the setAccess.R file and replace the readNWISpCode function with the contents of the text file I've attached, everything should be back up and running. pcode_fix.txt

bhuffman-usgs commented 2 years ago

Just a heads up, I checked the setAccess.R file and it has the "/parameter_cd_query" in it. That base path only works for the "all" selection. For the individual pulls it'd need to be "/parameter_cd_nm_query". That's why I left the base path as "https://help.waterdata.usgs.gov/code" and then appended the endpoints individually for those use cases within the readNWISpCode function.
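Editor's sketch: the base-path approach described above could look roughly like this. `build_pcode_url` is a hypothetical helper, not the code from the attached file. The "all" query string comes from the workaround posted earlier in the thread; the query-parameter name used for single-code lookups (`parm_nm_cd`) is an assumption and is exposed as an argument for that reason.

```r
# Base path suggested in the thread; endpoints are appended per use case.
pcode_base <- "https://help.waterdata.usgs.gov/code"

# Hypothetical helper: build the request URL for "all" or a single code.
# `single_param` is an ASSUMED query-parameter name for single lookups.
build_pcode_url <- function(pcode, base = pcode_base,
                            single_param = "parm_nm_cd") {
  if (identical(pcode, "all")) {
    # Query string taken from the workaround earlier in the thread.
    return(paste0(base, "/parameter_cd_query?fmt=rdb&group_cd=%"))
  }
  paste0(base, "/parameter_cd_nm_query?fmt=rdb&", single_param, "=", pcode)
}

build_pcode_url("all")
build_pcode_url("00060")
```

Keeping the endpoint out of the base path means one stored setting can serve both kinds of request.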

ldecicco-USGS commented 2 years ago

Ah OK, I'll futz more then. The drURL code messes up the placement of the question mark, so I kept getting a 500 with "https://help.waterdata.usgs.gov/code".

ldecicco-USGS commented 2 years ago

I think the latest pull request took care of the single-pcode issue. Please install, play around with it, and let me know if anything looks amiss.

remotes::install_github("USGS-R/dataRetrieval")
paramINFO <- readNWISpCode(c('00060','01075','00931', NA))

param_all <- readNWISpCode("all")

lindsayplatt commented 2 years ago

This seems to be working now! param_all has 24,200 rows 😮