DOI-USGS / dataRetrieval

This R package is designed to obtain USGS or EPA water quality sample data, streamflow data, and metadata directly from web services.
https://doi-usgs.github.io/dataRetrieval/

WQP updates #609

Closed ldecicco-USGS closed 2 years ago

ldecicco-USGS commented 2 years ago

Should WQP bring all columns in initially as character? Should (does?) WQP add an argument to allow the user to import everything as character? Should dataRetrieval check that WQP queries make sense (i.e., 2 states and 1 county: what does that mean)? On one hand, WQP already gives some useful error responses, so maybe these types of questions should be converted to a ticket to WQP directly.

nrlottig commented 2 years ago

I will add to this that converting everything to character made things simpler for me recently. Also, I have not found an instance where there weren't character values in the result columns (e.g., <5) once you look at enough results. Those values are often important because they likely tell you something about detection limits as well as the reported value (although that is not best practice). I suspect the backend treats result values as character; otherwise there could not be instances where character values exist in a column that is defined as numeric. When processing data, we examine all non-numerics in the result value columns and deal with them as appropriate given the parameter those values are associated with.
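For anyone who wants to do the same kind of check, here is a minimal base R sketch of pulling out the non-numeric entries from a result value column. The vector below is made-up example data, not real WQP output, and the variable names are only for illustration.

```r
# Hypothetical result values, held as character the way they might arrive
# from a query (made-up data for illustration only).
result_value <- c("0.12", "<5", "7.4", "ND", NA, "3")

# Attempt the conversion; entries that cannot be parsed become NA.
as_num <- suppressWarnings(as.numeric(result_value))

# Non-numeric entries: present in the input but not parseable as numbers.
non_numeric <- result_value[!is.na(result_value) & is.na(as_num)]

# Distinct non-numeric values to review by hand.
unique(non_numeric)
```

From there the distinct values can be reviewed alongside the parameter they belong to before deciding how to treat them.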

ldecicco-USGS commented 2 years ago

Just to alleviate any fear about the value column: we do bring that column in as character, and only convert it to numeric if every value in that column can be converted to numeric. If not, the column stays character. https://github.com/USGS-R/dataRetrieval/blob/main/R/importWQP.R#L206
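As a rough illustration of that all-or-nothing rule, here is a small base R sketch. It is not the code in importWQP.R; the helper name convert_if_all_numeric is hypothetical, and treating missing values as convertible is an assumption made for this example.

```r
# Hypothetical helper illustrating the rule described above; this is NOT
# the function used in importWQP.R.
convert_if_all_numeric <- function(x) {
  x <- as.character(x)
  converted <- suppressWarnings(as.numeric(x))
  # A failure is a non-missing input that became NA after conversion.
  # Assumption: values that were already NA do not block the conversion.
  failed <- !is.na(x) & is.na(converted)
  if (any(failed)) x else converted
}

# Every value parses, so the column becomes numeric.
clean <- convert_if_all_numeric(c("1.5", "2", NA))

# One value ("<5") does not parse, so the whole column stays character.
mixed <- convert_if_all_numeric(c("1.5", "<5", "2"))

class(clean)
class(mixed)
```

With this sketch, the same column can come back as numeric from one query and as character from another, depending only on the values returned.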

The reason for this is partly historical (trying to remain backwards compatible when possible) and partly to be helpful (a lot more data is numeric than not). It causes the most problems when people are doing big pulls and binding multiple queries. We figured more of those "power users" would be able to convert those columns to character more easily than the non-power users wondering why they can't do stats on their fully numeric data.

The NWIS functions have an argument, convertType, which if set to FALSE brings in all the columns as character. I think it was just a lack of time and resources that kept us from also adding it to the WQP functions back when that argument was introduced.

ldecicco-USGS commented 2 years ago

These were quick notes from a meeting. I think the first and third questions are covered in the last PR. I'll make the second a separate issue.