USEPA / EPATADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.
https://usepa.github.io/EPATADA/
Creative Commons Zero v1.0 Universal

Advanced geospatial WQP data retrieval via user-supplied shapefile or ESRI WFS #361

Open cristinamullin opened 7 months ago

cristinamullin commented 7 months ago

Is your feature request related to a problem? Please describe.

Summary: Generate the WQP query from a user-supplied shapefile input, from a CSV of monitoring location IDs (dependent on first addressing the big DR and max URL issues: https://github.com/USEPA/TADA/issues/363), or via an ESRI WFS such as the EPA tribal boundary service. Potentially leverage DR/NLDI/nhdplus tools… or EPA WATERS, which keeps a copy of the WQP sites, to get the site IDs within the shapefile boundary and then query the WQP by those specific site IDs.

Example request from MT DEQ: For specific projects, we often have an area of interest and want all of the sites from that area. Can the package or Shiny app select all of the data in a particular area from an uploaded shapefile? Alternatively, could you load up activity IDs for your area of interest, have it pull the data for all of those activity IDs from the WQ Portal, and complete the Shiny steps for those?

Just thinking of a particular project for a watershed above a mining area, where I have to do QA/QC and would like to run the data for those activity IDs through the Shiny steps, but they don't conform to a particular HUC or assessment unit. I wanted to see what type of result I get with the Shiny app (compared to what I ended up doing in Excel).

Describe the solution you'd like

Supply a shapefile as an input to the data retrieval function and get back all sites within the selected area.
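A minimal sketch of the input-handling side of this request, assuming the user supplies a polygon shapefile. The function and file names below are hypothetical, not part of EPATADA:

```r
# Sketch: load a user-supplied shapefile and prepare it as a single
# area-of-interest (AOI) polygon for a WQP query (names are illustrative).
library(sf)

load_aoi <- function(shapefile_path) {
  aoi <- sf::st_read(shapefile_path, quiet = TRUE)  # read any OGR-supported source
  aoi <- sf::st_make_valid(aoi)                     # repair invalid geometries
  aoi <- sf::st_transform(aoi, crs = 4326)          # WQP works in WGS84 lon/lat
  sf::st_union(aoi)                                 # dissolve features to one polygon
}

# aoi <- load_aoi("my_watershed.shp")  # hypothetical input file
```

Reading through `sf::st_read()` rather than a shapefile-specific parser would also let the same code path accept GeoPackage or GeoJSON uploads.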

New features should include all of the following work:

jbousquin commented 5 months ago

This (query by bounding-box extent) is typically how I'm using harmonize-wq: wrangle.get_bounding_box() just pulls the extent off the shapefile in the bBox order/format expected by the dataretrieval-python package's wqp.get_results(): 'xmin,ymin,xmax,ymax'.

It looks like dataRetrieval::whatWQPdata will take bBox as an input via WQPdots. If building a function to help a user, I would take an sf object as input rather than a shapefile path, service, etc., since there are lots of user inputs/errors that sf::st_read will hopefully already deal with. From what I can tell, sf::st_bbox() already returns the right order; you may just need to check the CRS and reformat the result as a string.
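A sketch of that bbox step in R, under the assumptions above; the helper name and the EPSG:4326 reprojection are my additions, and whether bBox must be a comma-separated string versus a numeric vector should be checked against the dataRetrieval docs:

```r
library(sf)
library(dataRetrieval)

# Build the 'xmin,ymin,xmax,ymax' bBox string from any sf object,
# reprojecting to WGS84 (EPSG:4326) first.
bbox_string <- function(aoi_sf) {
  bb <- sf::st_bbox(sf::st_transform(aoi_sf, crs = 4326))
  # st_bbox() returns a named numeric vector in xmin, ymin, xmax, ymax order
  paste(as.numeric(bb), collapse = ",")
}

# Example (hypothetical AOI object):
# sites <- dataRetrieval::whatWQPdata(bBox = bbox_string(aoi))
```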

After the results/stations are retrieved, you can clip them by the original sf object (sf::st_intersection). The extra results that fall within the extent but outside the area of interest don't tend to slow down the query.