With the changes in this PR, the same function returns no bad identifiers, because identifiers that were previously flagged are now accepted by the WQP web service:
```r
> identify_bad_ids(bad_sites_test)
# A tibble: 0 x 1
# ... with 1 variable: site_id <chr>
```
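For context, the check in `identify_bad_ids()` amounts to a regex screen over the site identifiers. A minimal base-R sketch of the idea — the function body and example ids here are illustrative, not the exact pattern updated in this PR, though "/" is the character discussed below:

```r
# Illustrative version of identify_bad_ids(): flag site identifiers that
# contain characters the WQP site-id web service could not parse.
# The real pipeline operates on a tibble; base R keeps this sketch
# self-contained. The pattern is a stand-in for the regex in the PR.
identify_bad_ids_sketch <- function(site_ids, pattern = "/") {
  site_ids[grepl(pattern, site_ids, fixed = TRUE)]
}

# Hypothetical test set: one id containing "/", one without
site_ids <- c("USGS-01234567", "ORG-ABC/123")

identify_bad_ids_sketch(site_ids)
# Before this PR, "/" was disallowed, so "ORG-ABC/123" is returned;
# dropping "/" from the pattern yields zero flagged ids, as shown above.
```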
I also tried this out using the following inputs in `_targets.R`:
```r
# Specify coordinates that define the spatial area of interest
# lat/lon are referenced to WGS84
coords_lon <- c(-96.333, -87.8, -89)
coords_lat <- c(42.547, 45.029, 35)

# Specify arguments to WQP queries
# see https://www.waterqualitydata.us/webservices_documentation for more information
wqp_args <- list(sampleMedia = c("Water", "water"),
                 siteType = "Lake, Reservoir, Impoundment",
                 # return sites with at least one data record
                 minresults = 1,
                 startDateLo = start_date,
                 startDateHi = end_date)
```
Before incorporating these changes, the resulting query contained 12 sites with "/" in their site identifiers, which the pipeline reported:
```r
> tar_make(p2_site_counts_grouped)
Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
v skip target p1_global_grid
v skip target p1_wqp_params_yml
* start target p1_AOI
* built target p1_AOI
v skip target p1_wqp_params
* start target p1_AOI_sf
* built target p1_AOI_sf
v skip target p1_char_names_crosswalk
* start target p1_global_grid_aoi
* built target p1_global_grid_aoi
v skip target p1_char_names
...
* start target p1_wqp_inventory_aoi
Attempting to harmonize different site CRS...
Returned 4982 sites within area of interest.
* built target p1_wqp_inventory_aoi
* start target p2_site_counts
* built target p2_site_counts
* start target p2_site_counts_grouped
Some site identifiers contain undesired characters and cannot be parsed by WQP. Assigning 12 sites and 364 records with bad identifiers to their own download groups so that they can be queried separately using a different method.
* built target p2_site_counts_grouped
* end pipeline: 14.449 minutes
```
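The parsing problem reported above is, presumably, a URL one: site identifiers are embedded in WQP request URLs, so an id containing a reserved character like "/" could not be passed through as-is. A quick base-R illustration with a made-up identifier:

```r
# Hypothetical site identifier containing "/", the character the WQP
# web service previously rejected in site-id queries
site_id <- "ORG-ABC/123"

# Percent-encoding the reserved character shows what a literal "/"
# would have to become to travel safely inside a request URL
utils::URLencode(site_id, reserved = TRUE)
# "ORG-ABC%2F123"
```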
After incorporating the changes in this PR and re-running `tar_make(p2_site_counts_grouped)`, no sites are flagged as "bad", so we can pull all data by site id rather than falling back to the bounding-box approach 🎉
In summary: this PR updates the regex used to find bad site identifiers to reflect recent changes to the WQP web service. With the previous version of `identify_bad_ids()`, the test set above would have been flagged as "bad"; with this change, it no longer is.

Nice work getting this change incorporated into the web service, @jordansread!
Closes #89