USEPA / spsurvey

spsurvey: Spatial Sampling Design and Analysis in R
https://usepa.github.io/spsurvey/
GNU General Public License v3.0
15 stars 5 forks source link

NaN coordinates in sites_base and sites_over when using legacy_sites with grts() #40

Closed nstauffer closed 8 months ago

nstauffer commented 8 months ago

I've been trying to use the legacy sites argument to balance a set of points around another set of points, e.g., taking an existing sampling design in an area as legacy points and drawing a new design including all those legacy points plus new points then keeping only the new points as a supplemental design. But! I've run into an issue where using legacy_sites results in the output looking normal, but the lat_WGS84 and lon_WGS84 variables in output$sites_base and output$sites_over are all NaN.

There are no outright errors reported, but I get the following warnings which appear to have to do with identifying the extent of the bounding box for one of the spatial objects.

Warning messages:
1: In min(bb[, 1L], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
2: In min(bb[, 2L], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
3: In max(bb[, 3L], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
4: In max(bb[, 4L], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
5: In min(bb[, 1L], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
6: In min(bb[, 2L], na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
7: In max(bb[, 3L], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf
8: In max(bb[, 4L], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

This happens regardless of if I'm stratifying or not and I've done everything I can to make sure there are no geometry errors or irregularities in the polygons with both sf::st_buffer() and sf::st_make_valid(). I also get numeric values as expected from both my frame and legacy sites when using sf::st_bbox(). And I can get it to produce valid points as long as I don't use legacy_sites. I've gone rooting around to see if I can identify where things are going sideways within grts() but haven't been able to turn up a starting point.

The attached file contains a geodatabase with the polygons I've been using. Below should produce the warnings and the wonky, coordinate-less outputs.

data_path <- "wherever/id_frfo_lup_2024_data.gdb"

strata <- sf::st_read(dsn = data_path,
                      layer = "strata")

strata <- sf::st_transform(x = strata,
                           crs = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")

# Draw the first design
set.seed(420)
revisit_points <- spsurvey::grts(sframe = strata,
                                 stratum_var = "stratum",
                                 n_base = c(Foothills = 65,
                                            Plains = 60,
                                            Mountains = 10),
                                 n_over = c(Foothills = 26,
                                            Plains = 24,
                                            Mountains = 4),
                                 DesignID = "FRFO2024")

revisit_points <- dplyr::bind_rows(revisit_points$sites_base,
                                   revisit_points$sites_over)

# Confirm that all these points have valid coordinates
all(!is.nan(revisit_points$lon_WGS84))
all(!is.nan(revisit_points$lat_WGS84))

# Draw the second design "around" the first by drawing all the "legacy" sites
# from the first design but keeping only sites_base and sites_over
set.seed(420)
nonrevisit_points <- spsurvey::grts(sframe = strata,
                                    stratum_var = "stratum",
                                    n_base = c(Foothills = 80,
                                               Mountains = 20,
                                               Plains = 60) + table(revisit_points$stratum),
                                    n_over = c(Foothills = 32,
                                               Mountains = 8,
                                               Plains = 24),
                                    legacy_sites = dplyr::select(revisit_points,
                                                                 siteID,
                                                                 stratum),
                                    DesignID = "FRFO2024")

nonrevisit_points <- dplyr::bind_rows(nonrevisit_points$sites_base,
                                      nonrevisit_points$sites_over)

# Confirm that none of these points have valid coordinates
any(!is.nan(nonrevisit_points$lon_WGS84))
any(!is.nan(nonrevisit_points$lat_WGS84))
michaeldumelle commented 8 months ago

Thanks @nstauffer . I believe the bug occurs here because the geometry column name is not geometry, so a quick fix for you is to run

sf::st_geometry(strata) <- "geometry" 

See here for more.

We do plan to fix this bug in the next CRAN submission (and allow for arbitrary geometry names), but I will leave the issue open until we push a fix.

nstauffer commented 8 months ago

That worked like a charm, thanks! I try to be vigilant for that particular issue because it often gives me heartburn trying to combine sf objects, but it didn't cross my mind this time.

michaeldumelle commented 8 months ago

@nstauffer one other note: see spsurvey::sp_rbind() for combining sites objects from grts() output.

michaeldumelle commented 8 months ago

@nstauffer We have implemented a more permanent solution, which is available in the development version of spsurvey you can download by running

devtools::install_github("USEPA/spsurvey", ref = "develop")

This bug fix will be included in the next CRAN submission (the current version of CRAN is 5.5.0).

michaeldumelle commented 8 months ago

@nstauffer The fix is now on CRAN.