alarm-redist / redist

Simulation methods for legislative redistricting.
https://alarm-redist.github.io/redist/
GNU General Public License v2.0
66 stars 23 forks source link

revdep check on upcoming sf breaks redist #148

Closed edzer closed 2 years ago

edzer commented 2 years ago

Hi,

Checking redist against the github version of sf revealed this:

Package: redist
Check: examples
New result: ERROR
  Running examples in ‘redist-Ex.R’ failed
  The error most likely occurred in:

  > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
  > ### Name: redist.plot.wted.adj
  > ### Title: Plot Weighted Border Adjacency
  > ### Aliases: redist.plot.wted.adj
  > 
  > ### ** Examples
  > 
  > data(iowa)
  > shp <- redist_map(iowa, existing_plan = cd_2010, pop_tol = 0.01)
  > plans <- redist_smc(shp, 100)
  SEQUENTIAL MONTE CARLO
  Sampling 100 99-unit maps with 4 districts and population between 753,973 and 769,205.
  Split [0/3] ■                                | ETA?
  Split [3/3] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  | ETA 0s

  > redist.plot.wted.adj(shp, plans = plans, counties = region)
  Warning: Use of `nb$wt` is discouraged. Use `wt` instead.
  Error in `check_aesthetics()`:
  ! Aesthetics must be either length 1 or the same as the data (47): colour
  Backtrace:
       ▆
    1. ├─base (local) `<fn>`(x)
    2. └─ggplot2:::print.ggplot(x)
    3.   ├─ggplot2::ggplot_build(x)
    4.   └─ggplot2:::ggplot_build.ggplot(x)
    5.     └─ggplot2 (local) by_layer(function(l, d) l$compute_aesthetics(d, plot))
    6.       └─ggplot2 (local) f(l = layers[[i]], d = data[[i]])
    7.         └─l$compute_aesthetics(d, plot)
    8.           └─ggplot2 (local) f(..., self = self)
    9.             └─ggplot2:::check_aesthetics(evaled, n)
   10.               └─rlang::abort(...)
  Execution halted

Package: redist
Check: re-building of vignette outputs
New result: ERROR
  Error(s) in re-building vignettes:
    ...
  --- re-building ‘common_args.Rmd’ using rmarkdown
  Loading required package: redistmetrics

  Attaching package: 'redist'

  The following object is masked from 'package:stats':

      filter

  `pop_tol` calculated from existing plan is ≤ 0.1%
  --- finished re-building ‘common_args.Rmd’

  --- re-building ‘map-preproc.Rmd’ using rmarkdown

  Attaching package: 'dplyr'

  The following objects are masked from 'package:stats':

      filter, lag

  The following objects are masked from 'package:base':

      intersect, setdiff, setequal, union

  trying URL ‘[https://github.com/alarm-redist/redist-data/raw/main/data/king_county.rds’](https://github.com/alarm-redist/redist-data/raw/main/data/king_county.rds%E2%80%99)
  Content type 'application/octet-stream' length 200100 bytes (195 KB)
  ==================================================
  downloaded 195 KB

  Quitting from lines 234-244 (map-preproc.Rmd) 
  Error: processing vignette ‘map-preproc.Rmd’ failed with diagnostics:
  Plotting requires a shapefile.
  → If you've just used `merge_by()`, consider passing `drop_geom = FALSE`.
  --- failed re-building ‘map-preproc.Rmd’

  --- re-building ‘mcmc.Rmd’ using rmarkdown
  --- finished re-building ‘mcmc.Rmd’

  --- re-building ‘mpi-slurm.Rmd’ using rmarkdown
  --- finished re-building ‘mpi-slurm.Rmd’

  --- re-building ‘redist.Rmd’ using rmarkdown
  starting worker pid=2991561 on localhost:11381 at 13:41:21.900
  starting worker pid=2991560 on localhost:11381 at 13:41:21.960
  Loading required package: redist
  Loading required package: redistmetrics

  Attaching package: ‘redist’

  The following object is masked from ‘package:stats’:

      filter

  loaded redist and set parent environment
  Loading required package: foreach
  Loading required package: rngtools
  Loading required package: redist
  Loading required package: redistmetrics

  Attaching package: ‘redist’

  The following object is masked from ‘package:stats’:

      filter

  loaded redist and set parent environment
  Loading required package: foreach
  Loading required package: rngtools
  SEQUENTIAL MONTE CARLO
  Sampling 500 99-unit maps with 4 districts and population between 753,973 and 769,205.
  Split [0/3] ■                                | ETA?
  Split [3/3] ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  | ETA 0s

  Linking to GEOS 3.10.3, GDAL 3.5.0, PROJ 9.0.1; sf_use_s2() is TRUE
  A <redist_plans> containing 1000 sampled plans and 1 reference plan
  Plans have 4 districts from a 99-unit map, and were drawn using Sequential
  Monte Carlo.
  --- finished re-building ‘redist.Rmd’

  SUMMARY: processing the following file failed:
    ‘map-preproc.Rmd’

  Error: Vignette re-building failed.
  Execution halted

These are due to improvements in https://github.com/r-spatial/sf/issues/1958 , which are only now revealed because you counted on wrong behaviour of sf. In redist.plot.wted.adj() you have the line nb %>% mutate(wt = cooc[.data$i, .data$j]) which assumes wt is a column (as you pass it later on as the color), but it is of course a matrix. With new sf it will be a matrix in the new nb object, so you may want to say wt = cooc[.data$i, .data$j][,1] if you want to specify the first column (which I assume was placed in nb originally, and wrongly).

I didn't check whether this would solve all your problems, but assume that all would be caused by this change. Let me know when you can address this, so I can give green light to publish the revision of sf on CRAN.

christopherkenny commented 2 years ago

Hi @edzer. I am looking into this today and will try to resolve as soon as I figure it out. However, wt is a column, not a matrix, because nb is rowwise, so unfortunately that was hiding the actual problem.

The issue appears to be coming somewhere between our redist_map and your sf dplyr_reconstruct calls, where the sf class is being dropped. I'm investigating where it gets dropped. Our redist_map class builds off of the sf tibble, adding an adjacency column and some bookkeeping.

edzer commented 2 years ago

@huizezhang-sherry could you pls help with this one? The package subclasses sf with a redist_map class, and it seems that in calls to mutate on such objects the redist_map class is restored, but the sf gets dropped. Does this package need to implement dplyr_row_slice and dplyr_col_modify methods, or is the problem on the sf side?

christopherkenny commented 2 years ago

Hi @edzer and @huizezhang-sherry. I've been looking into this and it seems to me that the issue may be on the side of sf. I am not 100% certain though. It seems that sf, with the new dplyr_row_slice(), drops the rowwise_df class.

library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
read_sf(system.file("shape/nc.shp", package="sf"), quiet = TRUE) %>%
    st_transform(3857) -> nc

class(nc)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

nc <- nc %>% rowwise()
class(nc)
#> [1] "sf"         "rowwise_df" "tbl_df"     "tbl"        "data.frame"

nc %>% slice(1) %>% class()
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

nc %>% filter(AREA > .1) %>% class()
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

Created on 2022-07-07 by the reprex package (v2.0.1)

Being more precise in the dplyr_row_slice() (and dplyr_col_modify()) generics seems to fix that. See this commit to a fork of sf. I'm still tracking down if this is a correct "fix" or just happens to resolve the issue.

huizezhang-sherry commented 2 years ago

Hi @edzer and @christopherkenny. The rowwise issue is with sf and adding a line class(data) <- setdiff(class(template), "sf") in dplyr_reconstruct.sf() before st_as_sf() will fix the issue. I prefer to add this line in dplyr_reconstrct.sf() since it is about reconstructing the data as per the template.

The class structure in the rowwise example can be generalised as : "sf", "myclass", ..... If the new object has the class structure as "myclass", "sf", ..., then they need to implement their own dplyr_row_slice.myclass(), dplyr_col_modify.myclass(), and dplyr_reconstruct.myclass().

edzer commented 2 years ago

@huizezhang-sherry do you suggest to replace

https://github.com/r-spatial/sf/blob/9d3bcf864f77f651281e23cde6747d440fb54242/R/tidyverse.R#L10-L11

with

class(data) = setdiff(class(template), "sf")

?

christopherkenny commented 2 years ago

I believe you would need something more akin to:

    if (inherits(template, "tbl_df"))
        data = dplyr::as_tibble(data)
    if (inherits(template, "grouped_df"))
        class(data) <- c("grouped_df", class(data))
    if (inherits(template, "rowwise_df"))
        class(data) <- c("rowwise_df", class(data))

see in fork

If you only do class(data) = setdiff(class(template), "sf") , then I see the final class as c("sf", "data.frame"), as there is no place where you bring back the "tbl_df" and "tbl" classes when your reconstruct the object.

However, I do think the dplyr_col_modify.sf and dplyr_row_slice.sf methods should be returning sf objects, by passing back to st_as_sf (or a different way to create the sf class, if there is a more appropriate method) before returning. I have yet to see a full description anywhere of the best practices for these methods, as they are still experimental, but here are some examples:

If so, then the body for both would look like this:

    x <- NextMethod()
    sf_column_name = attr(data, "sf_column")
    class(data) <- setdiff(class(data), "sf")
    st_as_sf(dplyr::dplyr_reconstruct(x, data), sf_column_name = sf_column_name)

If it's helpful to PR in something to make these changes to review, just let me know. If not, I'd suggest at least adding two tests for rowwise_df/grouped_df, like so.

edzer commented 2 years ago

Ah, we've been trying to solve this in parallel. Please have a look at https://github.com/r-spatial/sf/commit/2df4b5cd859d0e5f791284474fd3ec834e46682f.

christopherkenny commented 2 years ago

Hi @edzer. Yeah, your solution makes sense to me and fixes the issue that we were catching downstream in redist by my check. Thanks for investigating.

To avoid problems down the line as dplyr methods evolve, it might be worth adding basic tests to test_tidy that the tibble subclasses and sf superclasses are maintained:

test_that("rowwise_df class is retained on row slice", {
    skip_if_not_installed("dplyr")
    expect_true(nc %>% rowwise() %>% slice(1) %>% inherits("rowwise_df"))
})

test_that("grouped_df class is retained on row slice", {
    skip_if_not_installed("dplyr")
    expect_true(nc %>% group_by(PERIMETER > 2) %>% slice(1) %>% inherits("grouped_df"))
})

Thanks again!

edzer commented 2 years ago

Great idea, thanks!