Closed edzer closed 2 years ago
Hi @edzer. I am looking into this today and will try to resolve as soon as I figure it out. However, wt
is a column, not a matrix, because nb
is rowwise, so unfortunately that was hiding the actual problem.
The issue appears to be coming somewhere between our redist_map
and your sf
dplyr_reconstruct
calls, where the sf
class is being dropped. I'm investigating where it gets dropped. Our redist_map
class builds off of the sf
tibble, adding an adjacency column and some bookkeeping.
@huizezhang-sherry could you pls help with this one? The package subclasses sf
with a redist_map
class, and it seems that in calls to mutate
on such objects the redist_map
class is restored, but the sf
gets dropped. Does this package need to implement dplyr_row_slice
and dplyr_col_modify
methods, or is the problem on the sf
side?
Hi @edzer and @huizezhang-sherry. I've been looking into this and it seems to me that the issue may be on the side of sf
. I am not 100% certain though. It seems that sf
, with the new dplyr_row_slice()
, drops the rowwise_df
class.
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
read_sf(system.file("shape/nc.shp", package="sf"), quiet = TRUE) %>%
st_transform(3857) -> nc
class(nc)
#> [1] "sf" "tbl_df" "tbl" "data.frame"
nc <- nc %>% rowwise()
class(nc)
#> [1] "sf" "rowwise_df" "tbl_df" "tbl" "data.frame"
nc %>% slice(1) %>% class()
#> [1] "sf" "tbl_df" "tbl" "data.frame"
nc %>% filter(AREA > .1) %>% class()
#> [1] "sf" "tbl_df" "tbl" "data.frame"
Created on 2022-07-07 by the reprex package (v2.0.1)
Being more precise in the dplyr_row_slice()
(and dplyr_col_modify()
) generics seems to fix that. See this commit to a fork of sf
. I'm still tracking down if this is a correct "fix" or just happens to resolve the issue.
Hi @edzer and @christopherkenny. The rowwise issue is with sf
and adding a line class(data) <- setdiff(class(template), "sf")
in dplyr_reconstruct.sf()
before st_as_sf()
will fix the issue. I prefer to add this line in dplyr_reconstrct.sf()
since it is about reconstructing the data as per the template.
The class structure in the rowwise example can be generalised as : "sf", "myclass", ....
. If the new object has the class structure as "myclass", "sf", ...
, then they need to implement their own dplyr_row_slice.myclass()
, dplyr_col_modify.myclass()
, and dplyr_reconstruct.myclass()
.
@huizezhang-sherry do you suggest to replace
https://github.com/r-spatial/sf/blob/9d3bcf864f77f651281e23cde6747d440fb54242/R/tidyverse.R#L10-L11
with
class(data) = setdiff(class(template), "sf")
?
I believe you would need something more akin to:
if (inherits(template, "tbl_df"))
data = dplyr::as_tibble(data)
if (inherits(template, "grouped_df"))
class(data) <- c("grouped_df", class(data))
if (inherits(template, "rowwise_df"))
class(data) <- c("rowwise_df", class(data))
If you only do class(data) = setdiff(class(template), "sf")
, then I see the final class as c("sf", "data.frame")
, as there is no place where you bring back the "tbl_df" and "tbl" classes when your reconstruct the object.
However, I do think the dplyr_col_modify.sf
and dplyr_row_slice.sf
methods should be returning sf
objects, by passing back to st_as_sf
(or a different way to create the sf
class, if there is a more appropriate method) before returning. I have yet to see a full description anywhere of the best practices for these methods, as they are still experimental, but here are some examples:
grouped_df
row slice returns a grouped_df
rowwise_df
row slice returns a rowwise_df
tbl_ts
row slice returns a tbl_ts
via the update_meta
functionIf so, then the body for both would look like this:
x <- NextMethod()
sf_column_name = attr(data, "sf_column")
class(data) <- setdiff(class(data), "sf")
st_as_sf(dplyr::dplyr_reconstruct(x, data), sf_column_name = sf_column_name)
If it's helpful to PR in something to make these changes to review, just let me know. If not, I'd suggest at least adding two tests for rowwise_df/grouped_df, like so.
Ah, we've been trying to solve this in parallel. Please have a look at https://github.com/r-spatial/sf/commit/2df4b5cd859d0e5f791284474fd3ec834e46682f.
Hi @edzer. Yeah, your solution makes sense to me and fixes the issue that we were catching downstream in redist
by my check. Thanks for investigating.
To avoid problems down the line as dplyr
methods evolve, it might be worth adding basic tests to test_tidy
that the tibble subclasses and sf superclasses are maintained:
test_that("rowwise_df class is retained on row slice", {
skip_if_not_installed("dplyr")
expect_true(nc %>% rowwise() %>% slice(1) %>% inherits("rowwise_df"))
})
test_that("grouped_df class is retained on row slice", {
skip_if_not_installed("dplyr")
expect_true(nc %>% group_by(PERIMETER > 2) %>% slice(1) %>% inherits("grouped_df"))
})
Thanks again!
Great idea, thanks!
Hi,
Checking redist against the github version of sf revealed this:
These are due to improvements in https://github.com/r-spatial/sf/issues/1958 , which are only now revealed because you counted on wrong behaviour of
sf
. Inredist.plot.wted.adj()
you have the linenb %>% mutate(wt = cooc[.data$i, .data$j])
which assumeswt
is a column (as you pass it later on as the color), but it is of course a matrix. With newsf
it will be a matrix in the newnb
object, so you may want to saywt = cooc[.data$i, .data$j][,1]
if you want to specify the first column (which I assume was placed innb
originally, and wrongly).I didn't check whether this would solve all your problems, but assume that all would be caused by this change. Let me know when you can address this, so I can give green light to publish the revision of
sf
on CRAN.