dieghernan / tidyterra

tidyverse and ggplot2 methods for terra spatial objects
https://dieghernan.github.io/tidyterra/

Feedback needed #1

Open dieghernan opened 2 years ago

dieghernan commented 2 years ago

{tidyterra} tries to improve the data wrangling and visualisation of Spat* objects ({terra}) by providing new tidyverse methods for those objects.

The result is a package of wrappers around terra functions that expose a tidyverse API.

Hopefully this will help useRs with no experience with spatial rasters to start working with this kind of format. For experienced useRs, tidyterra may not suit their needs, especially in terms of performance with big raster files.

The developer of the package (myself) is one of those not-so-into-raster users, so I may have made bad decisions in the development of this package. For that reason, all the feedback you can provide is especially useful.

Some users I can think of are obviously @rhijmans, @nowosad, @dominicroye, @paleolimbot, @milos-agathon @barryrowlingson.

So far, feedback is needed on

dieghernan commented 2 years ago

On the tidyverse / ggplot2 side it would be great to have some insight from @hadley @romainfrancois @lionel- @wch @thomasp85 to cite just a few

Nowosad commented 2 years ago

Hi @dieghernan -- I checked the package and it looks interesting. Currently, I do not have any suggestions, except maybe:

  1. Do you plan to add group_by() and summarize() as well? This could work, for example, like zonal statistics.

rhijmans commented 2 years ago

I do not know tidyverse, but I suppose this package could be especially useful for SpatVector as these can be represented as a data.frame (tibble) that includes their geometry (as.data.frame(v, geom="WKT") or by coercing to sf and back).

I also suppose that anything to facilitate the use of ggplot would be very helpful to many

There may be some methods that are useful for SpatRaster as well, such as for selecting layers, but I am not convinced that implementing methods like drop_na are very useful. Adding (near) synonyms might confuse as much as help, and as it would cover a very small part of the interface, one would still need to learn the terra idiom anyway. Otherwise, your idea for how to implement it seems reasonable, and, again, I do not know tidyverse, so my opinion on utility is not that meaningful.
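The data-frame representation mentioned in this comment can be sketched as follows. This is a minimal illustration using the lux.shp file shipped with terra, not code from tidyterra; it assumes that the plain data frame does not carry the CRS, so the CRS is passed again when rebuilding the SpatVector.

```r
library(terra)

# Example polygons shipped with terra
v <- vect(system.file("ex/lux.shp", package = "terra"))

# Attributes plus geometry as WKT text in a column named "geometry"
df <- as.data.frame(v, geom = "WKT")

# Any data-frame operation can be applied here
df$POP_K <- df$POP / 1000

# Rebuild the SpatVector from the WKT column; the CRS is supplied
# explicitly (assumption: a plain data frame does not store it)
v2 <- vect(df, geom = "geometry", crs = crs(v))
```

The WKT column is ordinary character data, so it survives any data-frame verb that keeps the column.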

dieghernan commented 2 years ago

Thanks for your feedback @Nowosad @rhijmans, much appreciated.

Regarding your suggestion, let me expand a bit. My idea for the package is basically to extend some commonly used tidyverse methods for data wrangling. The main goal for me is to help useRs with no spatial background to get started with rasters, at least with basic transformations and plotting.

To do so, I am building on the idea already implemented in sf/stars.

sf: https://github.com/r-spatial/sf/blob/HEAD/R/tidyverse.R stars: https://github.com/r-spatial/stars/blob/main/R/tidyverse.R

So I would prefer not to implement spatial operations in the package (for example, left_join.sf appends a data frame to an sf object, but a spatial join needs a specific st_join call). group_by.sf may be an exception, since it would merge geometries, but this is only implemented for vectors (sf), not for rasters (stars).

So I was not planning on implementing group_by.SpatRaster()/summarize.SpatRaster(). This is also connected with @rhijmans's comment:

Adding (near) synonyms might confuse as much as help, and as it would cover a very small part of the interface, one would still need to learn the terra idiom anyway.

I would definitely encourage useRs to learn the {terra} idiom, as it is much more efficient than the wrappers that I can provide here. The conversion currently implemented in {tidyterra} (SpatRaster > data.frame > operation > back to SpatRaster) is not efficient for large SpatRasters. I made some effort to address this by adding a section to the docs with the {terra} equivalent of each function (example: https://dieghernan.github.io/tidyterra/reference/pull.html#terra-equivalent). I also refer to this in the README: https://github.com/dieghernan/tidyterra#exclamation-a-note-on-performance
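To make the round trip concrete, here is a minimal sketch of the general pattern (SpatRaster to data frame, operate, back to SpatRaster) using only terra calls. It is an illustration under assumptions, not the actual tidyterra implementation; the toy raster and the layer names elev and elev2 are made up for the example.

```r
library(terra)

# Toy in-memory raster (hypothetical data, 4 rows x 5 columns)
r <- rast(nrows = 4, ncols = 5, xmin = 0, xmax = 5, ymin = 0, ymax = 4,
          crs = "EPSG:4326")
values(r) <- 1:ncell(r)
names(r) <- "elev"

# 1. Coerce to a data frame, keeping cell coordinates and NA cells
df <- as.data.frame(r, xy = TRUE, na.rm = FALSE)

# 2. Apply the operation on the data frame
df$elev2 <- df$elev * 2

# 3. Rebuild the SpatRaster from the x, y and value columns
r2 <- rast(df, type = "xyz", crs = crs(r))
```

Every cell becomes a data-frame row in step 1, which is why this pattern scales poorly for large rasters compared with operating on the SpatRaster directly (here, simply r * 2).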

Again, {tidyterra} might be useful for beginners and/or medium-sized SpatRasters, and hopefully it will help to spread the usage of {terra} by lowering the barriers to entry. IMHO the {terra} package has an easy interface if you have previously worked with {raster}, but for a complete novice with no background in rasters it can be a bit hard.

Also @rhijmans, I was thinking of removing drop_na(), at least for SpatRasters, since the implementation may not be really useful. So I agree on that, thanks.

For SpatVector/sf conversions I would simply advise sf::st_as_sf(SpatVector)/terra::vect(sf.object). This is more straightforward and CRS information won't be lost in the conversion (which I think would happen with as.data.frame(v, geom="WKT")).
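A short sketch of the sf round trip described above, using terra's example shapefile. The comparison of PROJ strings is only one possible way to check that the CRS survived; it is an assumption of this example rather than something prescribed in the thread.

```r
library(terra)
library(sf)

v <- vect(system.file("ex/lux.shp", package = "terra"))

# SpatVector -> sf: geometry and CRS travel with the object
sf_obj <- sf::st_as_sf(v)

# sf -> SpatVector
v_back <- terra::vect(sf_obj)

# Compare the CRS before and after, expressed as PROJ strings
same_proj <- crs(v, proj = TRUE) == crs(v_back, proj = TRUE)
```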

Let me share the tidyverse methods I have identified in sf/stars and their degree of implementation in tidyterra. I am focusing on SpatRasters so far (the SpatVector methods are based on the sf methods; they are not hard to implement, but I have not completed them yet):

✔️: Implemented 🟢: Not explicitly implemented but working on sf 🟡: To be implemented on tidyterra

package verb stars SpatRaster sf SpatVector
tibble as_tibble ✔️ ✔️ 🟢 ✔️
dplyr anti_join ✔️
dplyr arrange ✔️
dplyr distinct ✔️
dplyr filter ✔️ ✔️ ✔️ ✔️
dplyr full_join ✔️
dplyr group_by ✔️
dplyr group_split ✔️
dplyr inner_join ✔️
dplyr left_join ✔️
dplyr mutate ✔️ ✔️ ✔️ ✔️
dplyr pull ✔️ ✔️ 🟢 ✔️
dplyr relocate ✔️ 🟢 ✔️
dplyr rename ✔️ ✔️ ✔️ ✔️
dplyr right_join ✔️
dplyr rowwise ✔️
dplyr sample_frac ✔️
dplyr sample_n ✔️
dplyr select ✔️ ✔️ ✔️ ✔️
dplyr semi_join ✔️
dplyr slice ✔️ ✔️ ✔️ ✔️
dplyr summarise ✔️
dplyr transmute ✔️ ✔️ ✔️ ✔️
dplyr ungroup ✔️
tidyr drop_na ✔️ 🟢 ✔️
tidyr gather ✔️
tidyr nest ✔️
tidyr pivot_longer ✔️
tidyr pivot_wider ✔️
tidyr replace_na ✔️ ✔️ 🟢 ✔️
tidyr separate ✔️
tidyr separate_rows ✔️
tidyr spread ✔️
tidyr unite ✔️
tidyr unnest ✔️

dieghernan commented 1 year ago

@Nowosad @rhijmans

  1. Do you plan to add group_by() and summarize() as well? This could work, for example, like zonal statistics.

and

I do not know tidyverse, but I suppose this package could be especially useful for SpatVector as these can be represented as a data.frame (tibble) that includes their geometry (as.data.frame(v, geom="WKT") or by coercing to sf and back).

The next release of tidyterra will support group_by/summarize for SpatVectors, based on as.data.frame(v, geom="WKT"). More dplyr methods for SpatVectors (arrange, distinct, bind_row/col, left_join/inner_join) will also be added (see #84). Instead of relying on sf conversion, I created my own process based on the as.data.frame approach.

A quick example:

library(terra)
#> terra 1.7.18
library(tidyterra)
#> 
#> Attaching package: 'tidyterra'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#> 
#>     intersect, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

v_lux <- vect(system.file("ex/lux.shp", package = "terra"))

v_lux %>%
  mutate(gr = cut(POP / 1000, 5)) %>%
  group_by(gr) %>%
  summarise(n = n(), tot_pop = sum(POP), mean_area = mean(AREA)) %>%
  arrange(desc(gr)) %>%
  glimpse() %>%
  autoplot(aes(fill = gr)) +
  ggplot2::ggtitle("Dissolved")
#> Rows: 3
#> Columns: 4
#> $ gr        <fct> "(147,183]", "(40.7,76.1]", "(4.99,40.7]"
#> $ n         <int> 2, 1, 9
#> $ tot_pop   <int> 359427, 48187, 194391
#> $ mean_area <dbl> 244.0000, 185.0000, 209.7778


# We can control the aggregation on summarise with .dissolve
v_lux %>%
  mutate(gr = cut(POP / 1000, 5)) %>%
  group_by(gr) %>%
  # Here, not dissolving
  summarise(
    n = n(), tot_pop = sum(POP), mean_area = mean(AREA),
    .dissolve = FALSE
  ) %>%
  arrange(desc(gr)) %>%
  # Same statistics
  glimpse() %>%
  # But not dissolving aggregated polygons
  autoplot(aes(fill = gr)) +
  ggplot2::ggtitle("Not Dissolved")
#> Rows: 3
#> Columns: 4
#> $ gr        <fct> "(147,183]", "(40.7,76.1]", "(4.99,40.7]"
#> $ n         <int> 2, 1, 9
#> $ tot_pop   <int> 359427, 48187, 194391
#> $ mean_area <dbl> 244.0000, 185.0000, 209.7778

Created on 2023-03-11 with reprex v2.0.2

rhijmans commented 1 year ago

That is very cool.

I wonder if there are cases where you can avoid coercing the geometries to WKT or sf. That could save a lot of time. For example, you currently have:

select.SpatVector <- function(.data, ...) {
  # Use sf method
  sf_obj <- sf::st_as_sf(.data)
  selected <- dplyr::select(sf_obj, ...)
  return(terra::vect(selected))
}

But that can be done much more efficiently with

select.SpatVector <- function(.data, ...) {
    d <- data.frame(rbind(1:ncol(.data)))
    names(d) <- names(.data)
    selected <- dplyr::select(d, ...)
    columns <- unlist(selected[1,])
    .data[,columns]
}

Likewise,

rename.SpatVector <- function(.data, ...) {
  # Use sf
  sfobj <- dplyr::rename(sf::st_as_sf(.data), ...)
  end <- terra::vect(sfobj)
  return(end)
}

Could be

rename.SpatVector <- function(.data, ...) {
  # Use data.frame
    d <- data.frame(matrix(ncol=ncol(.data), nrow=0))
    names(d) <- names(.data)
    d <- dplyr::rename(d, ...)
    names(.data) <- names(d)
    .data
}
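The template idea in the two proposals above can be tried out as follows. This is a sketch under assumptions: the function names select_sketch and rename_sketch are hypothetical and are not tidyterra methods. Note that the select variant picks columns by position, so renaming inside the select call (new = old) would not be carried over.

```r
library(terra)
library(dplyr)

# Hypothetical helper: tidyselect runs on a one-row template whose values
# are the column positions, so the geometries are never coerced
select_sketch <- function(.data, ...) {
  template <- as.data.frame(matrix(seq_len(ncol(.data)), nrow = 1))
  names(template) <- names(.data)
  cols <- unlist(dplyr::select(template, ...)[1, ])
  .data[, cols]
}

# Hypothetical helper: rename a zero-row template, then copy the names back
rename_sketch <- function(.data, ...) {
  template <- as.data.frame(matrix(ncol = ncol(.data), nrow = 0))
  names(template) <- names(.data)
  names(.data) <- names(dplyr::rename(template, ...))
  .data
}

v <- vect(system.file("ex/lux.shp", package = "terra"))
sel <- select_sketch(v, NAME_2, POP)
ren <- rename_sketch(v, population = POP)
```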

I see that you already have something similar for rename_with.

I could have a look at row-wise operations as well if you are interested.

dieghernan commented 1 year ago

Thanks @rhijmans I still have to migrate some functions, including those that you mentioned. My overall approach is to avoid as much as possible coercion between classes, so your code is exactly what I needed.

Row-wise? I haven't explored it so far, so there is no need to spend time on that yet. But obviously, if you are in the mood and end up having a look, please let me know.

dieghernan commented 1 year ago

Rowwise implemented in #92 😁

twest820 commented 1 year ago

Hi, would being able to ggsave() GeoTIFFs be a feature that's in scope for tidyterra? It's currently possible by saving whatever spatial plot you have as a normal image, reading that file back into R as a raster, setting the extent and CRS on the raster, and then writing it to disk again as a GeoTIFF. However, in my experience this is a fragile process, as it's easy for axis labels, legends, ggsave() arguments, coord_sf(), and other things to throw the extents off. As a result, you can easily write 160+ MB to disk several times before you get a .tif which actually has the extents that were set on it.

Having a GeoTIFF device which automates this process seems handy for tasks centering around annotated map production. There are several questions about this on StackOverflow from folks needing to put grids, CRS ticks, and the like on rasters. I'm using this approach as a way of logging what spatial processing coded in R is doing, in a form that's easily inspected in close detail in GIS (for example, an algorithm's output is good 98% of the time, but you need to be able to scan through 40 ha at 0.5 m resolution to find the 2% of cases where the code needs improvement).
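The georeferencing part of the manual workflow described in this comment (set extent and CRS on an image raster, then write a GeoTIFF) might look like the sketch below. It is only an illustration: the in-memory three-band raster stands in for an image read back with rast() from a file saved by ggsave(), and the extent and CRS are borrowed from terra's example raster. It also assumes the saved image covers exactly the mapped area with no margins, which is the fragile part mentioned above.

```r
library(terra)

# Stand-in for the saved plot (assumption): a three-band image in pixel
# coordinates with no CRS, similar to what reading a PNG would give
img <- rast(nrows = 60, ncols = 80, nlyrs = 3,
            xmin = 0, xmax = 80, ymin = 0, ymax = 60, crs = "")
values(img) <- sample(0:255, ncell(img) * 3, replace = TRUE)

# Extent and CRS of the mapped area, taken here from an example raster
ref <- rast(system.file("ex/elev.tif", package = "terra"))
ext(img) <- ext(ref)
crs(img) <- crs(ref)

# Write the georeferenced image as a GeoTIFF and read it back
out <- tempfile(fileext = ".tif")
writeRaster(img, out, datatype = "INT1U", overwrite = TRUE)
back <- rast(out)
```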

stantis commented 1 year ago

Will there be a way to utilize sbar() and north() from terra within the tidyterra language?

dieghernan commented 1 year ago

Will there be a way to utilize sbar() and north() from terra within the tidyterra language?

Hi @stantis , that’s not possible AFAIK since sbar() and north() are meant to be used on base plots, while ggplot2 uses another plotting mechanism.

If you want to plot north arrows and geographic scale bars in ggplot2, you may want to use the ggspatial functions (https://paleolimbot.github.io/ggspatial/reference/annotation_north_arrow.html https://paleolimbot.github.io/ggspatial/reference/annotation_scale.html) or switch completely to tmap, which has great support for these two graphical objects (https://r-tmap.github.io/tmap/reference/tm_compass.html https://r-tmap.github.io/tmap/reference/tm_scale_bar.html).
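As a minimal sketch of the ggspatial route, the two annotation layers can be added to a tidyterra plot like this. The dataset and the placement arguments are illustrative choices, not recommendations from the thread.

```r
library(terra)
library(tidyterra)
library(ggplot2)
library(ggspatial)

v <- vect(system.file("ex/lux.shp", package = "terra"))

p <- ggplot() +
  geom_spatvector(data = v, aes(fill = POP)) +
  # Scale bar in the bottom-left corner
  annotation_scale(location = "bl") +
  # North arrow in the top-right corner, pointing to true north
  annotation_north_arrow(location = "tr", which_north = "true")
```

Printing p draws the map with both annotations.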

HRodenhizer commented 3 months ago

Maybe this is better as a separate feature request - happy to move it if you think so.

First off, I just want to say that I really appreciate this package! I had developed my own clunky, extraordinarily slow way to plot RGB imagery using ggplot2, and this package has massively improved my experience of quickly throwing together a decent looking map.

Two things I would love to see support for are stretching imagery in geom_spatraster_rgb() (unless I'm just missing this and it is available already) and the ability to set max_col_value (or perhaps even min and max values?) on a per-band basis.
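Until something like this is available in the geom itself, one possible workaround is to stretch the bands with terra before plotting. The sketch below assumes that terra::stretch() rescales each layer separately to the 0-255 range; the 2% and 98% quantile cut-offs and the logo.tif example image are arbitrary choices for illustration.

```r
library(terra)
library(tidyterra)
library(ggplot2)

# Three-band RGB example image shipped with terra
rgb_img <- rast(system.file("ex/logo.tif", package = "terra"))

# Linear stretch per band, clipping at the 2% and 98% quantiles
stretched <- stretch(rgb_img, minq = 0.02, maxq = 0.98)

p <- ggplot() +
  geom_spatraster_rgb(data = stretched, max_col_value = 255)
```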