Open dieghernan opened 2 years ago
On the tidyverse / ggplot2 side it would be great to have some insight from @hadley @romainfrancois @lionel- @wch @thomasp85 to cite just a few
Hi @dieghernan -- I checked the package and it looks interesting. Currently, I do not have any suggestions, except maybe:
- Do you plan to add group_by() and summarize() as well? This could work, for example, like zonal statistics.
I do not know tidyverse, but I suppose this package could be especially useful for SpatVector as these can be represented as a data.frame (tibble) that includes their geometry (as.data.frame(v, geom="WKT") or by coercing to sf and back).
I also suppose that anything to facilitate the use of ggplot would be very helpful to many.
There may be some methods that are useful for SpatRaster as well, such as for selecting layers, but I am not convinced that implementing methods like drop_na is very useful. Adding (near) synonyms might confuse as much as help, and as it would cover a very small part of the interface, one would still need to learn the terra idiom anyway. Otherwise, your idea for how to implement it seems reasonable, and, again, I do not know tidyverse, so my opinion on utility is not that meaningful.
Thanks for your feedback @Nowosad @rhijmans, much appreciated.
Regarding your suggestions, let me expand a bit. My idea for the package is basically to extend some commonly used tidyverse methods for data wrangling. The main goal for me is to help useRs with no spatial background get started with rasters, at least with basic transformations and plotting.
I am building on the idea already implemented in sf/stars for doing so:
sf: https://github.com/r-spatial/sf/blob/HEAD/R/tidyverse.R
stars: https://github.com/r-spatial/stars/blob/main/R/tidyverse.R
So I would prefer not to implement spatial operations in the package (for example, left_join.sf appends a data frame to an sf object, but a spatial join needs a specific st_join call). An exception may be group_by.sf, since it merges geometries, but this is only implemented for vectors (sf), not for rasters (stars).
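To make the distinction concrete, here is a minimal sketch using the nc dataset shipped with sf. The `extra` table and the `visited` column are made up for illustration; the point is only that left_join() matches on an attribute column while st_join() matches on geometry.

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Attribute join: rows are matched on the NAME column, geometries are untouched.
# `extra` is a hypothetical lookup table.
extra <- data.frame(NAME = c("Ashe", "Alleghany"), visited = c(TRUE, FALSE))
nc_attr <- left_join(nc, extra, by = "NAME")

# Spatial join: rows are matched on geometry, so st_join() is needed.
# The points are centroids of the first three counties.
pts <- st_sf(id = 1:3, geometry = st_centroid(st_geometry(nc[1:3, ])))
pts_join <- st_join(pts, nc["NAME"])
```

Both results are still sf objects; only the second call looks at where the features are.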
So I was not planning on implementing group_by.SpatRaster()/summarize.SpatRaster(). This is also connected with @rhijmans' comment:
Adding (near) synonyms might confuse as much as help, and as it would cover a very small part of the interface, one would still need to learn the terra idiom anyway.
I would completely encourage useRs to learn {terra} idiom, as it is much more efficient than the wrappers that I could provide here. The conversion (actually implemented on {tidyterra}) data.frame > operation > back to SpatRaster is not efficient for large SpatRasters. I made some effort on that trying to add a section on the docs with the {terra} equivalent on the functions (example: https://dieghernan.github.io/tidyterra/reference/pull.html#terra-equivalent). Also I refer to this on the README: https://github.com/dieghernan/tidyterra#exclamation-a-note-on-performance
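As a rough illustration of the wrapper versus the native idiom, the sketch below adds a derived layer both ways. The tiny raster and the layer names (`elevation`, `elev_km`) are invented for the example; on a raster this small the difference does not matter, but the terra version avoids the round trip through a data frame.

```r
library(terra)
library(tidyterra)
library(dplyr)

# Toy single-layer raster (hypothetical data)
r <- rast(nrows = 10, ncols = 10, vals = 1:100)
names(r) <- "elevation"

# tidyterra: dplyr-style verb on the SpatRaster
r_tidy <- r %>% mutate(elev_km = elevation / 1000)

# terra: raster algebra plus layer assignment
r_terra <- r
r_terra$elev_km <- r$elevation / 1000

# Compare the cell values of both results
all.equal(values(r_tidy), values(r_terra))
```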
Again, {tidyterra} might be useful for beginners and/or medium-sized SpatRasters, and hopefully it would help to spread the usage of {terra}, since it would reduce the barriers to entry. IMHO the {terra} package has an easy interface if you have previously worked with {raster}, but for a complete novice with no background in rasters it can be a bit hard.
Also @rhijmans, I was thinking of removing drop_na(), at least for SpatRasters, since the implementation may not be really useful. So I agree on that, thanks.
For SpatVector/sf conversions I would simply advise sf::st_as_sf(SpatVector)/terra::vect(sf.object); this is more straightforward and CRS information won't be lost in the conversion (which I think would happen with as.data.frame(v, geom="WKT")).
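A short sketch of both routes, using the lux.shp example file from terra. I am assuming here that the WKT column produced by as.data.frame() is called `geometry`; the relevant point is that a plain data frame has no slot for the CRS, so it has to be passed again when rebuilding the SpatVector.

```r
library(terra)
library(sf)

v <- vect(system.file("ex/lux.shp", package = "terra"))

# Route 1: via sf, the CRS travels with the object in both directions
sf_obj <- sf::st_as_sf(v)
v_back <- terra::vect(sf_obj)

# Route 2: via a data frame with WKT geometry; the CRS must be supplied by hand
# (assumes the geometry column is named "geometry")
df <- as.data.frame(v, geom = "WKT")
v_wkt <- terra::vect(df, geom = "geometry", crs = terra::crs(v))
```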
Let me share with you the tidyverse methods I have identified in sf/stars and the degree of implementation in tidyterra. I am putting the focus on SpatRasters so far (SpatVectors are based on sf methods, not really hard to implement, but I haven't completed it yet):
✔️: Implemented
🟢: Not explicitly implemented but working on sf
🟡: To be implemented on tidyterra
package | verb | stars | SpatRaster | sf | SpatVector |
---|---|---|---|---|---|
tibble | as_tibble | ✔️ | ✔️ | 🟢 | ✔️ |
dplyr | anti_join | ✔️ | | | |
dplyr | arrange | ✔️ | | | |
dplyr | distinct | ✔️ | | | |
dplyr | filter | ✔️ | ✔️ | ✔️ | ✔️ |
dplyr | full_join | ✔️ | | | |
dplyr | group_by | ✔️ | | | |
dplyr | group_split | ✔️ | | | |
dplyr | inner_join | ✔️ | | | |
dplyr | left_join | ✔️ | | | |
dplyr | mutate | ✔️ | ✔️ | ✔️ | ✔️ |
dplyr | pull | ✔️ | ✔️ | 🟢 | ✔️ |
dplyr | relocate | ✔️ | 🟢 | ✔️ | |
dplyr | rename | ✔️ | ✔️ | ✔️ | ✔️ |
dplyr | right_join | ✔️ | | | |
dplyr | rowwise | ✔️ | | | |
dplyr | sample_frac | ✔️ | | | |
dplyr | sample_n | ✔️ | | | |
dplyr | select | ✔️ | ✔️ | ✔️ | ✔️ |
dplyr | semi_join | ✔️ | | | |
dplyr | slice | ✔️ | ✔️ | ✔️ | ✔️ |
dplyr | summarise | ✔️ | | | |
dplyr | transmute | ✔️ | ✔️ | ✔️ | ✔️ |
dplyr | ungroup | ✔️ | | | |
tidyr | drop_na | ✔️ | 🟢 | ✔️ | |
tidyr | gather | ✔️ | | | |
tidyr | nest | ✔️ | | | |
tidyr | pivot_longer | ✔️ | | | |
tidyr | pivot_wider | ✔️ | | | |
tidyr | replace_na | ✔️ | ✔️ | 🟢 | ✔️ |
tidyr | separate | ✔️ | | | |
tidyr | separate_rows | ✔️ | | | |
tidyr | spread | ✔️ | | | |
tidyr | unite | ✔️ | | | |
tidyr | unnest | ✔️ | | | |
@Nowosad @rhijmans
- Do you plan to add group_by() and summarize() as well? This could work, for example, like zonal statistics.
and
I do not know tidyverse, but I suppose this package could be especially useful for SpatVector as these can be represented as a data.frame (tibble) that includes their geometry (as.data.frame(v, geom="WKT") or by coercing to sf and back).
The next release of tidyterra will support group_by/summarize for SpatVectors based on as.data.frame(v, geom="WKT"). Also, more dplyr methods for SpatVectors (arrange, distinct, bind_row/col, left_join/inner_join) will be added (see #84). Instead of relying on sf conversion I created my own process based on the as.data.frame approach.
A quick example:
library(terra)
#> terra 1.7.18
library(tidyterra)
#>
#> Attaching package: 'tidyterra'
#> The following object is masked from 'package:stats':
#>
#> filter
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#>
#> intersect, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
v_lux <- vect(system.file("ex/lux.shp", package = "terra"))
v_lux %>%
mutate(gr = cut(POP / 1000, 5)) %>%
group_by(gr) %>%
summarise(n = n(), tot_pop = sum(POP), mean_area = mean(AREA)) %>%
arrange(desc(gr)) %>%
glimpse() %>%
autoplot(aes(fill = gr)) +
ggplot2::ggtitle("Dissolved")
#> Rows: 3
#> Columns: 4
#> $ gr <fct> "(147,183]", "(40.7,76.1]", "(4.99,40.7]"
#> $ n <int> 2, 1, 9
#> $ tot_pop <int> 359427, 48187, 194391
#> $ mean_area <dbl> 244.0000, 185.0000, 209.7778
# We can control the aggregation on summarise with .dissolve
v_lux %>%
mutate(gr = cut(POP / 1000, 5)) %>%
group_by(gr) %>%
# Here, not dissolving
summarise(
n = n(), tot_pop = sum(POP), mean_area = mean(AREA),
.dissolve = FALSE
) %>%
arrange(desc(gr)) %>%
# Same statistics
glimpse() %>%
# But not dissolving aggregated polygons
autoplot(aes(fill = gr)) +
ggplot2::ggtitle("Not Dissolved")
#> Rows: 3
#> Columns: 4
#> $ gr <fct> "(147,183]", "(40.7,76.1]", "(4.99,40.7]"
#> $ n <int> 2, 1, 9
#> $ tot_pop <int> 359427, 48187, 194391
#> $ mean_area <dbl> 244.0000, 185.0000, 209.7778
Created on 2023-03-11 with reprex v2.0.2
That is very cool.
I wonder if there are cases where you can avoid coercing the geometries to WKT or sf. That could save a lot of time. For example, you currently have:
select.SpatVector <- function(.data, ...) {
# Use sf method
sf_obj <- sf::st_as_sf(.data)
selected <- dplyr::select(sf_obj, ...)
return(terra::vect(selected))
}
But that can be done much more efficiently with
select.SpatVector <- function(.data, ...) {
d <- data.frame(rbind(1:ncol(.data)))
names(d) <- names(.data)
selected <- dplyr::select(d, ...)
columns <- unlist(selected[1,])
.data[,columns]
}
Likewise,
rename.SpatVector <- function(.data, ...) {
# Use sf
sfobj <- dplyr::rename(sf::st_as_sf(.data), ...)
end <- terra::vect(sfobj)
return(end)
}
Could be
rename.SpatVector <- function(.data, ...) {
# Use data.frame
d <- data.frame(matrix(ncol=ncol(.data), nrow=0))
names(d) <- names(.data)
d <- dplyr::rename(d, ...)
names(.data) <- names(d)
.data
}
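For reference, the same index trick can be tried interactively on the lux.shp example. This is only a sketch of the idea above, not the code that ended up in the package: dplyr works on a throwaway data frame that carries just the attribute names, and the result is applied to the SpatVector without touching its geometries. The new name `population` is arbitrary.

```r
library(terra)
library(dplyr)

v <- vect(system.file("ex/lux.shp", package = "terra"))

# select: one-row data frame holding column positions, named like the attributes
idx <- data.frame(rbind(seq_len(ncol(v))))
names(idx) <- names(v)
keep <- unlist(dplyr::select(idx, NAME_2, POP)[1, ])
v_sel <- v[, keep]

# rename: zero-row data frame with the same names, renamed by dplyr
tmpl <- data.frame(matrix(ncol = ncol(v), nrow = 0))
names(tmpl) <- names(v)
v_ren <- v
names(v_ren) <- names(dplyr::rename(tmpl, population = POP))

names(v_sel)
names(v_ren)
```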
I see that you already have something similar for rename_with.
I could have a look at row-wise operations as well if you are interested.
Thanks @rhijmans I still have to migrate some functions, including those that you mentioned. My overall approach is to avoid as much as possible coercion between classes, so your code is exactly what I needed.
Row-wise? I haven't explored it so far, so no need to spend time on that yet, but obviously if you feel in the mood and end up having a look, please let me know.
Rowwise implemented in #92 😁
Hi, would being able to ggsave() GeoTIFFs be a feature that's in scope for tidyterra? It's currently possible by saving whatever spatial plot you have as a normal image, reading that file back into R as a raster, setting extents and CRS on the raster, and then writing to disk again to put the raster down as a GeoTIFF. However, my experience is this is a fragile process as it's easy for axis labels, legends, ggsave() arguments, coord_sf(), and other things to cause extents to get confused. As a result, you can easily write 160+ MB to disk several times before you get a .tif which actually has the extents that were set on it.
Having a GeoTIFF device which automates this process seems handy for tasks centering around annotated map production. There are several questions around this on StackOverflow from folks needing to put grids or CRS ticks and such on rasters. I'm using this approach as a way of logging what spatial processing coded in R is doing in a way that's easily inspected in close detail in GIS (for example, an algorithm's output is good 98% of the time but you need to be able to scan through 40 ha at 0.5 m resolution to find the 2% cases where the code wants improvement).
Will there be a way to utilize sbar() and north() from terra within the tidyterra language?
Hi @stantis, that's not possible AFAIK since sbar() and north() are meant to be used on base plots, while ggplot2 uses another plotting mechanism.
If you want to plot north arrows and geographic scale bars in ggplot2 you may want to use the ggspatial functions (https://paleolimbot.github.io/ggspatial/reference/annotation_north_arrow.html https://paleolimbot.github.io/ggspatial/reference/annotation_scale.html) or switch completely to tmap, which has great support for these two graphical objects (https://r-tmap.github.io/tmap/reference/tm_compass.html https://r-tmap.github.io/tmap/reference/tm_scale_bar.html).
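A minimal sketch of the ggspatial route, assuming the lux.shp example from terra and default styling. The corner locations ("bl", "tr") are just one possible choice.

```r
library(terra)
library(tidyterra)
library(ggplot2)
library(ggspatial)

v <- vect(system.file("ex/lux.shp", package = "terra"))

p <- ggplot() +
  geom_spatvector(data = v, aes(fill = POP)) +
  # Scale bar in the bottom-left corner
  annotation_scale(location = "bl") +
  # North arrow in the top-right corner, pointing to true north
  annotation_north_arrow(location = "tr", which_north = "true")

p
```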
Maybe this is better as a separate feature request - happy to move it if you think so.
First off, I just want to say that I really appreciate this package! I had developed my own clunky, extraordinarily slow way to plot RGB imagery using ggplot2, and this package has massively improved my experience of quickly throwing together a decent looking map.
Two things I would love to see support for are stretching imagery in geom_spatraster_rgb() (unless I'm just missing this and it is available already) and the ability to set max_col_value (or perhaps even min and max values?) on a per-band basis.
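Until something like that exists, one possible workaround is to stretch the bands with terra::stretch() before plotting, since it rescales each layer separately to a common output range. The sketch below uses random values as a stand-in for real imagery, and the 2%/98% quantile cut-offs are an arbitrary choice.

```r
library(terra)
library(tidyterra)
library(ggplot2)

# Hypothetical low-contrast 3-band image
set.seed(1)
img <- rast(nrows = 20, ncols = 20, nlyrs = 3, vals = runif(1200, 50, 120))
names(img) <- c("r", "g", "b")

# Linear stretch of every band to 0-255, clipping at the 2% and 98% quantiles
img_st <- stretch(img, minv = 0, maxv = 255, minq = 0.02, maxq = 0.98)

p <- ggplot() +
  geom_spatraster_rgb(data = img_st, max_col_value = 255)

p
```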
{tidyterra} tries to improve the data wrangling and visualisation of Spat* objects ({terra}) by providing new tidyverse methods for those objects.
The result is a package with wrappers around terra functions but using the tidyverse API.
Hopefully this will help useRs with no experience of spatial rasters to start working with this kind of format. For experienced useRs tidyterra may not suit their needs, especially in terms of performance with big raster files.
The developer of the package (myself) is one of those not-so-into-raster users, so I may have made bad decisions in the development of this package. For that reason, all the feedback you can provide is especially useful.
Some users I can think of are obviously @rhijmans, @nowosad, @dominicroye, @paleolimbot, @milos-agathon @barryrowlingson.
So far, feedback is needed on terra::mask() %>% terra::trim(). Does this make sense? See https://dieghernan.github.io/tidyterra/reference/drop_na.html#spatraster