geocompx / geocompr

Geocomputation with R: an open source book
https://r.geocompx.org/

Ideas for additional topics to include #12

Closed. Robinlovelace closed this issue 2 years ago.

Robinlovelace commented 7 years ago

List of suggestions for the second edition. See below for various ideas; an older version of this list, including ideas that have already been implemented, is at https://github.com/Robinlovelace/geocompr/issues/12#issuecomment-609026237.

Part I

Part II

Part III

Other

Robinlovelace commented 7 years ago

Include something on facets ordered by location (a bit weird but interesting): https://github.com/hafen/geofacet
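
For example, something along these lines (a minimal, untested sketch based on the geofacet README; the state_ranks dataset and facet_geo() come from that package):

library(ggplot2)
library(geofacet)
# facet_geo() lays out one facet per US state in a grid that mimics the US map
ggplot(state_ranks, aes(variable, rank, fill = variable)) +
  geom_col() +
  coord_flip() +
  facet_geo(~ state) +
  theme(axis.text = element_blank(), legend.position = "none")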

Robinlovelace commented 7 years ago

mapedit: https://github.com/r-spatial/mapedit cc @tim-salabim any pointers welcome.

tim-salabim commented 7 years ago

@Robinlovelace what's the timeline for geocompr? We will likely come up with a usable first draft package for useR2017, given that I will present it there...

Robinlovelace commented 7 years ago

Fantastic. The timeline for a finished draft is summer 2018, so you have plenty of time. This will be awesome. My only feature request: easy integration with shiny, though I'm not sure how that would work.

That would allow, for example, easy drawing of bounding polygons to subset potential cycle routes in the Propensity to Cycle Tool, e.g. for the Leeds region: http://pct.bike/m/?r=west-yorkshire

Should I put in a feature request? Thanks.
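
Something like the following is the kind of workflow I have in mind (a rough, untested sketch; editMap() returning a $finished sf object is my reading of the current mapedit API, and cycle_hire just stands in for the PCT routes):

library(mapedit)
library(mapview)
library(sf)
library(spData)
# draw a bounding polygon interactively; the drawn features come back as sf
drawn <- editMap(mapview(cycle_hire))
region <- drawn$finished
# spatially subset the points (routes in the real use case) to the drawn area
cycle_hire_sub <- cycle_hire[st_union(region), ]
mapview(cycle_hire_sub)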

tim-salabim commented 7 years ago

No need, see here: https://github.com/r-spatial/mapedit/issues/21 cc @timelyportfolio

Robinlovelace commented 7 years ago

Ah, the hive mind is on it already - will keep my eyes on this with great interest.

Robinlovelace commented 7 years ago

Heads-up: this seems like a good introduction to RSAGA: https://cran.r-project.org/web/packages/RSAGA/vignettes/RSAGA-landslides.pdf

Any ideas if there is an equivalent resource for rgrass7?

Nowosad commented 7 years ago

The two best resources for rgrass7 I know of are:

  1. https://grasswiki.osgeo.org/wiki/R_statistics/rgrass7
  2. https://data.neteler.org/geostat2015/presentations/ (presentation9)
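
For reference, a minimal (untested) sketch of the workflow those resources describe; the gisBase path is an assumption and depends on the local GRASS installation:

library(rgrass7)
# set up a throwaway GRASS session; adjust gisBase to your GRASS installation
initGRASS(gisBase = "/usr/lib/grass74", home = tempdir(),
          mapset = "PERMANENT", override = TRUE)
execGRASS("g.version")  # any GRASS module can be called from R via execGRASS()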

Robinlovelace commented 7 years ago

Something on activity tracking - popular source of data: https://github.com/hfrick/trackeR

Robinlovelace commented 7 years ago

Something on store location analysis (e.g. a chapter). Heads up @jannes-m I just found this: https://journal.r-project.org/archive/2017/RJ-2017-020/index.html (I've added the reference to the geocompr Zotero library).

Robinlovelace commented 7 years ago

Something on activity spaces / home ranges, e.g. with reference to the aspace and adehabitat packages.
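
For example, a minimal home-range sketch with adehabitatHR (untested; based on the package's puechabonsp example data):

library(adehabitatHR)
data(puechabonsp)
# 95% minimum convex polygon home range per animal (column 1 holds the animal id)
hr <- mcp(puechabonsp$relocs[, 1], percent = 95)
plot(hr)
plot(puechabonsp$relocs, add = TRUE)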

Robinlovelace commented 7 years ago

Something on small area estimation, e.g. with reference to https://journal.r-project.org/archive/2015/RJ-2015-007/index.html

jannes-m commented 7 years ago

That's funny - in fact we have used the Huff model and its derivatives a lot in geomarketing. I will certainly look into this market-area-analysis paper. I haven't looked into the small-area estimation paper, but I have thought before that downscaling might be an interesting topic as well (not sure if the paper deals with that). I have just continued writing the GIS chapter, but now I am away for two weeks hiking with my sister in Iceland. When I come back I'll work further on the book!

Robinlovelace commented 7 years ago

Reproduce this figure in Tobler (1979):

[image: figure from Tobler (1979)]

Up for the challenge @Nowosad or @jannes-m ? Definitely not a priority but could be fun.

jannes-m commented 7 years ago

There's a similar representation using lattice, just with barplots (Figure 6.5): http://lmdvr.r-forge.r-project.org/figures/figures.html. Or you could tweak the code of the volcano figure (Figure 13.7) a little. Yes, I know lattice has become a bit unfashionable lately. And ggplot2 combined with plotly could probably also do the trick:

https://plot.ly/r/3d-surface-plots/ https://www.r-bloggers.com/3d-plots-with-ggplot2-and-plotly/
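
For example, the interactive surface is essentially a one-liner in plotly (adapted from the examples at the links above):

library(plotly)
# interactive 3D surface of the built-in volcano elevation matrix
plot_ly(z = ~volcano) %>% add_surface()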

tim-salabim commented 7 years ago

https://github.com/r-barnes/webglobe also has a similar vis

Robinlovelace commented 7 years ago

Thanks for the links. Those surface plots in plotly look lush, and the fact that they are interactive is an extra plus, so that's now my default approach if/when I get round to doing that.

Robinlovelace commented 7 years ago

Real-world example: estimating south-facing roof space from an architectural drawing. 17_02730_FU-REV._SITELAYOUT-_EXTERNAL_WORKS-1993244.pdf

Robinlovelace commented 6 years ago

Example of 3d map from @mdsumner: http://rpubs.com/cyclemumner/rangl-plotly

[image: screenshot of the 3D map]

mpadge commented 6 years ago

machine learning?

I know you're nearly there with the book and rushing to the end point, but thought I'd add my 2 cents' worth: Geocomputation in 2018 might be widely expected to include some exploration of machine learning? TensorFlow is seamlessly integrated with R via Taylor Arnold's kerasR, and/or RStudio's keras. I've used both of these to explore and teach ML for spatial data, and would be happy to help if you think there may be time to slip in yet another chapter.

I will of course totally understand if you deem this beyond scope / time constraints.

Robinlovelace commented 6 years ago

Great idea, but machine learning is already included via mlr here: https://geocompr.robinlovelace.net/spatial-cv.html - any feedback on that appreciated (does it cover the most important concepts to teach?).

Few people have heard of spatial cross-validation, and even fewer know it's associated with machine learning, so I can understand why people don't think the book has ML in it. One idea that could make the topic a more prominent part of the book: rename the chapter to Spatial cross-validation and machine learning. Thoughts?
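
For context, the chapter's approach boils down to something like this (a hedged sketch only; the lsl_df/lsl_coords objects and column names are stand-ins, so treat them as assumptions):

library(mlr)
# a classification task with coordinates supplied so mlr can build spatial folds
task <- makeClassifTask(data = lsl_df, target = "lslpts",
                        coordinates = lsl_coords, positive = "TRUE")
lrn <- makeLearner("classif.binomial", predict.type = "prob")
# "SpRepCV" = spatially repeated CV: folds come from k-means clustering of the coordinates
rdesc <- makeResampleDesc("SpRepCV", folds = 5, reps = 100)
res <- resample(learner = lrn, task = task, resampling = rdesc, measures = mlr::auc)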

gisma commented 6 years ago

A few weeks ago Hanna Meyer @HannaMeyer from the Environmental Informatics working group in Marburg had her PhD defense, which was btw awesome. Her central methodological contribution was spatio-temporal cross-validation for machine learning methods. She developed a concept and method to avoid overfitting via leave-location-out and/or leave-location-and-time-out CV, and improved model performance with a forward feature selection approach. This part has just been published: Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. I think it is a very important issue to deal with! Hanna has impressively shown that the vast majority of studies that use machine learning techniques for prediction do not consider spatio-temporal dependencies, and as a result they mercilessly overestimate the quality of their models. It would be great to integrate some of this stuff if somehow possible.

Robinlovelace commented 6 years ago

That would be amazing - input in that direction very much appreciated.

jannes-m commented 6 years ago

This sounds interesting. However, we are already dealing with this in the spatial CV chapter, though we should reference Hanna's paper there. In any case, the point to make is that neglecting spatio-temporal dependencies will lead to overfitting, and that's why we need to account for them.

jannes-m commented 6 years ago

Regarding the ML chapter mentioned by @mpadge: I think ML is important and many people are interested in it. In the spatial CV chapter I will use a GLM, and I first have to briefly introduce what a GLM is, because we cannot expect our readers to know about it since we haven't introduced modeling techniques. Then I will present the mlr modeling interface, again using the GLM, but also showing that it is no problem to use any other statistical learning technique (such as random forests, support vector machines, etc.). The special advantage of mlr is that it is able to do spatial cross-validation. I also wanted to use mlr in combination with a machine-learning approach in the ecological chapter, but if Marc wants to provide a chapter, I am happy to step back. In any case, I think we should talk soon and agree on how to proceed. Remember that the reviewer of the second round even proposed finishing the book after Part III and writing another book on advanced applications (which could also be an edited volume with contributions from many authors).

mpadge commented 6 years ago

Sounds all covered then, and I'll leave it to you guys. Another great reference to point peeps to would be ai.google, and particularly the education section. From my teaching experiences, the biggest challenge in spatial ML is definitely getting data in an appropriate form. This is something that is not currently covered, and is even not particularly strongly emphasized in the ai.google docs. I've got simple examples of pure spatial (the easy bit, coz it's just image recognition); trajectory prediction (much harder); and time series of raster objects. Happy to provide some of my stuff; or talk more; or perhaps more sensibly just leave it, coz as indicated by aforementioned reviewer, it is already pretty darn comprehensive.

gisma commented 6 years ago

@mpadge for sure, retrieving, preprocessing and cleaning data is probably the biggest challenge ever. That's the reason why getWhateverData functions are so popular that almost nobody thinks about what they are actually doing... @jannes-m imho a book of author contributions dealing with advanced applications is a great idea. It would avoid overloading your current project and could focus on some cutting-edge stuff.

mpadge commented 6 years ago

From the authorities: "Expect to spend significant time doing feature engineering" (google-ML-speak for data munging).

Robinlovelace commented 6 years ago

This looks like an interesting viz package: https://manubb.github.io/Leaflet.PixiOverlay/t2.html#12/48.8292/2.3476 - what do you reckon, @mpadge? (I know you're looking at scalable interactive visualisation.)

tim-salabim commented 6 years ago

Awesome! I need to have a closer look, but this seems promising!

mpadge commented 6 years ago

Yeah, that's a good one, with good docs at the source www.pixijs.com/. I didn't realise it was Leaflet compatible, but it actually seems as simple as L.pixiOverlay. I concur: awesome!

HannaMeyer commented 6 years ago

Coming back to the ML and spatial cross-validation comment by @gisma: I really appreciate that spatial CV is highlighted in the book, as ignoring spatial autocorrelation can lead to a considerably overoptimistic view of validation results, yet it is still common practice! There are definitely further options for that topic, e.g. the problem of misinterpretation of predictor variables caused by autocorrelation, which in many cases causes the differences between random CV and spatial CV (see here). I implemented a feature selection method (CAST) which removes misinterpreted variables, with the effect of minimizing the differences between random and spatial CV as well as improving spatial CV results. However, the code is based on the caret package instead of mlr, so including the method in the book would still require quite a bit of effort, and I agree that the topic might be beyond the scope. Still, I think this is an important way to go and maybe worth a joint project on predictive modelling for spatial data.
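
For anyone wanting to try this, the CAST workflow is roughly the following (a sketch only; train_df, location_id, predictor_names and response are hypothetical names):

library(caret)
library(CAST)
# leave-location-out folds built from a location identifier column
folds <- CreateSpacetimeFolds(train_df, spacevar = "location_id", k = 5)
ctrl <- trainControl(method = "cv", index = folds$index)
# forward feature selection: keep only predictors that improve spatial CV performance
model <- ffs(predictors = train_df[, predictor_names],
             response = train_df$response,
             method = "rf", trControl = ctrl)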

jannes-m commented 6 years ago

Hanna, thanks for your interesting input; I absolutely agree with you on the importance of spatio-temporal CV! And yes, we should join forces! I think the caret package does not use inner folds (in contrast to mlr) when optimizing hyperparameters of machine learning algorithms, but I am not entirely sure - you and @pat-s will know better.

pat-s commented 6 years ago

Thanks for pointing me to this discussion, @jannes-m!

I am happy to contribute the knowledge I've gained during the past months while focusing on comparing non-spatial vs. spatial CV (paper about to be submitted soon). It goes in a similar direction to @HannaMeyer's paper, but rather than focusing on feature selection I focus on "correct" hyperparameter tuning and the differences that arise when using spatial data.

Having a chapter combining Hanna's, my own and everyone else's knowledge would be great!

(and now, we should all get some sleep :smile: )

Robinlovelace commented 6 years ago

Add a description of the +, -, * and / operators for geometries.
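
For the record, these behave roughly as follows on sfc geometries (a short sketch following sf's affine-transformation examples):

library(sf)
pt <- st_sfc(st_point(c(1, 1)))
pt + c(10, 10)  # + and - translate (shift) coordinates
pt * 2          # * and / with a scalar rescale them
poly <- st_buffer(pt, 1)
poly * matrix(c(0, 1, -1, 0), nrow = 2)  # * with a matrix applies an affine transform (here a 90-degree rotation)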

mdsumner commented 6 years ago

@Robinlovelace do you have the actual values used for the extruded heights in Tobler? It would be cute to recreate, and we could put up an interactive version somewhere - maybe a good rgl/plotly discussion, and it highlights some nice nuances about separated-vs-mesh polygons in rgl, triangulations, quad primitives, setting material properties in rgl, and so forth.

With anglr it's trivial to plot the base and top of the extruded polys, but I haven't done the walls that way yet and I want to add that. With rgl you can give actual polygon rings to extrude3d and it will do this, but getting the material properties right is a bit awkward.

Robinlovelace commented 6 years ago

That would be awesome @mdsumner. I looked for historic population data and you're in luck! In fact there is data that would allow creation of a 4D map (by which I mean 3D + time). I've seen some awesome 4D animations with rgl but never for mapping. 2D + time below:

devtools::install_github("ropensci/historydata")
#> Skipping install of 'historydata' from a github remote, the SHA1 (910defb0) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
library(spData)
library(tidyverse)
#> ── Attaching packages ──────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.4
#> ✔ tidyr   0.8.0     ✔ stringr 1.3.0
#> ✔ readr   1.1.1     ✔ forcats 0.3.0
#> ── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
library(tmap)
statepop = historydata::us_state_populations %>% select(-GISJOIN) %>% rename(NAME = state)
statepop_wide = spread(statepop, year, population, sep = "_")
statepop_sf = left_join(spData::us_states, statepop_wide)
#> Joining, by = "NAME"
map_dbl(statepop_sf, ~sum(is.na(.)))  # looks about right
#>        GEOID         NAME       REGION         AREA total_pop_10 
#>            0            0            0            0            0 
#> total_pop_15    year_1790    year_1800    year_1810    year_1820 
#>            0           35           33           32           26 
#>    year_1830    year_1840    year_1850    year_1860    year_1870 
#>           25           23           18           16           12 
#>    year_1880    year_1890    year_1900    year_1910    year_1920 
#>           11            5            4            3            1 
#>    year_1930    year_1940    year_1950    year_1960    year_1970 
#>            1            1            1            1            1 
#>    year_1980    year_1990    year_2000    year_2010     geometry 
#>            1            1            1            1            0
qtm(statepop_sf, "year_1900")  # check mapping works

year_vars = names(statepop_sf)[grepl("year", names(statepop_sf))]
facet_map = tm_shape(statepop_sf) + tm_fill(year_vars) + tm_facets(free.scales.fill = FALSE)
facet_map + tm_layout(legend.outside = TRUE)

facet_anim = tm_shape(statepop_sf) + tm_fill(year_vars) + tm_facets(free.scales.fill = FALSE, 
  ncol = 1, nrow = 1)
animation_tmap(tm = facet_anim, filename = "us_pop.gif")
#> Map saved to /tmp/RtmpVKxekd//tmap_plots/plot%03d.png
#> Resolution: 2100 by 1500 pixels
#> Size: 7 by 5 inches (300 dpi)

Robinlovelace commented 6 years ago

tobler1979.pdf - the original paper (this is awesome!).

Robinlovelace commented 6 years ago

Result of the final line in the reprex above - @Nowosad I think this would make a good second animation example in chapter 9 because the years are represented in columns (unlike the urban areas, if I remember correctly). Good plan? Very close to finishing that chapter in any case, and I think this will be a good addition:

[animation: us_pop.gif]

Robinlovelace commented 6 years ago

Example of an animated 3D viz (not sure if this is the best way; I think plotly also has 3D animations and I've seen some really awesome interactive ones): https://rdrr.io/rforge/rgl/man/play3d.html
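
A minimal rgl animation along those lines (untested; adapted from the rgl surface3d/play3d examples):

library(rgl)
z <- 2 * volcano; x <- 10 * (1:nrow(z)); y <- 10 * (1:ncol(z))
open3d()
surface3d(x, y, z, color = "forestgreen")
# spin the scene around the vertical axis for five seconds
play3d(spin3d(axis = c(0, 0, 1), rpm = 10), duration = 5)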

mdsumner commented 6 years ago

Cool, thanks. Re the rgl animation thing, have you seen example(light3d) in rgl? I can get some simple "wavy ocean" animations using rgl with sea surface height data, but it's not performant enough to be compelling and sadly I don't know how to make it so - the push-up and tear-down for each frame is too costly. But this light effect suggests there's no limitation in principle.

mdsumner commented 6 years ago

Here's a bare-bones version. The absolutely key thing is getting the aspect ratio right - usually I would transform to metres, but we still don't have a real height so it has to be fake. (I don't understand aspect3d completely yet; it's relative to previous use in a way I don't grok.)

library(sf)
library(spData)
library(dplyr)
library(tidyr)
statepop = historydata::us_state_populations %>% select(-GISJOIN) %>% rename(NAME = state)

statepop_wide = spread(statepop, year, population, sep = "_")
statepop_sf = left_join(spData::us_states, statepop_wide)

library(anglr)
library(silicate)
## we only need cheap triangles, DEL is unnecessary
x <- TRI(statepop_sf)                    # triangulate the state polygons
xz <- copy_down(x, x$object$year_1900)   # use the 1900 population as the z value
plot3d(xz)
rgl::aspect3d(1, 1, 0.1)                 # squash z so the map stays readable
rgl::rglwidget()

Here's a live version on Rpubs: http://rpubs.com/cyclemumner/373716

It's likely that anglr and silicate are going to be unstable over the next while, so I'll try to stamp a release that we can use; apologies for any problems you have.

Note that the TRI mesh is de-normalized when we copy_down a constant value - it has to be, because it goes from dense in x-y to needing distinct coordinates in x-y-z. (When copy_down takes a raster the copying is done per unique vertex, unique in x-y-z. This is pretty new in anglr and not well explained.)

mdsumner commented 6 years ago

Ah, also some values are NA so that prevents the state from appearing - which is correct, if not desirable :)

Robinlovelace commented 6 years ago

Interesting. Note that Tobler's map looks different (with a much taller NY, for example) because it's a 'bivariate histogram' in which volume is proportional to population. So the values would need to be rescaled so that each state's extruded volume matches its population. That looks like a great starter for 10 in any case.
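
That is, something like the following (a hypothetical sketch; the scaling constant is arbitrary and just for display):

library(sf)
# height = population / area, so that extruded volume (height x area) ~ population
areas <- st_area(statepop_sf)  # in m^2
statepop_sf$height_1900 <- as.numeric(statepop_sf$year_1900 / areas) * 1e9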

Just tried example(light3d) - impressive. The light didn't seem to move smoothly when I rotated the view though. Any ideas? It will be awesome to have a 3D animation of US states in any case, and it seems the example is moving in that direction.

mdsumner commented 6 years ago

I assume the animation movement and nav movement are not separable, and I doubt rgl in its current form could do that (but I don't really know; it's improving steadily and I don't know much at that level).

Thanks for pointing out the scaling, I knew it was wrong but glad it's just some arithmetic! It's made me realize that silicate / anglr is actually total overkill for this kind of plot, because the features really are separate (in x-y-z) and we can simply hack st_coordinates and rgl to build it, including quad-walls. I'm a bit concerned that z-fighting will spoil it, but it would be easier to explain and share, so I'll try it.

mdsumner commented 6 years ago

Phew, this is rough and I'll be happy to help smooth it over if it's of use! No proper scaling yet.

It's pretty slow due to the triangulation in rgl; decido is faster, but extrude3d already adds the quads. (That's not hard, but it will definitely take me a few stabs to get right - it's actually a good and exactly-right extension of the edge-focus used in dodgr, I've just realized.)

library(sf)
library(spData)
library(dplyr)
library(tidyr)
statepop = historydata::us_state_populations %>% select(-GISJOIN) %>% rename(NAME = state)

statepop_wide = spread(statepop, year, population, sep = "_")
statepop_sf = left_join(spData::us_states, statepop_wide)

## hack into sf and treat each polygon ring separately
## (this would respect holes, with a little extra work but doesn't now)

coords <- as_tibble(st_coordinates(st_transform(statepop_sf, 2163))) 
coords$Z <- statepop_sf$year_1900[coords$L3] ## not a general solution, ignores scaling too
coords$ring <- paste(coords$L1, coords$L2, coords$L3, sep = "-")
coords <- coords %>% dplyr::filter(!is.na(Z))

library(rgl)
scl <- function(x) scales::rescale(x, to = c(0, 1000), from = range(coords$Z, na.rm = TRUE))
polygon_extruder <- function(x, ...) {
  rgl::extrude3d(x$X, x$Y,  thickness = scl(x$Z)[1],  ...)
}
rings <- unique(coords$ring)
#rgl::rgl.clear()
## loop over every ring (multipolygon states have more than one)
for (i in seq_along(rings)) {
  d <- coords %>% dplyr::filter(ring == rings[i])
  x <- polygon_extruder(d)
  rgl::shade3d(x, col = "white") ## or e.g. sample(viridis::viridis(100), 1) for colour
}
rgl::rglwidget()

Robinlovelace commented 6 years ago

Something on transformr, which now works with sf objects:

[animation: morphing sf shapes, from the tweet linked below]

Source: https://twitter.com/thomasp85/status/982223090561179648
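
A bare-bones sketch of what that enables (untested; tween_sf() and its arguments are assumed from the transformr documentation):

library(sf)
library(transformr)
# interpolate between two simple sf shapes; gganimate builds on the same machinery
a <- st_sf(geometry = st_sfc(st_buffer(st_point(c(0, 0)), 1)))
b <- st_sf(geometry = st_sfc(st_buffer(st_point(c(3, 3)), 2)))
frames <- tween_sf(a, b, ease = "cubic-in-out", nframes = 30)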

Robinlovelace commented 6 years ago

Implement this in R: https://blog.mapbox.com/a-new-algorithm-for-finding-a-visual-center-of-a-polygon-7c77e6492fbc
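
Note that sf's st_point_on_surface() already guarantees a point inside the polygon (a different, simpler algorithm), and the polylabelr package is believed to wrap the Mapbox 'polylabel' algorithm itself - both worth checking. A quick sketch with sf:

library(sf)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# a representative point guaranteed to fall inside each polygon
st_point_on_surface(st_geometry(nc))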

Robinlovelace commented 6 years ago

Show off geom_relief() - see #224

Robinlovelace commented 6 years ago

Create a cheatsheet: https://github.com/Robinlovelace/geocompr-cheatsheat

Agreed, @Nowosad @jannes-m? If so, I will add it to the tickboxes and use this approach.