reviewer one
Chapter 9:
[x] 1.1 [JN, RL] Section 9.2 – I agree with the authors that 'tmap' is appropriate as an introduction to cartography in R, having the least steep learning curve for production-quality standard maps. However, it is also relatively narrow in scope and specialized. When something non-standard is necessary, the user will have to fall back on more general visualization approaches. It can be mentioned that 'ggplot2' and 'base'/'grid' provide a lot of flexibility which is sometimes necessary to produce advanced or novel map designs: 'ggplot2' through its powerful syntax and huge ecosystem and user community; 'base'/'grid' through their "canvas drawing" style of flexibility. Someone who wants to produce complex and novel types of maps will sooner or later need to go deeper into one of the latter (or both). To demonstrate this point, the authors may also consider including some inspirational examples of advanced maps produced in R, such as:
◦ http://blog.revolutionanalytics.com/2012/02/what-are-the-most-popular-bike-routes-in-london.html
◦ http://spatial.ly/2017/04/population-lines-how-and-why-i-created-it/
On the other hand, 'tmap' makes it easy to add essential elements such as a north arrow and map scale, which is less straightforward in general-purpose visualization packages.
- We agree that the diversity of approaches (and visualization packages) is very important. Thus, we introduce base graphics in several places in the book, as well as describing it together with ggplot2 in the "Other mapping packages" section. Base graphics provides a lot of flexibility, but its syntax is not very intuitive, so a proper description of it is outside the scope of this book. On the other hand, the flexibility of ggplot2 and tmap in terms of map making is rather comparable. Both are based on the grid package and have certain limitations. We give some examples of the ggplot2 limitations in the book, e.g. https://github.com/tidyverse/ggplot2/issues/2037 or its weak raster support. Finally, there is a visible inconsistency in how ggplot2 presents spatial data: geom_sf() behaves differently from the rest of the ggplot2 geoms, while many "spatial" ggplot2 extensions are just quasi-spatial (e.g. they do not care about CRSs).
- Another justification of our emphasis on tmap comes from prior work. Paul Murrell has a book on visualisation based on the graphics and grid packages, and ggplot2 has entire books dedicated to it. Both are referenced in the vis chapter. We think deliberately adding new content and documenting tmap beyond the (admittedly very good) documentation of the package itself is a further strong reason, beyond those outlined above. The approach is to link to existing resources and guide people to good packages as much as possible while minimizing duplication of material covered in existing books.
[x] 1.2 [RL] Section 9.2.1, 1st code section – What type of object is 'nz'? Is it an 'sf' layer? Maybe clarify right from the start what type of raster and vector classes the 'tmap' package supports.
- Agreed - we should not assume readers know, or can remember, what nz is at this stage. This and some associated issues are addressed in https://github.com/Robinlovelace/geocompr/commit/888c7d9d6a1b431355ee29042b3e260790cd88cd making the section easier to understand for people who start on this chapter.
[x] 1.3 [RL] Section 9.2.3 – What happens when there is a column named the same way as a fixed setting? For example, suppose 'dat' has a column named "red", what does the expression 'tm_shape(dat) + tm_fill(col = "red")' do?
Good question. The answer can be seen in the below reproducible example. The text has been updated to clarify that tmap searches the column names first.
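To make the answer concrete, here is a minimal sketch of such a reprex (reconstructed here for illustration; the tiny two-polygon dataset is hypothetical):
library(sf)
library(tmap)
# a tiny polygon layer with a column named "red"
dat = st_sf(
  red = c("a", "b"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 1), c(2, 1), c(2, 2), c(1, 1))))
  )
)
tm_shape(dat) + tm_fill(col = "red")  # maps the values of the column "red"
tm_shape(dat) + tm_fill(col = "blue") # no such column: a fixed blue fill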
[x] 1.4 [RL] Section 9.2.3 – I'm not always getting the behavior shown in the left panel of Figure 9.4. For example, running the following code -
library(sf)
nc = st_read(system.file("shape/nc.shp", package = "sf"))
plot(st_geometry(nc), col = nc$AREA)
produces a plot with no fill colors - [EXAMPLE IN THE PDF] Maybe you mentioned it elsewhere; if not, then perhaps it will be helpful to say what kind of symbology 'plot' produces when given a geometry and an aesthetic mapping to numeric/character/units etc. data types.
Interesting. I was surprised to see the result mentioned here and it took some experimentation, illustrated in the reprex below, to find out what was going on. It seems continuous colour fields are only created when a variable is named, otherwise it defaults to categorical variables based on integer values. This is now mentioned in Chapter 2.
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
nc = st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/home/robin/R/x86_64-pc-linux-gnu-library/3.4/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
old_par = par(mfrow = c(1, 3))
# unnamed numeric vector: values are coerced to integer colour indices,
# so small values between 0 and 1 produce no visible fill
plot(st_geometry(nc), col = nc$AREA)
# scaling the values changes which integer colours are picked
plot(st_geometry(nc), col = nc$AREA * 10)
# naming the variable triggers a continuous colour field
plot(nc["AREA"], key.pos = NULL)
par(old_par)
[x] 1.5 [RL] Section 9.5 – To make it clear what exactly constitutes the 'app.R' file for the 'lifeApp' example, I would include in the code section the 'library()' expressions for loading the 'shiny' and 'leaflet' packages, and the expression to load the 'world' layer.
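For reference, a minimal self-contained sketch of what such an app.R could look like (hedged: the object and input names follow the book's lifeApp example, and the 'world' layer is assumed to come from spData):
library(shiny)   # web application framework
library(leaflet) # interactive maps
library(spData)  # provides the 'world' layer
ui = fluidPage(
  sliderInput("life", "Life expectancy", 49, 84, value = 80),
  leafletOutput("map")
)
server = function(input, output) {
  output$map = renderLeaflet({
    leaflet() %>%
      addPolygons(data = world[world$lifeExp < input$life, ])
  })
}
shinyApp(ui, server)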
[x] 1.6 [RL] Section 9.5 – The authors can add that, aside from dynamic queries of data to put on the map, one of the most powerful features of the shiny/leaflet combination is exploiting map events. For example, the map can be programmed to respond to mouse clicks by showing distance to some features, drawing new layers, querying information in ways other than a popup, and so on.
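For example, a hedged sketch of one such event handler (leaflet's shiny bindings expose input$<outputId>_click; 'map' is an assumed output id, and the code belongs inside a shiny server function):
observeEvent(input$map_click, {
  p = input$map_click # a list with the lng and lat of the clicked location
  leafletProxy("map") %>%
    addMarkers(lng = p$lng, lat = p$lat, popup = "clicked here")
})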
[x] 1.7 [RL] Section 9.6 – Regarding 'plot' with 'sf' and 'raster' layers, perhaps mention that combining them works when both packages trigger drawing into the same plot area and do not include interfering elements. For example, plotting an 'sf' geometry and a single-band raster (as shown in the example) works well, but other combinations (e.g. multi-band raster) will not work.
Chapter 10:
[x] 1.8.1 [JM] Section 10.1 – Consider including a brief summary of the types of tasks each program offers which are not currently available in R. This is given in general terms in section 10.4 but could be expanded. For example, terrain analysis will probably be done in SAGA, network analysis in GRASS, etc.
- We have expanded the first paragraph of the corresponding section as follows: "As mentioned previously, SAGA is especially good at the fast processing of large (high-resolution) raster datasets, and frequently used by hydrologists, climatologists and soil scientists. GRASS GIS, on the other hand, is the only GIS presented here supporting a topologically based spatial database which is especially useful for network analyses but also simulation studies (see further below). QGIS is much more user-friendly compared to GRASS- and SAGA-GIS, especially for first-time GIS users, and probably the most popular open-source GIS. Therefore, RQGIS is an appropriate choice for most use cases."
[x] 1.8.2 [JM] Perhaps also replace the QGIS example (counting points in polygons) with a QGIS algorithm that is not available in R, or an algorithm that is much more complicated to apply in R. Otherwise unaware users may wonder why go through all the setup trouble if the same thing can be done in R. The SAGA and GRASS examples are very good in this respect, as both of them show something that cannot be easily reproduced with R alone.
- Thanks for the advice, and we agree. However, native algorithms are not the special strength of QGIS. It shines more in bringing together geoalgorithms from various desktop GIS in a single interface. Nevertheless, we now present how to get rid of sliver polygons using (R)QGIS, something which is not easily done using just R.
[x] 1.9 [JM] Figure 10.2 – What do the line colors mean?
- The ID column was colored, which is nonsense. I have removed the coloring, thanks for spotting!
Chapter 11:
[x] 1.10 [JN] Section 11.2 – Same comment as previously given for the earlier chapters: it may be confusing to deal with both 'Spatial' and 'sf' classes. This situation is inevitable in the R ecosystem at the moment, but needs to be given more emphasis in the book. For example, why does 'crop' work only with 'Spatial' layers, while 'mask' works with 'sf', contrary to its documented acceptable argument types? (Using 'showMethods("mask")' I now see that 'mask' indeed has an 'sf' method, but it is undocumented.) I would actually expect 'mask' to be the less flexible function, given it only needs the rectangular extent. In any case, this situation needs a few words in my opinion. Additionally, the authors may consider adding a comprehensive table specifying which 'raster' functions currently work with 'sf' and which ones require the conversion to 'Spatial'.
- Every vector object is now converted to Spatial in raster functions. We also added a text block recommending using as(sf_obj, "Spatial") inside raster functions in Chapter 2.
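For example, a sketch of the recommended pattern (placeholder objects: 'srtm' a RasterLayer and 'zion' an sf polygon layer, as in the book's datasets):
zion_sp = as(zion, "Spatial")              # sf -> Spatial
srtm_cropped = raster::crop(srtm, zion_sp) # crop to the rectangular extent
srtm_masked = raster::mask(srtm, zion_sp)  # set values outside the polygon to NA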
[x] 1.13 [JN] Section 11.3 – The code section preceding Figure 11.3 is not working for me -
transect_df$dist = c(0, cumsum(geosphere::distGeo(transect_coords)))
Error in .pointsToMatrix(p2) : argument "p2" is missing, with no default
Anyway, the following expression can perhaps be an alternative -
transect_df$dist = geosphere::distGeo(transect_coords, transect_coords[1, ])
- This code works on our local machines as well as in the online build. Is it possible that you have an outdated version of the geosphere package? Your alternative code works well only when the line is straight; our code should work for non-straight cases as well.
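A version-agnostic alternative (a sketch, using the same 'transect_coords' and 'transect_df' objects as the book's code) is to compute the geodesic length of each segment between consecutive points explicitly, then accumulate:
n = nrow(transect_coords)
seg_dists = geosphere::distGeo(transect_coords[-n, ], transect_coords[-1, ])
transect_df$dist = c(0, cumsum(seg_dists))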
[x] 1.14 [JN] Section 11.3 – The authors can mention that the necessity for returning a 'list' in raster extraction from lines/polygons is that the features overlap variable numbers of raster cells, whereas each point always corresponds to a single raster cell (points exactly on the border between two cells are assigned to one cell or the other).
- Thank you. We have added a sentence - https://github.com/Robinlovelace/geocompr/commit/c46a13b9d1fa3cc3d2d88835503114714ad32ac2.
[x] 1.15 [JN] Section 11.3 – The authors can mention that when one needs to get a vector of summarized line/polygon values per feature, using just one function, then the 'fun' parameter of function 'extract' can be a quicker way than extracting all values and using 'tidyverse'.
- There are some limitations of the above approach. Firstly, the output needs to be a single value when the fun argument is used. Secondly, it is not distinctly quicker - the extraction process is the most time-consuming part. The only actual advantage (that I can see) of using fun is that it requires less code to write.
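For comparison, the one-step variant looks like this (a sketch with placeholder objects 'srtm' and 'zion' standing in for the book's raster and polygon layers):
raster::extract(srtm, as(zion, "Spatial"), fun = mean, na.rm = TRUE)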
[x] 1.16 [JN] Section 11.4 – To complete the discussion of choosing rasterization resolution, it is worth mentioning that sometimes the template resolution is dictated in advance: for example when the set of layers to be aligned already contains at least one raster, or when interpolating into a required area of interest with a predetermined resolution, and so on.
- Thank you. https://github.com/Robinlovelace/geocompr/commit/0f23abd78d5ae5a24113ba66d44d6285634ce68a
[x] 1.18 [JM] Section 11.5 – Text says "Contour lines can be created with the raster function rasterToContour(), which is itself a wrapper around contourLines(), as demonstrated below:", but there is no code example for function 'rasterToContour'. Perhaps an example of 'rasterToContour' may be given, followed by 'plot', rather than just using 'contour' which calculates the contour lines "behind the scenes".
- Changed as recommended.
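A minimal sketch of the recommended approach ('dem' is a placeholder RasterLayer):
library(raster)
cl = rasterToContour(dem) # contour lines as a SpatialLinesDataFrame
plot(dem)
plot(cl, add = TRUE)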
[x] 1.19 [JM] Section 11.5, Figure 11.8 – The best contour labeling algorithm is probably in the 'contour' function, so I would use it in the visual example instead of 'tmap'. General comment: a nice-looking plot such as Figure 11.8 with no code is perhaps less useful than a simpler plot given with reproducible code. For example, the basic plot produced by 'plot(r); contour(r, add=TRUE)' may be a more tangible example.
- Changed as recommended.
[x] 1.20 [JN] Section 11.5, Figure 11.8 – Using 'dissolve=TRUE' in 'rasterToPolygons' gives the same "Aggregated polygons" result shown in the figure, but with shorter code.
- We are aware of this argument. However, our goal in this paragraph was to show the readers (1) how polygonization works, and (2) how to obtain disaggregated and aggregated outputs. Nonetheless, we added a short note about this argument - https://github.com/Robinlovelace/geocompr/commit/a85a4810ab3fb1d377d985b09218f3aa46194a8a
Chapter 13:
[x] 1.21 [JM] Section 13.2 – To make the landslide dataset "balanced", in the given example the non-landslide points are sampled to give similar positive and negative sample sizes (175 vs. 175). Intuitively this seems wrong: the sample should either be completely random (e.g. sampling random points inside the study area, then checking them for landslides) or comprise the entire population (completely mapped study area). Why not divide the study area into a grid (i.e. raster), where samples are grid cells and have the response variable specify whether the cell contains (at least one) landslide, assuming complete mapping? In other words – it is unclear to me what the rationale is behind keeping "balance" between the sample sizes of the two categories, at the expense of randomness. A few years ago I did a study on forest mortality and made the above-mentioned decision. The dataset was similar in structure: there are tree mortality locations scattered around the forest. Similarly to the landslides example, there were quite few mortality sites and a lot of non-mortality sites. However, I thought that instead of making the data balanced, the entire range of conditions in the forest needed to be characterized. So I ended up dividing the forest into a regular grid, and having each grid cell sampled as TRUE/FALSE, i.e. whether the grid cell had at least one mortality location inside. The data indeed were unbalanced, with just ~3% of cells being TRUE. (The data then went to a Mixed-Effects GLM considering spatial autocorrelation.) Similarly, in presence/absence modelling of species distributions: I would think that the entire valid dispersion range (e.g. a continent) needs to be considered as the "sample", with all cells given TRUE/FALSE values according to presence/absence of the species in that cell, rather than choosing N observed locations and N absent locations. I believe that spatial logistic regression models (e.g. package 'spatstat') use a similar rationale: the spatial window is considered completely mapped, then the point pattern of observed events is modeled as a function of background (e.g. raster) predictors also available for the entire area. Admittedly, I have no theoretical justification, it just intuitively seems more correct. If I'm missing something and the authors have any comments on this I will be very happy to learn.
- Thank you for sharing your experience and the feedback! And I see why our approach might be a bit confusing, so let me give you a bit more detail on the background. In the original study (https://doi.org/10.1016/j.geomorph.2011.10.029), we mapped all landslides in the study area. Secondly, we used the mid-point of the landslide scar as the response variable. Thirdly, we randomly sampled points which were located outside of the landslide polygons. We selected as many non-landslide points as there were landslide points since logistic regression using a logit link roughly expects a balanced response (see e.g. http://highstat.com/index.php/beginner-s-guide-to-glm-and-glmm). However, the same dataset has also been used in classes and summer schools to show how different modeling algorithms can handle the uneven distribution of the response variable and the related increase in spatial autocorrelation. This is why 1360 non-landslide points were randomly selected, but again this was done by randomly selecting them outside of landslide polygons. To clear this up, we have added this explanation to a footnote.
To answer your question on dataset completeness: we do not need the entire population, but just a representative sample of it, for statistical modeling. In fact, using the whole population would make modeling redundant because you already have the information for each pixel. But you are right that some spatial point pattern algorithms expect a full inventory, e.g. all trees (see the spatial point pattern analysis chapter of the ASDAR book).
[x] 1.22 [JM] Section 13.5.1 – In the first code section, the expression 'coords = lsl[, c("x", "y")]' is duplicated.
- Changed! Thanks for spotting!
[x] 1.23 [JM] Could be helpful to comment on model interpretation in GLM vs. SVM. For example, how can we interpret the coefficients in GLM? How can we calculate variable importance in machine learning methods, such as SVM?
- As emphasized in the introduction, this chapter is on spatial predictions using machine learning. Still, you are right that one can compute variable importance for ML algorithms, but we prefer to refrain from it here since otherwise we would have to add another section on the difference between statistical inference and variable importance in machine learning.
Additional questions:
[x] 1.24 [RL] The idea of an "Algorithms" chapter is great. From the chapter draft it seems the example (centroid) is relatively complex though. Also seems like part of the solution will be using an external package 'decido'. I would suggest replacing or supplementing this example with a simpler one, where the solution only requires functions covered elsewhere in the book, such as functions from the 'sf' package only. Just one of many possible suggestions – a function that accepts a polygonal layer and returns a new line layer containing separate line segments (as separate features) along with the azimuth where the exterior of each line segment is facing. This can be useful in urban planning, e.g. to understand how building facades are oriented. Such a function needs several steps: first splitting the polygons to line segments, then extracting each segment coordinates (x1, y1, x2, y2), and finally applying some trigonometry and math to find the azimuth. This does not require anything outside of 'sf' plus base R. Here is an illustration of a sample layer and the resulting azimuth classification - [FIGURE IN THE PDF]
- The chapter was at an early stage so we're very grateful for the high-level comments and the specific suggestions. Based partly on these suggestions, the chapter has evolved substantially. We are confident it is much better and addresses each of the issues raised. See https://geocompr.robinlovelace.net/algorithms.html . Regarding the centroid example, we agree the description was a little long-winded. Based on this comment we've re-written the algorithms section. It no longer depends on decido (or purrr for that matter). It's shorter. And hopefully provides more bang for your buck!
- Regarding the decision to use centroids, we defend that decision. It's something everyone can relate to, and it's now visually very clear what's going on with the new 3-facet plot - see https://geocompr.robinlovelace.net/algorithms.html#fig:polycent - that should be satisfying. More importantly, it provides a reason to link to the source code of industrial-grade implementations such as GEOS. That was always an intention: to make the reader aware that hard work has gone in so that they don't have to write all their algorithms from scratch, and that now comes through in the prose too, hopefully. Any further feedback on this is very much appreciated - we have a version that works even for non-convex polygons, for example - worth mentioning?
- We agree with your suggestion for a function to 'explode' linestrings representing building walls. We'll work on it, as sketched below.
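As a starting point, a rough, untested sketch of the segment-azimuth idea for a single-ring polygon ('poly' is a placeholder sf/sfc POLYGON; sf plus base R only):
library(sf)
segment_azimuths = function(poly) {
  m = st_coordinates(st_cast(st_geometry(poly), "LINESTRING"))
  dx = diff(m[, "X"])
  dy = diff(m[, "Y"])
  # bearing of each segment, clockwise from north, in degrees; the
  # exterior-facing azimuth is this +/- 90, depending on ring direction
  (atan2(dx, dy) * 180 / pi) %% 360
}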
reviewer two
Chapter 12:
[x] 2.1 [RL] In that chapter, section 2, the "geographic algorithms" are not so much geographic as they are geometric. The triangle example is way too long, or better: provides too little bang for the paragraphs spent on it.
Both Stephen Wise's GIS Basics as well as Xiao's book are a good start to look for examples. In the application areas, I would recommend crime and landscape ecology as nice examples for a whole new set of algorithms that have been well specified beyond the realm of basic GIS analysis. Given the title of the book, I would have to think of something "fuzzy", some linear programming for optimization, spatial filtering (Griffith & Chun come to mind), a brief overview of neural network libraries with some classification example (this could then be merged with the ML section in the last chapter), a space syntax example like Benjamin Acker at UT Texas, and of course, something evolutionary that would get us into spatial decision support systems.
The problem with both the GeoComp and the final statistical learning chapter is that a proper treatment of the subject matter requires a book by itself. I raised this issue in a previous review of this book.
We have acted on these suggestions, but have avoided the more advanced 'fuzzy' topics. We think it's more important to build strong foundations than shaky skyscrapers. Perhaps this content will stimulate follow-on books that are more advanced.
Chapter 9:
[x] 2.2 [RL] As tmap is based on ggplot, I deem it necessary to introduce the underlying philosophy of ggplot first.
- We now mention the Grammar of Graphics on which both tmap and ggplot are based - see https://github.com/Robinlovelace/geocompr/commit/7ec9456191e8f5b2701eaac18c4ed81683ed05f6 . I am not an expert in this field of visualization so any follow-on comments or suggestions welcome here (RL)
[x] 2.3 [JN] 9.2.2: There is no dataset nz_elev
- The nz_elev dataset is in the development version of the spDataLarge package.
[x] 2.4 [JN] 9.2.3: There is no variable Land_area. The plot command therefore does not work. Replace with existing variable name LAND_AREA_SQ_KM.
- This variable exists in the development version of the spData package.
[x] 2.5 [JN] There is no variable Population nor Median_income. As, in this case, there are no such variables in the data set, none of the code in 9.2.4 works.
- These variables exist in the development version of the spData package.
[x] 2.6 [JN, RL] Section 9.2.5 has to be re-written to accommodate the way that tm_style works now. None of the +tm_style commands work. This includes the concluding pointer to the style_catalog function
- We present how the tm_style function works starting from tmap version 2.0. It should be on CRAN soon.
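For example, in tmap 2.0 a style is appended by name (a sketch, assuming the 'nz' object used in the chapter):
library(tmap)
tm_shape(nz) + tm_polygons() + tm_style("cobalt")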
[x] 2.7 [JN] In section 9.2.7, the call to load the grid library seems innocuous. However, the subsequent call of the viewport function depends on it. This is hard to understand for the uninitiated. The last paragraph in section 9.2.7 is meant to be a transition to section 9.3, but what is confusing is that the paragraph actually refers to all of 9.2 rather than just the last three pages.
- We state that a viewport from the grid package will be used in the text. What else do you recommend? The last sentence of section 9.2.7 is meant to show readers that the examples in this section could be improved (e.g. with a better style) and the method presented can be applied not only to combine maps, but also to combine maps and plots.
[x] 2.8 [JN] Section 9.3 relies on an older version of tmap. The animation function is now called animation_tmap and assumes the installation of a software package external to R (ImageMagick). That is already problematic. With the change in the function call, I also get a "convert -version" error.
- Actually, this section relied on the newer version of tmap. Some of the function names were standardized, including the change from animation_tmap to tmap_animation.
[x] 2.9 [RL] The world coffee code does not work either: the layer definition for facets requires a true/false value. Mapview, leaflet, and shinyApp, however, work just fine.
This has been resolved. We would be grateful if the reviewer could confirm this based on updated packages.
Chapter 10:
[x] 2.10 [JM] In the GRASS section, the command writeVECT(SDF = as(points[, 1], "Spatial"), vname = "points") fails with the error message that the pointer does not point to a geometry column
- I have run the code again, and in my case it doesn't fail, could you please try again after having run data("cycle_hire", package = "spData"); points = cycle_hire[1:25, ].
Chapter 11:
[x] 2.11 [JN] In Chapter 11, it would be nice if the code examples would include the plots, esp. since the non-map plots are a little removed from the reader's memory.
- Thank you for the suggestion. However, each map in this chapter was created with several lines of code, and therefore it would take too much space in the book. All the code is freely available at our github page.
Chapter 13:
[x] 2.12 [JM] In Chapter 13, it is fine to leave some exercise calculations up to the reader but not to then rely on those results in subsequent code snippets like in the glm model at the beginning of section 13.3.
- You are completely right. Still, we don’t have the space to show the complete code for extracting the predictor variables. As a compromise, we have added the resulting predictor dataset to the spDataLarge package. Hence, the reader does not have to do the computations but can rather attach the required dataset and go on with the remaining analysis.
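For example, attaching the prepared dataset is then a one-liner (assuming it keeps the 'lsl' name used in the chapter):
data("lsl", package = "spDataLarge")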
[x] 2.13 [JM] The summary command on the 500 sample runs looks very different for me (everything is 0.5).
- This is strange, an AUROC of 0.5 would mean, that we could have flipped a coin as well, i.e. no discrimination of the classes at all. I have run the code again, and retrieved the results as reported in the book. Can you please try again?
[x] 2.14 [JM] The authors forgot to require the installation of the kernlab package
- kernlab is now a prerequisite. Thanks for spotting!
[x] 2.15 [JM] The machine learning section is, in general, good. But it does require the reader to have gone through a graduate level (spatial) statistics course - which minimizes the size of its potential audience.
- Yes, you are right, this is a hard read for stats novices. However, this book is on geocomputation with R, and the greatest strength of R is probably its unparalleled pool of data science/machine learning algorithms. Combining these with geographic data is hence a unique feature of R. And even if some readers might have a hard time reading this chapter for the first time, it still provides them with an idea of what is possible with R and geographic data.
[x] 2.16 [JM] Towards the end of the chapter, my runtime was 0.7 seconds (on a six-year old machine and in contrast to the authors' 37.4 seconds)
- In fact, it is 37.4 minutes. Maybe something went wrong when you were running the code. I have run the code again, and everything worked as expected. So I can only ask you to please try it again.
reviewer three
Chapter 12:
[x] 3.1 [RL] There's only a little in the book about obtaining data, and that is one of the most important first steps for developing algorithms, and sometimes is the most painful. I think it's worth saying in 12 that having a strategy for where the data is and how you will get access to it is very important. Obviously (to experienced programmers) getting all the data in order and having a way to maintain and keep it up to date is an important, independent task - and is where geospatial often meets "more general" tasks like downloading files, organizing folders, building databases of file names, and so on. Arguably this is a task best provided by a user's institution or advisors, but at least checking whether anyone local already has this data is a good step. This might be a good place to provide some advice and links to resources on that. (We use ropensci/bowerbird as a framework for a lot of our work; it's not a good beginners' tool but it's an example of total separation of data-getting from data-usage.)
- Chapter 7 is dedicated to data I/O and provides many links and examples showing how to obtain data. Agreed, tool development depends on access to data, but I'm not sure how beneficial another section would be in a chapter on algorithms (unless I'm misinterpreting what the reviewer means by 'obtaining data' - are you referring to a minimal example or full datasets?). For the time being this comment is addressed at the outset of the chapter, which clarifies its assumption that the reader already has a good understanding of their data and is able to access it in a reproducible manner, e.g. using guidance from Chapter 7:
The chapter assumes you have an understanding of the geographic data classes introduced in Chapter 2, and have already imported the datasets needed for your work, for example from sources outlined in Chapter 7.
Chapter 9:
[x] 3.2 [RL] This is a very good overview, it’s not only very practical in showing how to do things with these packages, it also gets across what a diverse and dynamic space the R spatial landscape is. I’m generally just reading it and learning, which is terrific! It’s a long chapter but for good reason, lots of strong ideas and good useable examples.
I find this does lack a little bit of the big picture though: why is the space so fragmented, why are there so many mapping packages, and sooo many animal tracking packages? Is it because R is very flexible and dynamic and a lot of great ideas are happening fast, or is it because that's just how it is in geospatial? Are there prospects for tighter unity or should we expect more and more packages? How does the R community compare to the QGIS community for mapping? Why are QGIS and ArcGIS more about Python than R? A few thoughts on some of these questions might be useful.
We have added more context in the mapping chapter and made it more inter-linked with the rest of the book. Some of these questions are tackled in chapter 1 and the new chapter 15. Any further comments gratefully received.
[x] 3.3 [JN] Section 9.1 typo “are often be the best way”
[x] 3.4 [JN] Section 9.2 typos: “and the grid providing”, “is making static with tmap.”
[x] 3.5 [JN] Section 9.2.3 typo “The purpose this section is to show how”
[x] 3.6 [JN] Section 9.2.3 typo " for to create superscript text)"
[x] 3.8 [RL] Section 9.2.6 Is “small multiples” defined in the lit somewhere? A ref? I see the term but it’s possibly an ESRI thing? I’ve tended to use “conditioned on” (a grouping value) in the past, but can’t remember where I picked that up - faceted is a very good term.
- We have cited a recent, open access and high quality paper to introduce the term: "Faceted maps, also referred to as ‘small multiples’, are composed of many maps arranged side-by-side, and sometimes stacked vertically (Meulemans et al. 2017)." - see https://geocompr.robinlovelace.net/adv-map.html#faceted-maps
[x] 3.9 [RL, JN] Section 9.3, 9.4 This is excellent, very good description of the need for various modes of faceting and animation. Might mention the lack of an overarching framework for continuously varying data in R and animations, and while magick, animation, moveVis, and gganimate and others provide frame-based animation there is a longer-term effort to make continuous transitions more generally available (tweenr, new gganimate).
We've updated the prose as follows:
There are many ways to generate animations in R, including with the popular [gganimate](https://github.com/thomasp85/gganimate) package, which provides a "grammar of animated graphics" (see section \@ref(other-mapping-packages) for more on mapping with ggplot2).
Only tmap, to the best of our knowledge, provides a framework for animated maps, hence the focus of this section on tmap_animation().
[x] 3.10 [RL] Section 9.5 For absolutely up-to-date-ness it might be worth mentioning the new “async” facilities for shiny, and how that will help make shiny more accessible and usable because of better responsiveness and scalability.
We are not aiming for up-to-dateness or scalability, just to show what is possible. If anything the detour into mapping applications may already be longer than necessary. So we have not included this.
Chapter 10:
[x] 3.11 [RL] Note that CLI has a very specific meaning and R’s console/REPL is not really that. In a Python package you literally write a scripts configuration so that your functions are available at the system CLI, so this does have a subtly different meaning in that context - familiar to QGIS programmers, and might be worth clarifying. Consider including fortunes::fortune("SUV") to complement the quotation from Gary Sherman. :)
- Thanks for noting. We have discussed this at length and agree that clarification was needed on what CLI means. Instead of the narrow definition, we have opted for a broad definition following a wikipedia article on the subject: https://en.wikipedia.org/wiki/Command-line_interface - see https://github.com/Robinlovelace/geocompr/commit/93fb19d0097496d5d8050e403dd98afeadae3e8e . We enjoyed the quote and have included the SUV example by Greg Snow. Thanks for making us aware of it!
[x] 3.12 [JM] The term “coupling” is mentioned in the note about the term “bridge” at the end, but I think this is an important term and should be mentioned at the start here.
- We agree, and therefore, we have put the corresponding footnote into the main text.
[x] 3.13 [JM] There’s no direct mention of SQL here, very familiar to Oracle, Manifold GIS, PostGIS, GDAL-OGR, Spatialite and now QGIS users, and pervasive in modern SQL Server (which includes R-Revolutions!) and other applications. It’s probably way too much to include a spatial SQL treatment in geocompr, but I think it should be mentioned: it’s a very strong complement to CLI and programming in general for spatial work, and external applications can be easily invoked via SQL, which means there’s a lot of promise for future coupling/bridges as well as DIY potential with so much flexibility.
- We absolutely agree! In fact, we have long been discussing if we should include a section on big data analytics which basically boils down to querying data from a database living in the cloud. Instead of doing this, we now have written a short subsection on spatial DBMS via PostgreSQL/PostGIS (https://geocompr.robinlovelace.net/gis.html#postgis). This already allows you to handle large spatial data, and if it becomes even bigger, the step to GeoMesa and related tools is not too big.
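For instance, a minimal connection sketch in the spirit of that subsection (placeholder credentials and table name; not a working endpoint):
library(RPostgreSQL)
conn = dbConnect(PostgreSQL(), dbname = "postgis_db", host = "localhost",
                 port = 5432, user = "user", password = "pass")
dbGetQuery(conn, "SELECT name FROM highways LIMIT 5") # an SQL query from R
dbDisconnect(conn)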
[x] 3.14 [JM] The first section here needs tightening up: the dotted lists are a bit ad hoc, some are too long, and it's incomplete - I’m sure the authors are aware. Otherwise I think it’s very good to have each of these external applications listed with key details about using them. It’s very practical and I don’t know of any other resource that has this balance of detail and coverage, so it’s very helpful.
- Yes, you are right, the bullet points were still work in progress. We have rewritten them. Though it was not our intention to provide a comprehensive list of advantages/disadvantages of programming vs. GUI, please let us know if we are still missing decisive points.
[x] 3.16 [JM] I have now read through chapter 13 and have only positive feedback. I think it's really well-written and explained - the modelling is not my area, and I can see this is a very accessible and practical section for introducing folks to these powerful techniques. There are some tiny things like spelling: modelling/modeling needs to be consistent, but I'm sure a later review will find those - and if I have more I'll submit to the github process itself.
-Thank you for the kind feedback! I have deleted the spelling mistakes.
reviewer four
General remark:
[x] 4.1 [RL, JN, JM] Minor general remarks: I notice that you use the "=" assignment throughout the book. You may be aware that within the R community the general consensus is to use the "<-" operator. Personally, I do not care, but for the vast majority of R users who are used to "<-", it might be confusing.
We all use = in our own work and have decided to use it for ease of teaching. We have discussed changing it and would be open to doing so if a strong case can be made. It is certainly worth mentioning, however.
Chapter 9:
[x] 4.2 [RL] Section 9.2 "... and the grid providing functions for low-level control of graphical outputs" Do you mean the "grid package"? If so, you may note that base-R graphics differs from grid-based graphics. Base-R graphics can be used for charts and maps (i.e. the plot function), but the grid graphics only offer graphical building blocks (like grid.rect and grid.text), which are not very useful for the end-user.
We have updated the text to say: "The base R approach is also extensible, with plot() offering dozens of arguments. Another low-level approach is the grid package, which provides functions for low-level control of graphical outputs - see R Graphics [@murrell_r_2016], especially Chapter 14."
[x] 4.3 [RL] Section 9.2 Note that the vignettes will be reorganized in version 2.0 (and also renamed), so these links may not work after releasing tmap 2.0. You can also refer to "the vignettes listed in https://cran.r-project.org/web/packages/tmap"
We have updated the URLs.
[x] 4.4 [RL] ?tmap-element --> ?'tmap-element' (also watch the difference between ` and ', otherwise it won't work when copy-pasting)
[x] 4.5 [RL] Small detail on "adding a border on top of the fill layer.". Normally, the order of layers represents the plotting order, but polygons are drawn only once. The different layers tm_fill and tm_borders are just to let users be able to specify borders and fill separately. In other words tm_borders() + tm_fill() gives the same result.
We have removed the sentence "note the tm_borders() call after tm_fill() in the previous code chunk" - agreed, but that is too subtle to be worth mentioning - the order of layers is the main point.
[x] 4.7 [JN, RL] Figure 9.13. In order to prevent occlusion, you could increase the figure a little, and try to add white borders. For that, you'll need tm_symbols rather than tm_dots, since it has a different shape (symbol 21 instead of 16).
data(World)
data(metro)
qtm(World, projection = "longlat") +
  tm_shape(metro) +
  tm_symbols(col = "black", size = "pop2020", border.col = "white", scale = 3)
- Thank you for this suggestion. We have implemented it.
[x] 4.8 [RL] As of version 2.0, using tm_view(basemaps = basemap) will be deprecated and replaced by
tm_basemap(server = basemap). Users can specify the default basemaps with tmap_options.
[x] 4.9 [RL] Third paragraph: shiny offers a solution to the first and third limitation, but not the second one (scalability of large datasets).
Agreed. I think the section is now re-worded to focus on shiny's capabilities rather than scalability
[x] 4.10 [RL] Inset paragraph on shiny: note that copy-pasting runApp(“coffeeApp”) will not work, since the double quotes are different than ".
- Fixed with block2.
Chapter 10:
[x] 4.11 [JM] 10.1. You write that find_algorithms can be used to search for algorithms. When I use new software, I'd rather like a table of contents to see what is available. Isn't there one for QGIS somewhere online?
- Just after using find_algorithms() for the first time, we note that calling it without arguments returns all available QGIS geoalgorithms, including short descriptions. Still, we have added a link to the online documentation of all QGIS geoalgorithms since, as you correctly point out, this provides a better-arranged overview. Thanks!
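For instance, a short sketch of both uses (RQGIS as loaded in the chapter):
library(RQGIS)
find_algorithms()                      # all geoalgorithms, with short descriptions
find_algorithms(search_term = "grass") # or restrict the search to a term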
[x] 4.12 [JM] 10.4 "QGIS only provides access to a subset of GRASS and SAGA functionality." It's not really clear which GIS package offers which set of functions. Reading just this sentence, this would suggest that GRASS and SAGA offer more functions than QGIS. However, in 10.1, you wrote that QGIS also includes algorithms / has access to GDAL/OGR, TauDEM, Orfeo Toolbox and Lastools. Also, it is not clear whether GRASS and SAGA offer access to other libraries (such as the ones I just mentioned).
- Table 9.1 lists the number of geoalgorithms offered by each GIS package. Unlike QGIS, SAGA and GRASS do not provide access to third-party geoalgorithms. However, QGIS does not offer access to all SAGA and GRASS GIS algorithms. We tried to make this clearer by rewriting as follows: "By all means, there are use cases when you certainly should use one of the other R-GIS bridges. Though QGIS is the only GIS providing a unified interface to several GIS software packages, it only provides access to a subset of the corresponding third-party geoalgorithms (for more information please refer to https://journal.r-project.org/archive/2017/RJ-2017-067/RJ-2017-067.pdf)".
[x] 4.14 [JM] Section 13.2 goes a little fast; I had to read it a couple of times to really understand the example data (also because the data is contained in multiple objects from different packages). Although all information is covered in the text, a little more context / introduction can be helpful.
- In accordance with your and another reviewer's comments, we have tried to provide a little more context.
[x] 4.15 [JM] 13.5.2 Small consistency note: why is the resampling block redefined (this time saved as perf_level) in the setup of SVM? (Whereas the task block was reused from the binomial regression).
- Yes, you are right, thanks for spotting. For consistency, we have now renamed resampling to perf_level in the GLM section as well and say in the SVM section that the resampling is identical to the one used in the GLM section.
[x] 4.16 [JM] I do not really understand what hyperparameters are. Also, which and how many hyperparameters are used in this example? Are those the 50 defined by ctrl, or the pair C and sigma? What does the overall process look like? Like this? performance estimation (estimating model parameters) -> tuning (estimating hyperparameters) -> performance estimation and so on.
- Hyperparameters are explained at the beginning of section 11.5.2. In contrast to coefficients of parametric models, hyperparameters are not estimated from the data but are simply defined. Since an arbitrary choice of hyperparameters will hardly result in the best model performance, one has to tune them, i.e. one randomly selects different hyperparameter combinations, runs the model and chooses the one with the best performance. Hence, the process is: split the dataset into five spatially disjoint test and training sets; for each fold, run spatial hyperparameter tuning (i.e. nested CV), while the best hyperparameter combination in turn is used for the performance estimation.
- Additionally, we have rephrased as follows:
This means that we split each fold again into five spatially disjoint subfolds which are used to determine the optimal hyperparameters (tune_level object in the code chunk below; see Figure 11.6 for a visual representation). To find the optimal hyperparameter combination we here fit 50 models (ctrl object in the code chunk below) in each of these subfolds with randomly selected values for the hyperparameters C and sigma. The random selection of values for C and sigma is additionally restricted to a predefined tuning space (ps object). The range of the tuning space was chosen with values recommended in the literature (Schratz et al., 2018). To make the performance estimation processing chain even clearer, let us write down the commands given to the computer:
1. Performance level (upper left part of Figure 11.6): split the dataset into five spatially disjoint (outer) subfolds.
2. Tuning level (lower left part of Figure 11.6): for each of these folds, run the hyperparameter tuning, i.e. spatially split the performance fold again into five (inner) subfolds. Use the 50 randomly selected hyperparameter combinations in each of these inner subfolds, i.e. fit 250 models.
3. Performance estimation: use the best hyperparameter combination from the previous step (tuning level) in the performance level to estimate the performance (AUROC).
4. Do all of the steps described above for the remaining four outer folds.
5. Repeat all of the steps above 100 times.
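A condensed sketch of how these levels map onto mlr objects (using the object names referenced above; illustrative rather than the complete chunk):
library(mlr)
# tuning space for C and sigma, sampled on a transformed (2^x) scale
ps = makeParamSet(
  makeNumericParam("C", lower = -12, upper = 15, trafo = function(x) 2^x),
  makeNumericParam("sigma", lower = -15, upper = 6, trafo = function(x) 2^x)
)
ctrl = makeTuneControlRandom(maxit = 50)         # 50 random draws per fold
tune_level = makeResampleDesc("SpCV", iters = 5) # inner (tuning) level
perf_level = makeResampleDesc("SpRepCV", folds = 5, reps = 100) # outer level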
Chapter 12:
[x] 4.17 [RL] In my opinion, chapter 12 is certainly interesting for the target readers. Some possible directions could be:
- How to bring it all together in reproducible R scripts, i.e. reading, processing and visualizing spatial data.
- Best practices for a reproducible production process. This certainly has overlap with other, more general, books on programming with R, but there are certainly specific topics that could be interesting to cover, for instance: how to deal with constantly changing geospatial data sources, such as OpenStreetMap?
We have added this content (except the OSM suggestion, that's hard!)
Additional comments
Chapter 12:
[x] 5.1 [RL] It took me 2.5 hours to install a long hierarchy of dependencies for the five pre-requisite libraries. Especially, stplanr has become dependent on so many other packages that readers are bound to run into problems here.
[x] 5.2 [RL] Figure 12.1 is a very low-resolution screenshot and needs to be clarified.
I very much like the end of section 12.2. It is a bit unusual for a textbook but this is exactly the kind of outline that I would like to see my students writing in their GIS project reports. A very nice example.
- We've increased the resolution of the figure.
[x] 5.3 [RL] I am not sure how much the publisher care about grammar but the authors of this chapter are a little frugal with their commas.
- After feedback we are now a little more 'comma happy'.
[x] 5.4 [RL] I don't know why, but the code snippet on p 262 starting with "od_top5" does not copy and paste properly. Please check the source code for this page for irregularities.
[x] 5.9 [RL] At the very end of the chapter, add a remark to the bonus question for exercise 5 that indicates this being the approach that underlies the Geomarketing chapter.
[x] 5.10 [JM] Similar to the copy_and_paste problem on page 262 in the previous chapter, all the numbers did not copy from the dplyr code snippet on page 277.
- This is an unfortunate pdf issue. Copying the code from the online book version works.
[x] 5.11 [JM] After the code snippet, the manuscript text starts to get garbled. I would clean the paragraph starting with "Excerpts" and put it into a footnote. The rest of this section can then just be deleted.
- This is a LaTeX/HTML issue. In fact, the garbled text is a table (please have a look at the online version of the book). We will take care of this when we publish the book.
[x] 5.12 [JM] The same not-copying-numbers occurs for the rasterfromXYZ function on page 279 - as well as all subsequent code snippets in this chapter.
- Please refer to the answer of 5.10.
[x] 5.13 [JM] The reclassification on page 280 is awkward and would be easier to follow if the author inserted a table that shows the classes for each variable.
- Please refer to the answer of 5.11.
[x] 5.14 [JM] The reference to a for-loop in the first paragraph on page 281 is misleading. The author is referring to the processing of a matrix, which behaves like a for-loop but as this is a textbook, the nomenclature should avoid such confusions.
- The description still referred to mapply(), which we replaced by a for loop in accordance with a reviewer’s comment. The corresponding text is now adjusted accordingly, thanks for noting!
[x] 5.15 [JM] The whole reverse geocoding still does not convince me. Why don't you use the revgeo or the opencage package? And, btw, your code requires ggmap, which was not loaded in the pre-requisites.
- Usually, Google gives back the best results and can handle different types of writing an address (ave, ave., avenue, street, str, st, etc.). ggmap is now mentioned in the prerequisites, thanks for spotting.
[x] 5.16 [JM] The osmdata download is indeed amazing. But then there is an error in the rbind call towards the bottom of page 286. In the code snippet, the order of the parameters is reversed, it should read do.call(rbind, shops).
- Changed and thanks for spotting!
Chapter 5:
[x] 5.17 [JN] Section 5.2.4 perhaps needs a little more detail on what is going on in the last two examples (middle and right panels in Figure 5.5). For example, it is not explained what actually happens when two layers are subtracted from one another, as in "nz_sfc - nz_centroid_sfc". Also, perhaps a practical note can be given with a more specific guideline as to why one may need to shift or rescale geometries in the first place. For instance, two examples where shifting is needed are label placement and shadow-casting algorithms. The authors mention correction of wrong reprojection, but this is a very rare use case, at least from my experience. In the same context maybe another example can be added to show how one can shift a geometry given azimuth+distance, rather than x_offset+y_offset. Also, the idea behind the rotation matrix, and how the "*" operator works with a geometry plus a rotation matrix, needs some more detail. These are advanced operations for the average GIS practitioner, unlike clipping, buffering etc. that are very familiar and do not require elaboration.
- Thank you. We've improved the introduction to this section and added an explanation of "nz_sfc - nz_centroid_sfc". We also agree that this is an advanced operation; however, explaining it in detail is outside the scope of the book. Can you recommend any resources that could be referred to here?
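For readers of this response, the pattern in question looks roughly like this (a sketch based on the section's example; 'nz_sfc' is the geometry column used there):
library(sf)
rotation = function(a) {
  r = a * pi / 180 # degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
}
nz_centroid_sfc = st_centroid(nz_sfc)
# subtracting the centroids shifts each geometry to the origin, the matrix
# product rotates it there, and adding the centroids shifts it back
nz_rotated = (nz_sfc - nz_centroid_sfc) * rotation(30) + nz_centroid_sfc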
[x] 5.17 [JN] Section 5.2.7 will benefit from a (graphical) summary on the different pathways one can use to cast between the various geometry types. Perhaps there can be a figure with specific geometries – like Figure 5.10 but covering all possible cases and not just the two being shown (multipoint → line/polygon). Such a figure will also show the reader which casting pathways are not supported in 'sf', which is also something important to be aware of.
- Thanks - that is a good idea. Type transformation is a complex issue - it strongly depends on the input data class (sfg, sfc, sf), the geometry type (e.g. point, linestring), and the number and arrangement of objects. Therefore, we decided to add a table showing how st_cast works on the most popular data class only - sf.
[x] 5.18 [RL, JN, JM] If there is room, the authors can go further and end Chapter 5 with a hint on how the raw coordinates can be manipulated when necessary, giving the user full control of rearranging vector geometry the way he/she wants. One example can be taking a line layer and reversing the order of coordinates for all lines, or reversing the order of just some of the lines in a way that all lines eventually "flow" in a certain direction (e.g. north to south), etc. Perhaps this can be an introduction to Chapter 10 where a more complex algorithm is demonstrated.
We have decided not to include this (admittedly sensible) suggestion for time and space constraints - there are few common use cases.
[x] 5.19 [JN] Section 6.6 on raster reprojection should perhaps emphasize that raster reprojection can be thought of as a vector-reprojection of cell centroids to the other CRS, then going back to a regular grid in the new CRS through resampling. In other words, raster reprojection (Section 6.6) and resampling (Section 5.3.3) are closely related operations, and mentioning this link can help users understand the material. Also, a figure can be added where the same small raster (e.g. 5*5 cells) is shown in the original CRS and after reprojection to a different CRS, with grid-lines of both systems in the background of each panel. Such a figure can emphasize the fact that the raster extent is rectangular and parallel to the specific current CRS in each case.
- Thank you for these great suggestions. We added a sentence to the raster reprojection section - https://github.com/Robinlovelace/geocompr/commit/e80acae335321c5daf33badc6c39751627b1716f
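For example, both steps (reprojecting cell centroids and resampling back to a regular grid) happen at once in projectRaster (a sketch; 'cat_raster' is a placeholder categorical raster):
library(raster)
r_proj = projectRaster(cat_raster, crs = "+proj=moll", method = "ngb")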
Misc:
[x] 5.20 [RL, JN, JM] (...) That said, the current division is perfectly fine in my view; perhaps its rationale can be given more emphasis in the text as follows -
• Chapter 4 deals with operations where the spatial location is being used, but is not being modified with any spatial algorithms as part of the operations – just reordered, filtered, merged, rearranged, etc. For vector layers this includes spatial subsetting, checking spatial relations, spatial joins, aggregation and calculating distances. One thing to consider – aggregation does modify the geometry, though only by dissolving borders or combining single-geometries to multi-geometries, so perhaps it can be moved to Chapter 5 and combined with the geometry unions into Section 5.2.6. For raster data, this chapter includes subsetting, masking, map algebra, local, focal and global operators and merging.
• Chapter 5 deals with operations where the spatial location is being processed in complex ways that are beyond simple rearrangement. With vector layers this includes simplification, calculating centroids, calculating buffers, shifting, clipping and unions, and casting to other types. With raster data this includes intersection, changing the extent and origin, aggregation and disaggregation. Second thing to consider – the "intersection" part is perhaps more fitting to Chapter 4, along with Section 4.3.1, as it is another example of subsetting, without modifying the spatial arrangement of pixels, analogous to Section 4.2.1 for vector layers. The raster-vector interaction section (5.4) is a good fit here because all of the presented operators transform the geometry of the raster or vector inputs.
• Chapter 6 deals with reprojection, which can be described as a specific type of geometric operation where the data are transferred "as is" into a different CRS.
General comments -
We have made the structure of the book clearer and updated chapter names partly in response to these suggestions. Any further comments welcome
Now that #178 is closed we can work on this - suggest we only fix the unambiguous and smaller issues and wait for further reviewers' input before working on issues that need to be discussed and decided first.
reviewer one
Chapter 9:
graphics
andgrid
packages. Ggplot2 has entire books dedicated to it. Both are referenced in the vis chapter. We think deliberately adding new content and documenting tmap beyond (the admitedly very good) documentation of the package itself is a further strong reason, beyond those outlined above. The approach is to link to existing resources and guide people to good packages as much as possible while minimizing duplication of material covered in existing books.nz
is at this stage. This and some associated issues are addressed in https://github.com/Robinlovelace/geocompr/commit/888c7d9d6a1b431355ee29042b3e260790cd88cd making the section easier to understand for people who start on this chapter.Chapter 10:
Chapter 11:
fun
arguments is used. Secondly, it is not distinctly quicker - the extraction process is the most time consuming one. The only actual advantage (that I can see) of usingfun
is it requires less code to write.Chapter 13:
[x] 1.21 [JM] Section 13.2 – To make the landslide dataset "balanced", in the given example the non- landslide points are sampled to give similar positive and negative sample sizes (175 vs. 175). Intuitively this seems wrong: the sample should either be completely random (e.g. sampling random points inside the study area, then checking them for landslides) or comprising the entire population (completely mapped study area). Why not divide the study area into a grid (i.e. raster), where samples are grid cells and have the response variable specify whether the cell contains (at least one) landslide, assuming complete mapping? In other words – it is unclear to me what is the rationale behind keeping "balance" between sample size of the two categories, at the expense of randomness. Few years ago I did a study on forest mortality and made the above-mentioned decision. The dataset was similar in structure: there are tree mortality locations scattered around the forest. Similarly to the landslides example, there were quite few mortality sites and a lot of non- mortality sites. However, I thought that instead of making the data balanced, the entire range of conditions in the forest needs to be characterized. So I ended up dividing the forest into a regular grid, and having each grid cell sampled as TRUE/FALSE, i.e. whether the grid cell had at least one mortality location inside. The data indeed were unbalanced, with just ~3% of cells being TRUE. (The data then went to a Mixed-Effects GLM considering spatial autocorrelation). Similarly, in presence/absence modelling of species distributions: I would think that entire valid dispersion range (e.g. a continent) needs to be considered as the "sample", with all cells given TRUE/FALSE values according to presence/absence of the species in that cell, rather than choose N observed locations and N absent locations. I believe that spatial logistic regression models (e.g. package 'spatstat') use a similar rationale: the spatial window is considered completely mapped, then the point pattern of observed events is modeled as function of background (e.g. raster) predictor also available for the entire area. Admittedly, I have no theoretical justification, it just intuitively seems more correct. If I'm missing something and the authors have any comments on this I will be very happy to learn. -Thank you for sharing your experience and the feedback! And I see, why our approach might be a bit confusing. Let me give you a bit more detail on the background. In the original study (https://doi.org/10.1016/j.geomorph.2011.10.029), we mapped all landslides in the study area. Secondly, we used the mid-point of the landslide scar as the response variable. Thirdly, we randomly sampled points which were located outside of the landslide polygons. We selected as many non-landlides points as there were landslide points since logistic regression using a log-link roughly expects a balanced response (see e.g., http://highstat.com/index.php/beginner-s-guide-to-glm-and-glmm). However, the same dataset has been also used in classes and summer schools to show how different modeling algorithms can handle the uneven distribution of the response variable and the related increase in spatial autocorrelation. This is why 1360 non-landslide points where randomly selected but again this was done by randomly selecting these outside of landslide polygons. To clear this up, we have added this explanation to a footnote. To answer your question on dataset completeness. 
We do not need the entire population, but just a representative sample of it, for statistical modeling. In fact, using the whole population would make modeling redundant, because you would already have the information for each pixel. But you are right that some spatial point pattern algorithms expect a full inventory, e.g. all trees (see the spatial point pattern analysis chapter of the ASDAR book).
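For concreteness, the sampling design described above can be sketched in a few lines of sf code (a minimal sketch with illustrative object names, not the book's code; slides holds the mapped landslide polygons and study_area the study area polygon):

    library(sf)
    # non-landslide points may fall anywhere outside the landslide polygons
    non_slide_area = st_difference(st_union(study_area), st_union(slides))
    # draw one random non-landslide point per mapped landslide,
    # giving the balanced design (175 vs. 175) discussed above
    neg_pts = st_sample(non_slide_area, size = nrow(slides), type = "random")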
[x] 1.22 [JM] Section 13.5.1 – In the first code section, the expression 'coords = lsl[, c("x", "y")]' is duplicated. - Changed! Thanks for spotting!
[x] 1.23 [JM] Could be helpful to comment on model interpretation in GLM vs. SVM. For example, how can we interpret the coefficients in a GLM? How can we calculate variable importance in machine learning methods, such as SVM? - As emphasized in the introduction, this chapter is on spatial prediction using machine learning. Still, you are right that one can compute the variable importance of ML algorithms, but we would rather refrain from doing so here, since otherwise we would have to add another section on the difference between statistical inference and variable importance in machine learning.
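To illustrate the GLM half of the question: the coefficients of a binomial GLM can be interpreted on the odds scale. A minimal sketch, assuming the lsl data frame and the predictor names used in the chapter:

    # logistic regression as in the chapter
    fit = glm(lslpts ~ slope + cplan + cprof + elev + log10_carea,
              family = binomial(), data = lsl)
    # exponentiated coefficients give the multiplicative change in the
    # odds of a landslide per one-unit increase in each predictor
    exp(coef(fit))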
Additional questions:
[x] 1.24 [RL] The idea of an "Algorithms" chapter is great. From the chapter draft it seems the example (centroid) is relatively complex though. Also it seems part of the solution will be using an external package, 'decido'. I would suggest replacing or supplementing this example with a simpler one, where the solution only requires functions covered elsewhere in the book, such as functions from the 'sf' package only. Just one of many possible suggestions – a function that accepts a polygonal layer and returns a new line layer containing separate line segments (as separate features) along with the azimuth that the exterior of each line segment faces. This can be useful in urban planning, e.g. to understand how building facades are oriented. Such a function needs several steps: first splitting the polygons into line segments, then extracting each segment's coordinates (x1, y1, x2, y2), and finally applying some trigonometry to find the azimuth. This does not require anything outside of 'sf' plus base R. Here is an illustration of a sample layer and the resulting azimuth classification - [FIGURE IN THE PDF]
- The chapter was at an early stage so we're very grateful for the high-level comments and the specific suggestions. Based partly on these suggestions, the chapter has evolved substantially. We are confident it is much better and addresses each of the issues raised. See https://geocompr.robinlovelace.net/algorithms.html . Regarding the centroid example, we agree the description was a little long-winded. Based on this comment we've re-written the algorithms section. It no longer depends on decido (or purrr for that matter). It's shorter. And hopefully provides more bang for your buck! - Regarding the decision to use centroids, we defend that decision. It's something everyone can relate to. It's now visually very clear what's going on with the new 3-facet plot - see https://geocompr.robinlovelace.net/algorithms.html#fig:polycent - that should be satisfying. More importantly, it provides a reason to link to the source code of industrial-grade implementations such as GEOS. That was always an intention: to make the reader aware that hard work has gone into such implementations so that they don't have to write all their algorithms from scratch, and hopefully that now comes through in the prose as well. Any further feedback on this is very much appreciated - we have a version that works even for non-convex polygons, for example - worth mentioning? - We agree with your suggestion for a function to 'explode' linestrings representing building walls. We'll work on it (a first sketch follows below).
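For reference, here is a minimal sketch of the suggested segment-azimuth function using only sf and base R (illustrative code, not the book's implementation; it assumes a single polygon without holes, in a projected CRS, with a counter-clockwise exterior ring, so the exterior lies 90 degrees clockwise of the direction of travel):

    library(sf)
    segment_azimuths = function(poly) {
      xy = st_coordinates(poly)[, 1:2]  # ring vertices (closed: last = first)
      n = nrow(xy) - 1
      segs = vector("list", n)
      az = numeric(n)
      for (i in seq_len(n)) {
        p1 = xy[i, ]
        p2 = xy[i + 1, ]
        segs[[i]] = st_linestring(rbind(p1, p2))
        # direction of travel in degrees clockwise from north
        dir = (atan2(p2[1] - p1[1], p2[2] - p1[2]) * 180 / pi) %% 360
        az[i] = (dir + 90) %% 360  # azimuth the segment's exterior faces
      }
      st_sf(azimuth = az, geometry = st_sfc(segs, crs = st_crs(poly)))
    }

Classifying the azimuth column into compass bins (e.g. with cut()) would then reproduce the kind of facade-orientation map shown in the reviewer's figure.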
reviewer two
Chapter 12:
Chapter 9:
[x] 2.2 [RL] As tmap is based on ggplot, I deem it necessary to introduce the underlying philosophy of ggplot first. - We now mention the Grammar of Graphics on which both tmap and ggplot are based - see https://github.com/Robinlovelace/geocompr/commit/7ec9456191e8f5b2701eaac18c4ed81683ed05f6 . I am not an expert in this field of visualization so any follow-on comments or suggestions welcome here (RL)
[x] 2.3 [JN] 9.2.2: There is no dataset nz_elev. - The nz_elev dataset is in the development version of the spDataLarge package.
[x] 2.4 [JN] 9.2.3: There is no variable Land_area. The plot command therefore does not work. Replace with existing variable name LAND_AREA_SQ_KM. - This variable exists in the development version of the spData package.
[x] 2.5 [JN] There is no variable Population nor Median_income. As, in this case, there are no such variables in the data set, none of the code in 9.2.4 works. - These variables exist in the development version of the spData package.
[x] 2.6 [JN, RL] Section 9.2.5 has to be re-written to accommodate the way that tm_style works now. None of the +tm_style commands work. This includes the concluding pointer to the style_catalog function - We present how the tm_style function works starting from tmap version 2.0. It should be on CRAN soon.
[x] 2.7 [JN] In section 9.2.7, the call to library(grid) seems innocuous. However, the subsequent call of the viewport() function depends on it. This is hard to understand for the uninitiated. The last paragraph in section 9.2.7 is meant to be a transition to section 9.3, but what is confusing is that the paragraph actually refers to all of 9.2 rather than just the last three pages. - We state in the text that a viewport from the grid package will be used. What else do you recommend? The last sentence of section 9.2.7 is meant to show readers that the examples in this section could be improved (e.g. with a better style) and that the method presented can be applied not only to combine maps, but also to combine maps and plots.
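To make the dependency explicit for readers of this thread, the mechanism is roughly as follows (a sketch with placeholder layer names; the print() method for tmap objects accepts a grid viewport via its vp argument):

    library(tmap)
    library(grid)  # provides viewport(), used below
    main_map = tm_shape(nz) + tm_polygons()         # main map
    inset_map = tm_shape(nz_region) + tm_borders()  # hypothetical inset layer
    main_map
    # draw the inset inside a viewport in the lower-right corner
    print(inset_map, vp = viewport(x = 0.8, y = 0.2, width = 0.3, height = 0.3))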
[x] 2.8 [JN] Section 9.3 relies on an older version of tmap. The animation function is now called animation_tmap and assumes the installation of a software package external to R (ImageMagick). That is already problematic. With the change in the function call, I also get a "convert -version" error. - Actually, this section relied on the newer version of tmap. Some of the function names were standardized, including the change from animation_tmap() to tmap_animation().
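For readers following along, the renamed workflow looks roughly like this (a sketch assuming tmap >= 2.0 with ImageMagick installed, using the spData datasets from the chapter's animation example):

    library(tmap)
    library(spData)
    m = tm_shape(world) + tm_polygons() +
      tm_shape(urban_agglomerations) + tm_dots(size = "population_millions") +
      tm_facets(along = "year", free.coords = FALSE)  # one facet per frame
    tmap_animation(m, filename = "urban.gif", delay = 25)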
[x] 2.9 [RL] The world coffee code does not work either: the layer definition for facets requires a true/false value. Mapview, leaflet, and shinyApp, however, work just fine.
Chapter 10:
data("cycle_hire", package = "spData"); points = cycle_hire[1:25, ]
.Chapter 11:
Chapter 13:
reviewer three
Chapter 12:
Chapter 9:
[x] 3.2 [RL] This is a very good overview; it's not only very practical in showing how to do things with these packages, it also gets across what a diverse and dynamic space the R spatial landscape is. I'm generally just reading it and learning, which is terrific! It's a long chapter, but for good reason: lots of strong ideas and good usable examples. I find this does lack a little bit of the big picture, though: why is the space so fragmented, why are there so many mapping packages, and sooo many animal tracking packages? Is it because R is very flexible and dynamic and a lot of great ideas are happening fast, or is it just how it is in geospatial? Are there prospects for tighter unity, or should we expect more and more packages? How does the R community compare to the QGIS community for mapping? Why are QGIS and ArcGIS more about Python than R? A few thoughts on some of these questions might be useful.
[x] 3.3 [JN] Section 9.1 typo “are often be the best way”
[x] 3.4 [JN] Section 9.2 typos: "and the grid providing", "is making static with tmap."
[x] 3.5 [JN] Section 9.2.3 typo “The purpose this section is to show how”
[x] 3.6 [JN] Section 9.2.3 typo " for to create superscript text)"
[x] 3.7 [JN] Section 9.2.4 typo “functions it in facts originates” - Thank you - https://github.com/Robinlovelace/geocompr/commit/5c4d03bf87bce93fec73b78975389906246001cf.
[x] 3.8 [RL] Section 9.2.6 Is “small multiples” defined in the lit somewhere? A ref? I see the term but it’s possibly an ESRI thing? I’ve tended to use “conditioned on” (a grouping value) in the past, but can’t remember where I picked that up - faceted is a very good term.
[x] 3.9 [RL, JN] Section 9.3, 9.4 This is excellent, very good description of the need for various modes of faceting and animation. Might mention the lack of an overarching framework for continuously varying data in R and animations, and while magick, animation, moveVis, and gganimate and others provide frame-based animation there is a longer-term effort to make continuous transitions more generally available (tweenr, new gganimate).
... animation_tmap().
[x] 3.10 [RL] Section 9.5 For absolute up-to-dateness it might be worth mentioning the new "async" facilities for shiny, and how they will help make shiny more accessible and usable because of better responsiveness and scalability.
Chapter 10:
Chapter 11:
Chapter 13:
reviewer four
General remark:
[x] 4.1 [RL, JN, JM] Minor general remarks: I notice that you use the "=" assignment throughout the book. You may be aware that within the R community the general consensus is to use the "<-" operator. Personally, I do not care, but for the vast majority of R users who are used to "<-", it might be confusing.
- We all use = in our own work and have decided to use it for ease of teaching. We have discussed changing it and would be open to doing so if a strong case can be made. It is certainly worth mentioning, however.
Chapter 9:
[x] 4.2 [RL] Section 9.2 "... and the grid providing functions for low-level control of graphical outputs" Do you mean the "grid package"? If so, you may note that base-R graphics differs from grid-based graphics. Base-R graphics can be used for charts and maps (i.e. the plot function), but the grid graphics only offer graphical building blocks (like grid.rect and grid.text), which are not very useful for the end-user.
- The revised text now reads: "... plot() offering dozens of arguments. Another low-level approach is the grid package, which provides functions for low-level control of graphical outputs; see R Graphics [@murrell_r_2016], especially Chapter 14."
[x] 4.3 [RL] Section 9.2 Note that the vignettes will be reorganized in version 2.0 (and also renamed), so these links may not work after releasing tmap 2.0. You can also refer to "the vignettes listed in https://cran.r-project.org/web/packages/tmap"
[x] 4.4 [RL] ? tmap-element --> ?'tmap-element' (also watch the difference between ` and ', otherwise it won't work when copy-pasting) - Changed; the text now reads "... help("tmap-element") for a full list)."
[x] 4.5 [RL] Small detail on "adding a border on top of the fill layer.": normally, the order of layers represents the plotting order, but polygons are drawn only once. The different layers tm_fill and tm_borders are just there to let users specify borders and fill separately. In other words, tm_borders() + tm_fill() gives the same result.
- The relevant text is "... tm_borders() call after tm_fill() + in the previous code chunk." Agreed, but that is too subtle to be worth mentioning - the order of layers is the main point.
[x] 4.6 [JN] 9.2.4 Last paragraph. Small note: in tmap 2.0, the viridis palettes can be specified directly in tmap, e.g. palette = "magma" - Thanks - https://github.com/Robinlovelace/geocompr/commit/512d1769914db0acdcee31757958d5f73824c8a3.
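To illustrate the point in 4.5 above, the following three specifications should draw the same map, since fill and borders are specified independently of plotting order (nz as elsewhere in the chapter):

    library(tmap)
    tm_shape(nz) + tm_fill() + tm_borders()  # border "on top of" the fill
    tm_shape(nz) + tm_borders() + tm_fill()  # same result
    tm_shape(nz) + tm_polygons()             # both in a single layer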
[x] 4.7 [JN, RL] Figure 9.13. In order to prevent occlusion, you could increase the figure size a little and try to add white borders. For that, you'll need tm_symbols rather than tm_dots, since it uses a different shape (symbol 21 instead of 16):

    data(World)
    data(metro)
    qtm(World, projection = "longlat") +
      tm_shape(metro) +
      tm_symbols(col = "black", size = "pop2020", border.col = "white", scale = 3)

- Thank you for this suggestion. We have implemented it.
[x] 4.8 [RL] As of version 2.0, using tm_view(basemaps = basemap) will be deprecated and replaced by tm_basemap(server = basemap). Users can specify the default basemaps with tmap_options.
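A sketch of the 2.0-style calls described above (assuming tmap >= 2.0 in view mode):

    library(tmap)
    tmap_mode("view")
    tm_basemap(server = "OpenStreetMap") +
      tm_shape(nz) + tm_polygons()
    # or set default basemaps for all subsequent view-mode maps:
    tmap_options(basemaps = c("OpenStreetMap", "Esri.WorldImagery"))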
[x] 4.9 [RL] Third paragraph: shiny offers a solution to the first and third limitation, but not the second one (scalability of large datasets).
[x] 4.10 [RL] Inset paragraph on shiny: note that copy-pasting runApp(“coffeeApp”) will not work, since the curly double quotes are different from straight quotes (").
... block2.
Chapter 10:
- When introducing find_algorithms() for the first time, we also write that using it without specifying anything will return all available QGIS geoalgorithms, including a short description. Still, we have added a link to the online documentation of all QGIS geoalgorithms, since this provides a better-arranged overview, as you correctly point out. Thanks!
Chapter 11:
Chapter 13:
- We renamed resampling to perf_level in the GLM section as well, and say in the SVM section that the resampling is identical to the one used in the GLM section. The revised text reads: "... (tune_level object in the code chunk below; see Figure 11.6 for a visual representation). To find the optimal hyperparameter combination we here fit 50 models (ctrl object in the code chunk below) in each of these subfolds with randomly selected values for the hyperparameters C and Sigma. The random selection of values for C and Sigma is additionally restricted to a predefined tuning space (ps object). The range of the tuning space was chosen with values recommended in the literature (Schratz et al., 2018). To make the performance estimation processing chain even clearer, let us write down the commands given to the computer:
1. Performance level (upper left part of Figure 11.6): split the dataset into five spatially disjoint (outer) subfolds.
2. Tuning level (lower left part of Figure 11.6): for each of these folds, run the hyperparameter tuning, i.e. spatially split the performance fold again into five (inner) subfolds. Use the 50 randomly selected hyperparameter combinations in each of these inner subfolds, i.e. fit 250 models.
3. Performance estimation: use the best hyperparameter combination from the previous step (tuning level) in the performance level to estimate the performance (AUROC).
4. Do all of the steps described above for the remaining four outer folds.
5. Repeat all of the steps above 100 times."
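For readers who want to map the numbered steps onto code, a condensed sketch of the corresponding mlr objects (following the chapter; the tuning-space bounds follow Schratz et al., 2018):

    library(mlr)
    # performance level: five spatially disjoint folds, repeated 100 times
    perf_level = makeResampleDesc(method = "SpRepCV", folds = 5, reps = 100)
    # tuning level: five spatially disjoint inner subfolds
    tune_level = makeResampleDesc("SpCV", iters = 5)
    # 50 randomly drawn hyperparameter combinations per inner fold
    ctrl = makeTuneControlRandom(maxit = 50)
    # predefined tuning space for C and Sigma
    ps = makeParamSet(
      makeNumericParam("C", lower = -12, upper = 15, trafo = function(x) 2^x),
      makeNumericParam("sigma", lower = -15, upper = 6, trafo = function(x) 2^x)
    )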
Chapter 12:
Additional comments
Chapter 12:
mapview::mapview(desire_carshort$geom_car)
Chapter 13:
... mapply(), which we replaced by a for loop in accordance with a reviewer's comment. Now the corresponding text is also adjusted accordingly, thanks for noting!
Chapter 5:
Misc:
[x] 5.20 [RL, JN, JM] (...) That said, the current division is perfectly fine in my view; perhaps its rationale can be given more emphasis in the text as follows:
◦ Chapter 4 deals with operations where the spatial location is being used, but is not being modified with any spatial algorithms as part of the operations – just reordered, filtered, merged, rearranged, etc. For vector layers this includes spatial subsetting, checking spatial relations, spatial joins, aggregation and calculating distances. One thing to consider – aggregation does modify the geometry, though only by dissolving borders or combining single-geometries to multi-geometries, so perhaps it can be moved to Chapter 5 and combined with the geometry unions into Section 5.2.6. For raster data, this chapter includes subsetting, masking, map algebra, local, focal and global operators, and merging.
◦ Chapter 5 deals with operations where the spatial location is being processed in complex ways that are beyond simple rearrangement. With vector layers this includes simplification, calculating centroids, calculating buffers, shifting, clipping and unions, and casting to other types. With raster data this includes intersection, changing the extent and origin, aggregation and disaggregation. Second thing to consider – the "intersection" part is perhaps more fitting to Chapter 4, along with Section 4.3.1, as it is another example of subsetting, without modifying the spatial arrangement of pixels, analogous to Section 4.2.1 for vector layers. The raster-vector interaction section (5.4) is a good fit here because all of the presented operators transform the geometry of the raster or vector inputs.
◦ Chapter 6 deals with reprojection, which can be described as a specific type of geometric operation where the data are transferred "as is" into a different CRS.
General comments:
- We have made the structure of the book clearer and updated chapter names, partly in response to these suggestions. Any further comments are welcome.