Review 1, chs 10,14-15
Chapter 10
[x] 1.1 The first two sections are very well written. Very succinct and way too brief to stand alone but it
is only one of 15 chapters in a book that only marginally deals with scripting - even if everybody
in the community assumes that scripting is an essential part of geocomputation. As such, it
could be seen as carrying coals to Newcastle, but then, the monograph is trying to reach out
beyond the traditional audience of geocomputationalists. That is why I consider this chapter a gem:
the language is easy enough to not scare anyone. As every teacher in our field knows, this is an
accomplishment. I am, however, missing the code folder on the book’s GitHub site.
In section 10.3, I would drop the 3rd paragraph. It is irrelevant here and smacks too much of
Wikipedia.
We have updated the 3rd paragraph of 10.3 so that it does more now, including referencing a book (and code) that introduces geoalgorithms from a C programming, rather than pseudocode, perspective:
Geoalgorithms, such as those we encountered in Chapter 9, are algorithms that take geographic data in and, generally, return geographic results (alternative terms for the same thing include GIS algorithms and geometric algorithms). That may sound simple but it is a deep subject with an entire academic field, Computational Geometry, dedicated to their study (Berg et al. 2008) and numerous books on the subject. O’Rourke (1998), for example, introduces the subject with a range of progressively harder geometric algorithms using reproducible and freely available C code.
I think that addresses the issue here but if you have any other suggestions on how to improve this section, please let us know with reference to the latest version, which can be found here: https://geocompr.robinlovelace.net/algorithms.html#geometric-algorithms
[x] Also, watch out for the spelling of pseudocode.
All instances of pseudocode are now spelled correctly (pseudocode)
[x] The first word of the last sentence in
the 4th paragraph needs to be capitalized (after converting the preceding comma into a full
stop). The first sentence after the first code snippet of 10.3 is missing a “that” as in “Now that we
have”. Also, it is NOT really describing the first step of the algorithm, which is to divide an
irregular polygon into the smallest possible collection of completely covering but non-
overlapping triangles. That is implemented at the beginning of code snippet #4 but the code
violates the rules laid out in section 10.2, i.e., there is no comment as to what is happening
here. Sections 10.2 and 10.3 must have been written by different authors because this
contradiction is really striking. It is then puzzling to see that the last two paragraphs of section
10.3 are beautiful again.
We have reread sections 10.2 and 10.3 and made various improvements to the text to try to increase the clarity of the prose. We have also commented the code chunks more: we are grateful for this suggestion as it improves readability greatly, especially for people new to R. Any follow-up comments welcome.
[x] Section 10.4 picks up the steam that this chapter gained at the end of 10.3. There is a minor
syntactical error in the second sentence of the 7th paragraph (excluding code samples). Instead
of Providing it should read “Provided” and I would insert a “that” after “Provided”.
[x] The exercise in section 10.6 then suggests that the lack of comments in the code sections of
10.3 - 10.5 is didactical; i.e., illustrating the fact that without comments, code is hard to read.
While I appreciate the logic, I don’t think that this is an appropriate approach in a text that is just
introducing algorithms. Please do insert the comments.
Chapter 14
[x] Fog oases also develop along the coasts of Yemen and Oman (see DOI: 10.1007/s10113-016-0942-2).
- Thanks for pointing this out. You are right that the text somewhat implied these vegetation formations only existed there, which of course is not true (fixed in commit 3a4e043).
[x] Otherwise, chapter 14 is well written with only minor Ginglish (German English) issues
such as ‘gives back’ instead of ‘returns’.
- Thanks! I replaced “gives back” with “returns”.
[x] Somewhat similar to the Scripting chapter, the writing is a little terse. I appreciated the observation that some of the input data used here is skewed and should hence be rectified but this really is written like a throw-away observation. A little more verbosity would make this chapter more useful to a larger audience. There were lots of cross-references to Chapter 11, which usually is a good thing. But a little repetition would be even better. Don’t mind the redundancy!
- In accordance with your suggestion and the comments of the two other reviewers, we have extended the explanation of ordinations (NMDS), random forests and mlr building blocks. If you still think the writing is too terse, we would appreciate it if you could point us to the specific parts of the text which require more explanation.
[x] Towards the end of the chapter, the author introduces a variable ‘ep’ in the code sample that comes out of nowhere. Please clean this up.
- ep was created using RQGIS (see section https://geocompr.robinlovelace.net/eco.html#data-and-data-preparation). ep is now available via the spDataLarge package, which spares the reader from having to use RQGIS if he/she does not want to install third-party GIS software.
Chapter 15
[x] I like sections one and two but the last one is nothing but a fig leaf. This really isn’t about social
benefits and the perfume example is doing the authors rather a disservice. Please go back to
the drawing board or just plain delete this half page.
Review 2, chs 10,14-15
Chapter 10
[x] Page 233, 1st paragraph – Not sure why the text says “and have already imported the datasets
needed for your work”, as the code presented in this chapter does not rely on any datasets and only
requires loading the sf package.
The text was a response to a previous review comment that said having your data loaded is a pre-requisite. However, we agree it sounded strange. Updated now to:
It assumes you have an understanding of the geographic classes introduced in Chapter 2 and how they can be used to represent a wide range of input file formats (see Chapter 7).
[x] Page 236, Figure 10.1 – Figure caption refers to line 11 while the screenshot shows the error is on
line 4. Also, the error on the screenshot is a duplicated pipe operator %>%, not an unclosed curly
bracket.
[x] Page 240, Figure 10.3 – Though the preceding code section is perhaps too complex to be covered in detail given space constraints, some details can be highlighted nevertheless. For example, in what
way is the polygon split into triangles? One of the points on the polygon outline is chosen as the
“anchor” point (bottom left point in Figure 10.3), then triangles are constructed by joining it with
consecutive point pairs taken from the remaining outline points, e.g. 2 and 3, 3 and 4, etc. This is
easy to explain if the points on Figure 10.3 are numbered. The authors may also consider
expanding Figure 10.3 with a few more examples of different convex polygons and their
corresponding division into triangles.
These are reasonable suggestions but I think they risk further complicating the code. If we number the points, for example, we will either have to add more code that could cause confusion, or hide the point-numbering code, also leading to confusion. So while I 100% agree with the idea, I cannot think of a way to implement it without adding more complexity to what is already, for beginners, confusing enough (RL - I'm very open to discussing this and am happy to defer to the views of others on this if a strong case is made).
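For readers who want to see the idea without touching the book's code, the anchor-based "fan" triangulation the reviewer describes can be sketched in a few lines of base R (the coordinates below are made up for illustration and are not the book's example polygon):

```r
# Fan triangulation of a convex polygon: pick the first vertex as the
# "anchor", then join it with consecutive vertex pairs (2-3, 3-4, ...)
# to form completely covering, non-overlapping triangles.
poly = matrix(c(0, 0, 3, 0, 4, 2, 2, 4, 0, 3), ncol = 2, byrow = TRUE)
n = nrow(poly)
# vertex indices of each triangle: the anchor plus vertices i and i + 1
tri_idx = lapply(2:(n - 1), function(i) c(1, i, i + 1))
length(tri_idx)  # a convex n-gon always yields n - 2 triangles, here 3
# area of a triangle from its three vertices (shoelace formula)
tri_area = function(p) {
  abs(p[1, 1] * (p[2, 2] - p[3, 2]) +
      p[2, 1] * (p[3, 2] - p[1, 2]) +
      p[3, 1] * (p[1, 2] - p[2, 2])) / 2
}
areas = vapply(tri_idx, function(i) tri_area(poly[i, ]), numeric(1))
sum(areas)  # triangle areas sum to the polygon area, 12 here
```

This keeps the numbering logic entirely separate from any plotting code, which may be one way to explain the anchor idea without complicating the figure itself.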
[x] Page 241, Section 10.4, 1st paragraph – I’d rephrase and say that an algorithm is the general recipe
for performing a given computational task, while a function is a form of implementation of an
algorithm in a particular programming language. A script can be seen as less general, simply a
collection of computer instructions, not necessarily having defined input(s) and output like an
algorithm or a function.
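The reviewer's proposed distinction can be illustrated with a toy example (hypothetical code, not taken from the book): the same computational recipe first as a bare script, then as a function with defined input and output.

```r
# A script: a plain sequence of instructions with no declared input or
# output; it operates on whatever happens to be in the workspace.
x = c(0, 3, 4, 2, 0)
y = c(0, 0, 2, 4, 3)
cx = mean(x)
cy = mean(y)

# A function: the same algorithm (take coordinates, return their mean
# centre) with explicit input and output, so it can be reused anywhere.
centroid_xy = function(x, y) {
  c(mean(x), mean(y))
}
centroid_xy(x, y)  # identical result to the script: c(1.8, 1.8)
```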
[x] Page 243, 3rd paragraph – A few words of discussion on practical implications and future directions
can be helpful here: what kind of considerations need to be taken into account if we want to adapt
the presented algorithm to a wider range of situations? For example, footnote #9 says the algorithm
only works for convex polygons; which alternative algorithm is commonly used for concave
polygons, polygons with holes, or multipolygons? Finally, is the presented algorithm actually used
in “real-world” software such as GEOS? If not, which alternative algorithms are used, and what is
their advantage over the presented one?
Chapter 14
[x] Page 311, 2nd paragraph from top – “... from a (noisy) dataset”: a few words on the nature of the
data can be helpful here for readers without ecological background. For example, the term
“community composition” should appear, and perhaps a Figure illustrating the nature of the data: a matrix where rows represent sites, columns represent species, and values represent abundance of
given species in given site.
- We absolutely agree. And in fact we are doing exactly what you propose in the next section. To make this clearer, we have added a cross-reference.
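To make the cross-referenced structure concrete for readers without an ecological background, a toy community matrix (made-up values, not the book's comm object) looks like this: rows are sites, columns are species, and cell values are the cover of a given species at a given site.

```r
# hypothetical community matrix: 3 sites x 3 species, cover in percent
comm_toy = matrix(c(10.5,  0.0, 3.2,
                     0.0,  0.0, 0.0,
                    80.0, 60.1, 0.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("site", 1:3),
                                  c("sp_a", "sp_b", "sp_c")))
rowSums(comm_toy)  # site3 exceeds 100 because individual covers overlap
```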
[x] Page 311, Section 14.2, 1st paragraph – The nature of the comm matrix values is not clear. What are “percentage points” – are these percentages, such as 0-100? If so, how come the row sums are over 100 in some cases?
comm %>% rowSums %>% range
[1] 0.006 132.700
Perhaps this is due to overlapping cover between individual plants, in which case this should be
mentioned to avoid confusion. This should probably also be clarified in ?comm.
- Yes, overlapping covers are in fact the reason for values >100%. We have added this information to the text and also to ?comm. Thanks for noting this!
[x] Page 311, Section 14.2, 1st paragraph – The fact that 16 sites where no species were found are
omitted from the comm matrix is unfortunate: generally absence data should not be omitted from
the input datasets, but filtered later on where necessary by the analyst, to avoid elementary
mistakes. For example, if one runs the following expression to get the average plant cover in the
studied area he/she gets an overestimate since the “empty” sites are ignored.
comm %>% rowMeans %>% mean
[1] 0.3809451
I realize that some types of analyses cannot accept “empty” sites; however it is easy to remove
those sites prior to analysis, with the added benefit that the analyst is reminded of the reality that
such sites actually exist and makes a conscious decision whether to include them.
- Yes, I absolutely agree; I generally would not dismiss "empty" sites. Unfortunately, ordination techniques such as NMDS and DCA cannot handle empty sites. But I agree again with you that we should make this an obvious decision of the data analyst. Hence, I have added the empty sites to the community matrix, and we now dismiss them explicitly in the text.
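The now-explicit workflow can be sketched as follows (a minimal made-up matrix, not the book's comm object): keep the "empty" sites in the input data, and drop them only immediately before the ordination so that the removal is a visible analytical decision.

```r
# hypothetical community matrix including an "empty" site (site2)
comm_full = matrix(c(5, 2,
                     0, 0,
                     9, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(paste0("site", 1:3), c("sp_a", "sp_b")))
# summary statistics should use the full matrix, empty sites included
mean(rowMeans(comm_full))
# NMDS and DCA cannot handle empty sites, so remove them explicitly
# at this point, as a conscious decision by the analyst
comm_nonempty = comm_full[rowSums(comm_full) != 0, ]
nrow(comm_nonempty)  # 2 sites remain for the ordination
```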
[x] Page 313, 1st paragraph – The authors may consider providing links to pre-calculated input rasters (such as the wetness index raster) rather than having code that relies on external software bridges (QGIS). In my opinion GIS-bridges should not be used outside of Chapter 9, where they are
introduced, because they are not essential to the material of other chapters yet add unnecessary
burden on the reader who is not interested in installing and configuring external software but still
wants to reproduce the examples. For example, I recently upgraded QGIS to version 3.18, which I
could not configure to run with RQGIS, getting the following error –
open_app()
Error in py_run_string_impl(code, local, convert) :
ImportError: No module named qgis.core
As far as I can tell this is due to the fact that the RQGIS package cannot work with this version of QGIS yet – https://github.com/jannes-m/RQGIS/issues/97
- Yes, the reason is that RQGIS does not yet work with QGIS 3, but it will, just give me a bit more time (in fact I have already made RQGIS work with QGIS 3 but have not had the time yet to implement it properly). You are also right that we should provide the reader with the output of the RQGIS processing. Hence, we have added ep to spDataLarge as a convenience to the reader.
[x] Page 315, 2nd paragraph – Why do ordinations using presence-absence data yield better results?
And what is the meaning of "better results" in this context?
- "Better" refers to a larger part of the variance explained by 2 axes. That presence-absence data yields better results compared to percentage data is explored in more detail in the Exercise section of the chapter.
[x] Page 316, 1st paragraph – What does it mean that "we rotate the NMDS"? A figure with a pair of
NMDS plots, the original and rotated one, can help illustrate the concept. Maybe even combine both
plots into a single plot where each pair of points for the same site (the original and rotated one) are
joined with an arrow. Perhaps something like the first figure on page 8 of the vegan package
tutorial (http://cc.oulu.fi/~jarioksa/opetus/metodi/vegantutor.pdf).
- In fact, I liked your idea of showing the difference between the unrotated and the rotated scores. The resulting figure, however, is hardly convincing (see below). Therefore, we have only added a reference to MDSrotate() to make it clearer that the first axis is rotated in accordance with elevation.
library(lattice)
library(vegan)
#> Loading required package: permute
#> This is vegan 2.5-2
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.3.1, proj.4 5.1.0
data("random_points", "comm", package = "RQGIS")
data("ep", package = "spDataLarge")
random_points[, names(ep)] = raster::extract(ep, as(random_points, "Spatial"))
# presence-absence matrix
pa = decostand(comm, "pa")  # 100 rows (sites), 69 columns (species)
# keep only sites in which at least one species was found
pa = pa[rowSums(pa) != 0, ]  # 84 rows, 69 columns
my_url = "https://raw.githubusercontent.com/Robinlovelace/geocompr/master/extdata/14-nmds.rds"
# my_file = tempfile(fileext = ".rds")
# download.file(my_url, my_file, method = "curl")
# readRDS(my_file)
nmds = readRDS(gzcon(url(my_url)))
elev = dplyr::filter(random_points, id %in% rownames(pa)) %>%
  dplyr::pull(dem)
# rotating NMDS in accordance with altitude (proxy for humidity)
rotnmds = MDSrotate(nmds, elev)
# extracting the first two axes
sc = scores(rotnmds, choices = 1:2)
xyplot(scores(rotnmds)[, 2] ~ scores(rotnmds)[, 1], pch = 16,
       col = "lightblue", xlim = c(-3, 2), ylim = c(-2, 2),
       xlab = list("Dimension 1", cex = 0.8),
       ylab = list("Dimension 2", cex = 0.8),
       scales = list(x = list(relation = "same", cex = 0.8),
                     y = list(relation = "same", cex = 0.8),
                     # ticks on top are suppressed
                     tck = c(1, 0),
                     # plots axes labels only in row and column 1 and 4
                     alternating = c(1, 0, 0, 1),
                     draw = TRUE),
       # we have to use the same colors in the legend as used for the
       # plot points
       par.settings = simpleTheme(col = c("lightblue", "salmon"),
                                  pch = 16, cex = 0.9),
       # also the legend point size should be somewhat smaller
       auto.key = list(x = 0.7, y = 0.9, text = c("unrotated", "rotated"),
                       between = 0.5, cex = 0.9),
       panel = function(x, y, ...) {
         # plot the points
         panel.points(x, y, cex = 0.6, ...)
         panel.points(x = scores(nmds)[, 1],
                      y = scores(nmds)[, 2],
                      col = "salmon", pch = 16, cex = 0.6)
         panel.arrows(x0 = scores(nmds)[, 1],
                      y0 = scores(nmds)[, 2],
                      x1 = x, y1 = y,
                      length = 0.04, lwd = 0.4)
       })
Created on 2018-09-25 by the reprex package (v0.2.1)
Review 3
Chapter 10
[x] I think this is a very good chapter, it’s short and pithy and gets across a lot of great information. The
practitioner in me was a little riled “triangulation is hard, centroids of convex polygons is not very useful,
there’s more to it etc. etc.” but the examples provide so much extra learning value that is not about the
specific task. Showing that one in fact can break these problems down to tiny tasks is very good - you can
either do it, or find something that can do it and string together your own workflows as needed. Breaking
down these monolithic tasks is very helpful.
Many thanks for the feedback. I was worried when writing it that it would just confuse people, so I'm very glad to hear this from someone who's written lots of code (hopefully it makes sense to newcomers also ; )
[x] Page 237: Computational Geometry by Berg et al. is very good, I’d also include “Computational Geometry
in C” by Joseph O’Rourke, it’s more practical in terms of being applied to real code rather than pseudo,
and is very complementary to Berg.
I've had a look into this and very much agree: excellent resource. It's now cited prominently.
[x] I worked through the code in this chapter and made some suggestions to the source repo that have been
dealt with.
Thanks!
Chapter 14
[x] I think the opening “Fog oases are one of the most fascinating ...” is a bit early. I’m looking for more of an overview of the chapter, that it requires topics from earlier chapters and “brings it all together” by assuming previous knowledge from earlier chapters. The Prerequisites does this, but I think the Introduction needs a longer description, something like: “We will model the floristic gradient of fog oases to reveal distinctive vegetation belts that are clearly controlled by water availability.”
- Thank you! We have added two introductory sentences in accordance with your suggestions (commit 85566fd).
[x] Page 311: Sentence “Visualizing the data helps to get more familiar with it:” is left a bit hanging, I think it needs to refer to the Figure directly. I.e. "Visualizing the data helps to get more familiar with it, as in Figure 14-2 where the dem is overplotted by the random_points and the study_area :
- Thanks! Changed in commit e902d455.
[x] Page 311: I think the data loading code should be explicit re the package: data("study_area",
"random_points", "comm", "dem", "ndvi", package= "RQGIS") .
- Done that.
Chapter 15
[x] Page 328: “as of summer 2018” is parochial, please list the month of the year.
“Complimentary” should be “Complementary”.
Good point. Done!
[x] Page 329: I’d like to see spatstat mentioned as another alternative stack, sp/rgdal/rgeos are less
independent of the chosen sf/tidyverse/raster stack here.
[x] Page 330: I find the “Geo " idiom confusing, as is the sentence "The section is contains geo rather than geocomputation ...”. I don’t think the point is well made and I’m a bit unsure what it is.
I think this section tails out a little vaguely and needs some tightening up.
We've revamped the section substantially and it's much tighter now. Any final comments on it are welcome.
[x] Finally, I think I’m seeing an old version of the final section - it looks like the text has been greatly
improved in the online version already. So, all the best - very good!
You were! Apologies for that but John suggested not sending incremental updates, which probably made your life easier. Comments very much appreciated.