Reviews - edition 2, round 2, part 2

Hot on the heels of https://github.com/geocompx/geocompr/issues/898

[x] Foreword needs obviously to be rewritten - much has changed in the geospatial R world since 2018 (shockingly much) RL

Clearly it does. We have created a placeholder for another foreword for the 2nd edition. We plan to wait until the manuscript is finished, or at least very close, before tackling this.

[x] Page xii: "If you are interested in the wider context and motivations behind this book, read on; these are covered in Chapter ??." Read on means continue reading here - not jumping to another chapter. RL

We agree this was not well written. Fixed. The relevant section now reads:

The wider context and motivations underlying this book are covered in Chapter 1.

[x] Page 11, line 9: "a raster data" should read 'a raster' or 'a raster dataset'. RL

Agreed. The sentence now reads:

The seed_generation tool takes a raster dataset as its first argument (features); optional arguments include band_width, which specifies the size of initial polygons.

[x] Page 15, bottom paragraph: link2GI packages have not been mentioned before. How does this term relate to the three packages (qgisprocess, Rsagacmd, rgrass) discussed here? Ditto wrt GDAL and link2GI on page 21

The link2GI package just makes it easy to initiate a GRASS session from within R without the need to fully grasp how GRASS works in the background. However, for the interested reader or GRASS power users we have added the link to the GRASS help pages which show step-by-step how to do so. Please note that we have deleted the appendix showing the same instructions in favor of the GRASS help pages.

In the case of SAGA and GDAL, link2GI searches the system for the corresponding command line utilities and adds their paths to the PATH variable for the current R session. This is unnecessary in the case of qgisprocess, since qgisprocess itself checks, when attached, that a working QGIS version is installed on the system.

[x] Page 16 (middle): the paragraph on rgrass <-> terra is demanding, especially for folks who don't know GRASS' internal data organization, which is notoriously difficult to grasp for GIS novices (the authors address this by adding the GRASS setup appendix - but this comes only on page 18. I suggest moving it to page 16, possibly as a textbox insert).

Please see previous reply and reply after the next.

[x] It is also counter-intuitive to use terra (which is intended to replace the raster package) to create points and lines in a GIS that is mostly used for raster operations.

You are right that terra is, of course, predominantly a raster processing package. However, it also supports vector features, and rgrass expects terra::vect() objects as input.

[x] The multiple levels of data casting are mind-boggling. The authors seem to acknowledge this by pointing to their blog posts and the coerce vignette but this is exactly why this example is not suitable for the given audience and in an introduction to GIS-bridging.

We agree that the section in question is demanding and probably more suitable for experienced (GRASS) GIS users. The reasoning behind this is as follows:

A former reviewer asked us to provide a more complex GRASS example, specifically one that cannot be solved using R's "native" spatial capabilities.
Users specifically wanting to use GRASS are probably already familiar with it, and will therefore struggle less with the example than users who don't need the full power of GRASS. The latter, however, can at least use a subset of its functionality through QGIS.

In any case, we now warn the reader before jumping into the code as follows:

Please note that the code instructions in the following paragraphs might be hard to follow when using GRASS for the first time. However, by running through the code line by line and examining the intermediate results, the reasoning behind it should become clearer.

[x] Page 18, line 2: GRASS' spatial database is not based on SQLite; GRASS has its own native data organization. Instead, the default format for connecting GRASS to an external database using db.connect is SQLite. The same erroneous description is repeated in the discussion of the GRASS database organization.

Thanks for pointing this out; the description was indeed misleading. We have updated the corresponding sections after thoroughly reviewing what GRASS is actually doing in the background (see also https://github.com/geocompx/geocompr/issues/412).

[x] Page 25, bottom: I am happy to see mention of GeoMesa and Sedona but the last sentence is grammatically garbled. RL

Agreed. See https://github.com/geocompx/geocompr/commit/97edb6804b84aa73c16181b05819e499cbf747dc for the fix.

[x] PAGE 26ff: Section 1.7 is a great addition to the second edition of the book!
[x] Page 32, 3rd para: The juxtaposition of ML to Bayesian inference is nonsense - the authors are misquoting Krainski et al, who use Bayesian techniques for predictions. The omission of the Bayesian approach is the one major limitation of the whole volume Geocomputation with R !

My point here was to emphasize that you cannot do statistical inference with ML, but I see why one can misinterpret the sentence. Thinking about it, the inference stuff does not add much value here but is obviously distracting. Therefore, we have removed it.

Secondly, I agree that the Bayesian approach to modeling is quite interesting. However, it is beyond the scope of the book, and there are already books out there presenting it in much greater detail than this book ever could. Still, we have updated the section on including spatial autocorrelation in models as follows:

Here, when making predictions we neglect spatial autocorrelation since we assume that on average the predictive accuracy remains the same with or without spatial autocorrelation structures. However, it is possible to include spatial autocorrelation structures into models as well as into predictions. Though this is beyond the scope of this book, we give the interested reader some pointers on where to look it up:

The predictions of regression kriging combine the predictions of a regression with the kriging of the regression's residuals [@goovaerts_geostatistics_1997; @hengl_practical_2007; @bivand_applied_2013].

One can also add a spatial correlation (dependency) structure to a generalized least squares model [nlme::gls(); @zuur_mixed_2009; @zuur_beginners_2017].

One can also use mixed-effect modeling approaches. Basically, a random effect imposes a dependency structure on the response variable which in turn allows for observations of one class to be more similar to each other than to those of another class [@zuur_mixed_2009]. Classes can be, for example, bee hives, owl nests, vegetation transects or an altitudinal stratification. This mixed modeling approach assumes normally and independently distributed random intercepts. This can even be extended by using a random intercept that is normal and spatially dependent. For this, however, you will most likely have to resort to Bayesian modeling approaches, since frequentist software tools are rather limited in this respect, especially for more complex models [@blangiardo_spatial_2015; @zuur_beginners_2017].

[x] As for the statistical learning chapter, I would prefer if the authors used a dedicated random forest model such as spatialRF or the grf function in the SpatialML package rather than mlr3.

In the statistical learning chapter we focus on performance estimation. The big advantage of using mlr3 is that one can compare dozens or even hundreds of learners, resampling strategies and tasks using the same interface. If the learner in question does not yet exist, it should be fairly easy to implement it in the mlr3extralearners package. Please also refer to our reply to the comment on Pages 40ff.

[x] Page 35, caption for Figure 2.2: It depicts the spatial distribution of susceptibility values. The term "spatial prediction" is misleading as the GLM is not a spatial model as in spatial regression. Given the importance of spatial and geographically weighted regression (as well as the kriging technique mentioned in the following paragraph), the way Jannes is using the term spatial prediction is unfortunate. JM

I get the point. However, I have to admit that, as far as I know, the term "spatial prediction" is not reserved for modeling techniques that incorporate the spatial structure in one form or another into the model itself. In any case, wherever possible we replaced "spatial prediction" with "predictive mapping" or "spatial distribution".

[x] Page 37, last paragraph: The First Law of Geography was coined by Tobler in 1970, who should be cited here, not the symposium summary by Miller in 2004. RL
[x] Pages 40ff: This chapter relies heavily on the mlr3 metapackage, which in turn requires quite a lot of understanding of machine learning methodology and terminology. What is actually implemented in this chapter does not warrant the use of such heavy machinery. GLM and cross-validation are standard tools in R and for support vector machines, there are a dozen individual packages available that require less background knowledge. Finally, if the authors really want to go through the effort of explaining concepts like hyperparameters, then I urge them to also introduce Bayesian spatial models such as the family of CAR models, Stochastic Partial Differential Equations, or (non-)Gaussian Markov Random Fields, much of which is covered by Krainski et al.'s INLA method.

At the beginning of the spatial cv with mlr3 section we point out why we are going to the trouble of learning the mlr3 syntax as follows:

There are dozens of packages for statistical learning, as described for example in the CRAN machine learning task view. Getting acquainted with each of these packages, including how to undertake cross-validation and hyperparameter tuning, can be a time-consuming process. Comparing model results from different packages can be even more laborious. The mlr3 package and ecosystem was developed to address these issues.

Secondly, spatial cross-validation is by no means a standard tool in R packages, only random cross-validation is. Finally, regarding your suggestion to explain Bayesian spatial models, please refer again to our reply to comment Page 32, 3rd para.

[x] Transportation Application is fine. I was using this chapter in my graduate spatial analysis class this fall 2022 and it worked without a glitch.
[x] Ecology Application is mostly fine as well. I would appreciate it if Jannes could remove the personal element (just a style issue). RL could you pls check if there is something off with the style. The only personal element I could find is the reference to "one of the most fascinating vegetations we have ever encountered" which I rewrote to "Fog oases are fascinating vegetation formations, locally termed lomas, which develop..."

Reviews - edition 2, round 2, part 2 #911