Alberta-Geological-Survey / depth-to-bedrock

Code repository for the 'Evaluating spatially enabled machine learning approaches for depth to bedrock mapping' PLOS ONE article
MIT License

Depth to bedrock prediction

Code repository for the PLOS ONE article ‘Evaluating spatially enabled machine learning approaches for depth to bedrock mapping’. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0296881

Overview

The depth to bedrock (DTB) prediction model for Alberta uses an R project workflow based on the tidyverse and tidymodels suites of packages. The workflow can be re-executed in the following steps:

  1. Data preparation (downloading publicly available remote sensing datasets and performing terrain analysis with the Rsagacmd package).

  2. Water well litholog augmentation by classifying lithological descriptions into surficial and bedrock units based on a statistical natural language processing approach and text pattern matching.

  3. Model evaluation and selection based on cross-validation and on the quality of the predicted DTB maps in several physiographically distinct sub-regions.

  4. Final DTB prediction.
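The numbered scripts in the project root (01-grids.qmd through 08-analysis-results.qmd in the directory tree of this README) appear to follow these steps in order. The sketch below runs them sequentially. `run_workflow()` is a hypothetical helper, not part of the repository, and it assumes that the scripts are run from the project root, that each script can run in a fresh environment, and that the quarto R package and the Quarto CLI are installed for the .qmd files. With `dry_run = TRUE` it only returns the execution order.

```r
# Hypothetical helper: run the numbered workflow scripts in order.
# Assumes scripts named with a two-digit prefix (01-..., 02-...) in the
# project root, and that it is called from the project root.
run_workflow <- function(project_dir = ".", dry_run = TRUE) {
  scripts <- list.files(project_dir, pattern = "^[0-9]{2}-.*\\.(R|qmd)$")
  scripts <- sort(scripts)  # two-digit prefixes sort into execution order
  if (dry_run) {
    return(scripts)
  }
  for (s in scripts) {
    path <- file.path(project_dir, s)
    message("Running ", s)
    if (grepl("\\.qmd$", s)) {
      quarto::quarto_render(path)       # needs the quarto package and CLI
    } else {
      source(path, local = new.env())   # assumption: scripts are independent
    }
  }
  invisible(scripts)
}

# Usage:
# run_workflow()                  # list the scripts in execution order
# run_workflow(dry_run = FALSE)   # run the full workflow
```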

Requirements

This R project uses a reproducible environment based on the renv package. The package versions in this environment were installed under Ubuntu 22.04 and may not install cleanly on other operating systems, particularly Microsoft Windows. The packages and versions to be installed are specified in the renv.lock file. To install them, run the following code in the R console:

renv::restore()
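Because the lockfile targets Ubuntu 22.04, a short pre-flight check can flag other platforms before the restore. This is a minimal sketch: `check_platform()` is a hypothetical helper, and it only warns, since many packages may still install on other systems.

```r
# Hypothetical pre-flight check: the lockfile was built under Ubuntu 22.04,
# so warn (but do not stop) when restoring on another operating system.
check_platform <- function() {
  sysname <- Sys.info()[["sysname"]]
  is_linux <- identical(sysname, "Linux")
  if (!is_linux) {
    warning("renv.lock was built on Ubuntu 22.04; some packages may fail ",
            "to install on ", sysname, ".", call. = FALSE)
  }
  invisible(is_linux)
}

# Usage from the project root:
# check_platform()
# renv::restore()                 # asks for confirmation in interactive sessions
# renv::restore(prompt = FALSE)   # skips the confirmation prompt
```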

This project has an external dependency, SAGA-GIS (>= 7.3), and the saga_cmd binary needs to be available on the system PATH. The Rsagacmd package is used to run the SAGA-GIS algorithms from R. SAGA-GIS can be installed on Ubuntu 22.04 using the following command:

sudo apt-get install saga
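Before running the terrain analysis, it can help to confirm that saga_cmd is on the PATH that R sees, which can differ from the shell's PATH (for example, when R is started from a desktop launcher). The sketch uses base R's `Sys.which()`, which returns an empty string when a program is not found; `saga_on_path()` is a hypothetical helper name. If saga_cmd is found, `Rsagacmd::saga_gis()` is called without arguments so that it searches for the executable itself.

```r
# Hypothetical helper: return the full path to saga_cmd, or NA if R cannot
# find it on PATH.
saga_on_path <- function(cmd = "saga_cmd") {
  path <- Sys.which(cmd)
  if (!nzchar(path)) {
    message(cmd, " was not found on PATH; install SAGA-GIS (>= 7.3) ",
            "or add its directory to PATH.")
    return(NA_character_)
  }
  unname(path)
}

bin <- saga_on_path()
if (!is.na(bin) && requireNamespace("Rsagacmd", quietly = TRUE)) {
  saga <- Rsagacmd::saga_gis()  # searches for saga_cmd automatically
}
```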

For Windows users, SAGA-GIS has to be installed manually by downloading the binary from SourceForge and then adding the SAGA installation directory to the PATH environment variable. A more complete QGIS installation that includes SAGA also provides the saga_cmd.exe executable, for example at C:\Program Files\QGIS 3.28.12\apps\saga\saga_cmd.exe.
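On Windows, the PATH change can also be made from within R for a single session. The sketch below assumes the QGIS-bundled location quoted in this section (adjust the version folder to match your installation). `use_saga_windows()` is a hypothetical helper, not part of the project or of Rsagacmd; it prepends the SAGA directory to PATH for the current R session only. Alternatively, `Rsagacmd::saga_gis()` accepts the executable path directly through its `saga_bin` argument.

```r
# Hypothetical helper: make saga_cmd.exe visible for the current R session only.
use_saga_windows <- function(saga_bin) {
  if (!file.exists(saga_bin)) {
    stop("saga_cmd executable not found at: ", saga_bin, call. = FALSE)
  }
  # Prepend the SAGA directory so it is searched before other PATH entries
  Sys.setenv(PATH = paste(dirname(saga_bin), Sys.getenv("PATH"),
                          sep = .Platform$path.sep))
  invisible(saga_bin)
}

# Usage (forward slashes work in R paths on Windows):
# bin <- "C:/Program Files/QGIS 3.28.12/apps/saga/saga_cmd.exe"
# use_saga_windows(bin)
# saga <- Rsagacmd::saga_gis(saga_bin = bin)
```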

In addition, the project automatically downloads the required remote sensing datasets. Downloading the MODIS data requires a free NASA Earthdata login. The login credentials need to be set either in a .Renviron file (a plain text file with no extension) in the project root directory, as EARTHDATA_USER= and EARTHDATA_KEY= entries, or as environment variables in the R session. For example, to set them in the current session:

Sys.setenv("EARTHDATA_USER" = "username")
Sys.setenv("EARTHDATA_KEY" = "password")
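A .Renviron file is read when R starts, so credentials added to it are not visible in an already running session until R is restarted or the file is reloaded with base R's `readRenviron()`. The sketch below checks that both variables are set; `has_earthdata_credentials()` is a hypothetical helper, and the values shown are placeholders. Since the file holds a password, keep .Renviron out of version control (for example, by listing it in .gitignore).

```r
# A .Renviron file in the project root would contain two lines such as:
#   EARTHDATA_USER=username
#   EARTHDATA_KEY=password
# Hypothetical helper: reload the file (if present) and check both variables.
has_earthdata_credentials <- function(renviron = ".Renviron") {
  if (file.exists(renviron)) {
    readRenviron(renviron)  # makes the file's variables visible in this session
  }
  vars <- c("EARTHDATA_USER", "EARTHDATA_KEY")
  unset <- vars[!nzchar(Sys.getenv(vars))]
  if (length(unset) > 0) {
    message("Missing Earthdata credentials: ", paste(unset, collapse = ", "))
    return(FALSE)
  }
  TRUE
}
```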

The statistical natural language processing model is trained with XGBoost on a GPU. This requires a CUDA (Compute Unified Device Architecture) enabled GPU and an installed CUDA toolkit. Alternatively, the model can be trained on the CPU by setting tree_method = "hist" in the 02-nlp.qmd script, although training will take considerably longer.
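Switching between GPU and CPU training amounts to changing the `tree_method` engine argument. The sketch below assumes an xgboost version earlier than 2.0, where `tree_method = "gpu_hist"` selects GPU training (newer xgboost releases use `tree_method = "hist"` together with `device = "cuda"` instead), and that the classifier is specified with parsnip, as is usual in a tidymodels workflow. `xgb_tree_method()` is hypothetical, and detecting a GPU through the presence of nvidia-smi is only a heuristic.

```r
# Hypothetical helper: choose the XGBoost tree method for GPU or CPU training.
# Default: treat the presence of nvidia-smi on PATH as a sign of a CUDA GPU.
xgb_tree_method <- function(use_gpu = nzchar(Sys.which("nvidia-smi"))) {
  if (use_gpu) "gpu_hist" else "hist"  # "gpu_hist" assumes xgboost < 2.0
}

# Passing the choice to parsnip as an engine argument (CPU training here)
if (requireNamespace("parsnip", quietly = TRUE)) {
  xgb_spec <- parsnip::set_engine(
    parsnip::boost_tree(mode = "classification"),
    "xgboost",
    tree_method = xgb_tree_method(use_gpu = FALSE)
  )
}
```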

Folder structure

The folder structure is organized as follows:

Data

The data includes the following:

Code

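Functions used by the scripts are stored in the R folder (dtb.R, nlp.R, plots.R, predictors.R and resampling.R). The scripts are organized into the following files:

  - 01-grids.qmd
  - 02-nlp.qmd
  - 03-training-data.R
  - 04-experiments.R
  - 05-idw.R
  - 06-kriging.R
  - 07-provincial-model.R
  - 08-analysis-results.qmd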

Once these scripts are run, additional directories and model outputs will be created within the project directory. The final directory structure will look like:

fs::dir_tree(recurse = 2)
#> .
#> ├── LICENSE
#> ├── 01-grids.qmd
#> ├── 02-nlp.qmd
#> ├── 03-training-data.R
#> ├── 04-experiments.R
#> ├── 05-idw.R
#> ├── 06-kriging.R
#> ├── 07-provincial-model.R
#> ├── 08-analysis-results.qmd
#> ├── R
#> │   ├── dtb.R
#> │   ├── nlp.R
#> │   ├── plots.R
#> │   ├── predictors.R
#> │   └── resampling.R
#> ├── plos-one.csl
#> ├── plos2015.bst
#> ├── zotero.bib
#> ├── README.Rmd
#> ├── README.md
#> ├── _dependencies.R
#> ├── data
#> │   ├── processed
#> │   │   ├── picks-nlp.rds
#> │   │   ├── predictors.tif
#> │   │   └── training-data.rds
#> │   └── raw
#> │       ├── alos-dem.tif
#> │       └── modis.tif
#> ├── depth-to-bedrock-plos-one.Rproj
#> ├── models
#> │   ├── cross-validation-dtb-prov.rds
#> │   ├── cross-validation-idw-prov.rds
#> │   ├── cross-validation-kriging-prov.rds
#> │   ├── experiments-cross-validation.rds
#> │   ├── experiments-dtb.rds
#> │   ├── experiments-importances.rds
#> │   ├── experiments-models.rds
#> │   ├── model-dtb-prov.rds
#> │   ├── model-nlp.rds
#> │   └── resamples-nlp.rds
#> ├── outputs
#> │   ├── dtb-rf-prov-pred-int.tif
#> │   ├── dtb-rf-prov.tif
#> │   ├── idw-prov.tif
#> │   ├── kriging-prov.tif
#> │   ├── picked-combined.gpkg
#> │   ├── picks-nlp-cv.csv
#> │   ├── picks-nlp.csv
#> │   └── picks.csv
#> ├── projdata
#> │   ├── cross-section-line-rainbow-lake.geojson
#> │   ├── cross-section-line-wcab.geojson
#> │   ├── lithologs.rds
#> │   ├── physio-pettapiece.gpkg
#> │   └── picks.rds
#> ├── renv
#> │   ├── activate.R
#> │   ├── library
#> │   │   └── R-4.3
#> │   ├── settings.json
#> │   └── staging
#> └── renv.lock
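After a full run, a quick check can confirm that the key outputs shown in the tree above were written. The file list below is a subset copied from that tree; `missing_outputs()` is a hypothetical helper that returns the paths that do not exist (an empty character vector when everything is present).

```r
# Subset of outputs taken from the directory tree above; extend as needed.
expected_outputs <- c(
  "data/processed/predictors.tif",
  "data/processed/training-data.rds",
  "models/model-nlp.rds",
  "models/model-dtb-prov.rds",
  "outputs/dtb-rf-prov.tif",
  "outputs/dtb-rf-prov-pred-int.tif"
)

# Hypothetical helper: return the expected files that are not present.
missing_outputs <- function(files = expected_outputs, project_dir = ".") {
  files[!file.exists(file.path(project_dir, files))]
}

# Usage from the project root:
# missing_outputs()
```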