NCC-CNC / wtw-data-prep

Scripts to prepare data for Where To Work tool

Where To Work data prep

This repo assists in formatting your data into the four mandatory files (described below) required for import into Where To Work (WTW) when using the upload project data method.

These scripts can be used when planning units are grid cells that can be passed to WTW as raster layers where each raster cell is a unique planning unit. For non-grid planning units, a different workflow is required (#9).

When starting a new project, we recommend copying the scripts, functions, and data from this repo into your project folder. You can then edit the scripts and run them from your project folder.

Note: Basic coding skills in R (and possibly Python) are required to use these scripts.

Objective

The objective of these scripts is to get the source data into a standardized set of raster files where the raster grid cells represent the planning units. The source data is also used to prepare a metadata CSV table that defines the WTW parameters for each raster. The 05_wtw_formatting.R script can then package the data into the WTW format so it can be loaded into WTW.
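To make the packaging idea concrete, here is a minimal Python sketch (not the R code in 05_wtw_formatting.R) of how a stack of aligned rasters can become one table with a row per planning unit and a column per layer. The function name, the use of nested lists as rasters, the row-major `cell` index and the choice to fill missing values with 0 are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# A raster is modelled as a list of rows; None marks a cell with no data.
Raster = List[List[Optional[float]]]


def rasters_to_table(pu: Raster, layers: Dict[str, Raster]) -> List[dict]:
    """Flatten aligned rasters into one record per planning unit (PU).

    `pu` marks planning-unit cells (non-None); every layer must share its grid.
    """
    nrow, ncol = len(pu), len(pu[0])
    for name, grid in layers.items():
        if len(grid) != nrow or any(len(row) != ncol for row in grid):
            raise ValueError(f"{name} is not aligned with the planning unit grid")

    table = []
    for i in range(nrow):
        for j in range(ncol):
            if pu[i][j] is None:  # outside the AOI: not a planning unit
                continue
            record = {"cell": i * ncol + j}  # row-major index into the template grid
            for name, grid in layers.items():
                value = grid[i][j]
                # Assumption: missing values inside a PU are treated as 0.
                record[name] = 0.0 if value is None else value
            table.append(record)
    return table


pu = [[1, 1], [None, 1]]
forest = [[0.5, 0.0], [None, 1.0]]
print(rasters_to_table(pu, {"forest": forest}))
# [{'cell': 0, 'forest': 0.5}, {'cell': 1, 'forest': 0.0}, {'cell': 3, 'forest': 1.0}]
```

Keeping the cell index alongside the values is what lets a column be written back onto the grid template later.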


Workflows

  1. The most common workflow is to use NCC's standard 1km grid as the planning units, and the standard set of national datasets that have been pre-prepped into the 1km grid. This workflow simply extracts the 1km planning units and the pre-prepped data for all planning units covering the AOI.


  2. Some users may wish to add additional datasets to workflow 1, replace the standard national datasets with their own regional data, and/or use a different-sized planning unit grid. This requires users to prepare their own raster datasets to supplement the standard set of national data. This typically involves intersecting the data with the planning unit grid and summarizing the data values per planning unit. See the regional_example folder for scripts to do this.


  3. Some users may wish to use a non-gridded set of planning units. This involves intersecting the data with the planning units and providing shapefile inputs to Where to Work instead of raster inputs. See the shapefile_example folder for scripts to do this.

Note: Users with custom planning units who want to add the standard national datasets will need to access the original raster or vector versions of these datasets and apply them in workflow 2 or 3. The pre-prepped national datasets can only be used with the standard NCC 1km grid.

Definitions

NCC grid - the NCC 1km grid that covers all of Canada.

AOI - area of interest, usually a polygon defining the study region for a WTW project.

Planning units (PU) - the 'building blocks' used in WTW to construct solutions. In workflow 1 the PUs are the NCC grid cells that intersect the AOI. In other workflows PUs could be a different-sized grid, or any collection of non-overlapping polygons. The goal of the data prep workflow is to summarize each input dataset within each planning unit.

Input datasets - The data representing Themes, Weights, Includes and Excludes to be used in WTW. Each input dataset needs to be described by a single value in every planning unit.

Where to Work input data formats

There are three main formats that Where to Work will accept for loading data:

  1. The four WTW input files described below, where the spatial input file is a raster and each planning unit is a cell in the raster grid. This is the preferred input format because it's the fastest to prepare and load into WTW.

  2. The four WTW input files described below, where the spatial input file is a shapefile and each planning unit is a polygon defined in the shapefile. This is the preferred input format when non-grid planning units are required (i.e. the planning units cannot be represented in a raster grid).

  3. A single shapefile where each polygon represents a planning unit and each column in the attribute table represents a Theme, Weight, Include or Exclude. This format is not recommended because it's slower to load and requires all WTW parameters to be set manually in the app instead of being defined in the input data. See the shapefile_example folder for more details.

Planning unit data formats

WTW runs prioritizations using the values assigned to each planning unit from the input data. All input data need to be represented by a single value in each planning unit. It's important that users of WTW understand what their data represent, especially for users adding their own data into the tool.
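As an illustration of why the choice of summary matters, the hypothetical sketch below aggregates a fine-resolution presence/absence raster onto coarser planning units in three different ways. It assumes the fine cells nest exactly inside each planning unit; real data usually need reprojection or resampling first.

```python
from typing import List, Tuple

Grid = List[List[float]]


def summarize_to_pu(fine: List[List[int]], factor: int,
                    fine_cell_km2: float) -> Tuple[Grid, Grid, Grid]:
    """Aggregate a binary fine-resolution raster onto coarser planning units.

    Each PU covers factor x factor fine cells. Returns three alternative
    single-value summaries per PU: area (km2), proportion covered, presence.
    """
    nrow, ncol = len(fine), len(fine[0])
    if nrow % factor or ncol % factor:
        raise ValueError("fine cells must nest exactly inside the PU grid")

    area, proportion, presence = [], [], []
    for pr in range(nrow // factor):
        a_row, p_row, b_row = [], [], []
        for pc in range(ncol // factor):
            block = [fine[r][c]
                     for r in range(pr * factor, (pr + 1) * factor)
                     for c in range(pc * factor, (pc + 1) * factor)]
            n = sum(block)  # number of fine cells where the feature is present
            a_row.append(n * fine_cell_km2)
            p_row.append(n / len(block))
            b_row.append(1 if n > 0 else 0)
        area.append(a_row)
        proportion.append(p_row)
        presence.append(b_row)
    return area, proportion, presence


# 500 m cells (0.25 km2) nested 2 x 2 inside each 1 km PU.
fine = [[1, 1, 0, 0],
        [1, 0, 0, 1]]
area, proportion, presence = summarize_to_pu(fine, factor=2, fine_cell_km2=0.25)
print(area, proportion, presence)  # [[0.75, 0.25]] [[0.75, 0.25]] [[1, 1]]
```

Presence/absence makes both planning units look identical, while area shows the first holds three times as much of the feature; which summary is appropriate depends on what the data represent.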

National data

The following scripts in this repo are used to prepare the standard national datasets using the NCC 1km planning units (i.e. workflow 1 described above). More details on each script are provided in the Scripts section. We recommend making an empty project folder and using RStudio to start a new RStudio project in that folder, then copying the scripts folder from this repo into the project folder.

Regional data

Any user provided datasets that are not part of the standard national datasets are referred to as Regional data. These are typically vector or raster layers that need to be summarized per planning unit. An example workflow for this is provided in the regional_example folder.
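A minimal sketch of summarizing a vector layer per planning unit, assuming the features are axis-aligned rectangles in a projected CRS measured in metres. Real polygons need a proper geometry library (e.g. sf in R), but the logic is the same: intersect each feature with each cell and sum the overlapping area.

```python
from typing import Dict, List, Tuple

# Axis-aligned rectangle (xmin, ymin, xmax, ymax) in metres.
Rect = Tuple[float, float, float, float]


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def area_per_pu(features: List[Rect], x0: float, y0: float, size: float,
                nrow: int, ncol: int) -> Dict[Tuple[int, int], float]:
    """Sum the feature area (km2) inside each grid cell, keyed by (row, col).

    (x0, y0) is the grid's top-left corner; rows count downward from it.
    Assumes the features do not overlap each other (dissolve them first).
    """
    result = {}
    for r in range(nrow):
        for c in range(ncol):
            cell = (x0 + c * size, y0 - (r + 1) * size,
                    x0 + (c + 1) * size, y0 - r * size)
            total_m2 = sum(overlap_area(cell, f) for f in features)
            if total_m2 > 0:
                result[(r, c)] = total_m2 / 1e6  # m2 -> km2
    return result


# One 1 km x 1 km feature centred on the corner shared by four 1 km cells.
print(area_per_pu([(500, 500, 1500, 1500)], x0=0, y0=2000, size=1000, nrow=2, ncol=2))
# {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```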

Projections

Scripts

These scripts require R, RStudio (recommended) and RTools (required to install the wheretowork package).

01_initiate_project.R

Sets up the folder structure and copies the AOI shapefile into the PU folder.
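A rough Python equivalent of this step, for readers who want to see what it involves. The folder names are hypothetical (they are not listed here); the one detail worth copying is that a shapefile is really several files sharing one name, so all of its parts must be copied together.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical folder names: the real 01_initiate_project.R may use others.
FOLDERS = ["PU", "Tiffs", "WTW"]

# A shapefile is several files sharing one name; these are the usual parts.
SHAPEFILE_PARTS = [".shp", ".shx", ".dbf", ".prj", ".cpg"]


def initiate_project(project_dir: str, aoi_shp: str) -> List[str]:
    """Create the project folders and copy every AOI shapefile part into PU/."""
    project = Path(project_dir)
    for name in FOLDERS:
        (project / name).mkdir(parents=True, exist_ok=True)

    copied = []
    for ext in SHAPEFILE_PARTS:
        part = Path(aoi_shp).with_suffix(ext)
        if part.exists():  # optional parts such as .cpg may be missing
            shutil.copy2(part, project / "PU" / part.name)
            copied.append(part.name)
    return copied
```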

Inputs

Outputs

02_aoi_to_1km_grid.R

Creates the planning unit grid using all NCC 1km grid cells that intersect the AOI.

Inputs

Outputs

Takes a polygon shapefile (AOI) and generates an NCC 1km vector grid.
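The selection rule can be sketched in plain Python as below. This is not how 02_aoi_to_1km_grid.R does it (that script works in R on the real NCC grid); the grid origin, the (col, row) index convention and the 1000 m cell size here are assumptions. A cell is kept if an AOI vertex lies inside it, one of its corners lies inside the AOI, or an AOI edge crosses one of its sides; only cells within the AOI bounding box are tested.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray-casting test; points exactly on the boundary may go either way."""
    x, y = p
    inside = False
    for k in range(len(poly)):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True for a proper crossing; touching or collinear segments are ignored."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def cells_intersecting(aoi: List[Point], x0: float, y0: float,
                       size: float = 1000.0) -> List[Tuple[int, int]]:
    """(col, row) indices of grid cells overlapping the AOI polygon.

    (x0, y0) is the grid's lower-left origin; rows count upward from it.
    """
    xs, ys = [p[0] for p in aoi], [p[1] for p in aoi]
    cols = range(math.floor((min(xs) - x0) / size), math.floor((max(xs) - x0) / size) + 1)
    rows = range(math.floor((min(ys) - y0) / size), math.floor((max(ys) - y0) / size) + 1)
    edges = [(aoi[k], aoi[(k + 1) % len(aoi)]) for k in range(len(aoi))]
    hits = []
    for row in rows:  # only cells inside the AOI bounding box are tested
        for col in cols:
            xmin, ymin = x0 + col * size, y0 + row * size
            xmax, ymax = xmin + size, ymin + size
            corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
            sides = [(corners[m], corners[(m + 1) % 4]) for m in range(4)]
            if (any(xmin < x < xmax and ymin < y < ymax for x, y in aoi)
                    or any(point_in_polygon(c, aoi) for c in corners)
                    or any(segments_cross(a, b, c, d) for a, b in edges for c, d in sides)):
                hits.append((col, row))
    return hits


triangle = [(100, 100), (2500, 100), (100, 1900)]
print(cells_intersecting(triangle, x0=0, y0=0))
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
```

The upper-right cell (2, 1) falls inside the triangle's bounding box but not inside the triangle, so it is dropped.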

03_natdata_to_1km_pu_grid.R

_Used in conjunction with 02_aoi_to_1km_grid.R to extract the pre-prepped 1km NATIONAL data to the planning units._

Inputs

Outputs

Warning: Data needed to run this script is not packaged in this repo.
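Because the pre-prepped national layers already sit on the NCC 1km grid, extracting them for an AOI amounts to a crop and a mask, with no resampling; this is presumably also why they cannot be reused with other planning unit grids. A minimal sketch, assuming each layer is a national-extent array and the planning units are given as (row, col) indices on the same grid:

```python
from typing import List, Optional, Set, Tuple

Raster = List[List[Optional[float]]]


def crop_and_mask(national: Raster,
                  pu_cells: Set[Tuple[int, int]]) -> Tuple[Tuple[int, int], Raster]:
    """Cut a national-extent raster down to the planning units.

    `national` and `pu_cells` must use the same grid, indexed (row, col) with
    row 0 at the top. Returns the (row, col) offset of the cropped window and
    the window itself, with non-PU cells set to None.
    """
    rows = [r for r, _ in pu_cells]
    cols = [c for _, c in pu_cells]
    r0, r1, c0, c1 = min(rows), max(rows), min(cols), max(cols)
    window = [[national[r][c] if (r, c) in pu_cells else None
               for c in range(c0, c1 + 1)]
              for r in range(r0, r1 + 1)]
    return (r0, c0), window


national = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
print(crop_and_mask(national, {(0, 1), (1, 1), (1, 2)}))
# ((0, 1), [[2, None], [5, 6]])
```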

National Datasets

Themes
Weights
Includes

03b_summarize_species.R

_Optional script, run after 03_natdata_to_1km_pu_grid.R to list the species intersecting the AOI._

Inputs

Outputs

Warning: Data needed to run this script is not packaged in this repo.
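The species summary can be sketched as follows. The input layout (one list of per-PU values per species) and the output columns are hypothetical; the idea is simply to keep species with a non-zero value in at least one planning unit.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def summarize_species(table: Dict[str, Sequence[Optional[float]]]
                      ) -> List[Tuple[str, int, float]]:
    """List species present in at least one PU as (name, n_pus, total).

    `table` maps each species to its per-PU values (e.g. km2 of habitat);
    0 and None mean absent. Sorted by number of PUs (descending), then name.
    """
    summary = []
    for name, values in table.items():
        present = [v for v in values if v is not None and v > 0]
        if present:
            summary.append((name, len(present), sum(present)))
    return sorted(summary, key=lambda s: (-s[1], s[0]))


species = {"Species A": [0, 1.5, 2.0], "Species B": [0, 0, 0], "Species C": [0.5, None, 0]}
print(summarize_species(species))
# [('Species A', 2, 3.5), ('Species C', 1, 0.5)]
```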

04_populate_nat_metadata.R

_Automates the creation of a metadata .csv table that is used in 05_wtw_formatting.R._ _Once created, the metadata csv table should be manually QC'd before proceeding to 05_wtw_formatting.R._

Note: If using REGIONAL data, the metadata table must be created manually (or edited manually if using both NATIONAL and REGIONAL data).

Inputs

Outputs

Metadata table columns

(You can view a QC'd version here.)
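Part of the manual QC can be automated. The sketch below assumes hypothetical column names (Type, File, Goal) and that goals are stored as proportions between 0 and 1; adapt both to the real metadata table before relying on it.

```python
import csv
import io
from typing import List

ALLOWED_TYPES = {"theme", "weight", "include", "exclude"}


def qc_metadata(csv_text: str) -> List[str]:
    """Return human-readable problems found in a metadata table (empty = OK).

    Assumed (hypothetical) columns: Type, File, Goal.
    """
    problems = []
    seen_files = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        layer_type = (row.get("Type") or "").strip().lower()
        if layer_type not in ALLOWED_TYPES:
            problems.append(f"line {line_no}: unknown Type {layer_type!r}")

        file_name = (row.get("File") or "").strip()
        if file_name in seen_files:
            problems.append(f"line {line_no}: duplicate File {file_name!r}")
        seen_files.add(file_name)

        goal = (row.get("Goal") or "").strip()
        if goal:
            try:
                if not 0 <= float(goal) <= 1:
                    problems.append(f"line {line_no}: Goal {goal} is outside 0-1")
            except ValueError:
                problems.append(f"line {line_no}: Goal {goal!r} is not a number")
    return problems


example = "Type,File,Goal\ntheme,forest.tif,0.2\nweight,carbon.tif,\nthemes,forest.tif,1.5\n"
for problem in qc_metadata(example):
    print(problem)
```

On the example table this reports three problems on line 4: the misspelled Type, the repeated File and the out-of-range Goal.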

05_wtw_formatting.R

Creates the four files required by WTW.

Inputs

Outputs

  1. configuration.yaml:
    The configuration file defines the project attributes, the legend elements and map display in the left 'Table of contents' sidebar, and the initial goals in the right 'New solution' sidebar.

  2. spatial.tif:
    The spatial TIFF file defines the spatial properties of the planning units, such as cell size, extent, number of rows, number of columns and coordinate reference system. It acts as the template for building rasters from the columns in attribute.csv.gz.

    Note: The spatial file can also be a shapefile where each planning unit is a different polygon.

  3. attribute.csv.gz:
    The attribute file defines the cell values of each theme, weight, include and exclude in tabular form. Each column in the .csv is a variable.

  4. boundary.csv.gz:
    The boundary file defines the adjacency table of the planning units. It stores the perimeter and shared boundary lengths of the planning units, which are needed to run optimizations that promote spatial clustering.
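For a raster grid, the boundary lengths follow directly from cell adjacency: two cells sharing a side share one cell width of boundary, diagonal neighbours share nothing, and the rest of each cell's perimeter is exposed. The sketch below computes these values; the exact column layout WTW expects in boundary.csv.gz is not described here, so serialization is left to 05_wtw_formatting.R.

```python
from typing import Dict, List, Tuple


def grid_boundaries(pu: List[List[bool]], size: float
                    ) -> Tuple[Dict[Tuple[int, int], float], Dict[int, float]]:
    """Shared and exposed edge lengths for square planning-unit cells.

    PU ids are assigned 1, 2, ... in row-major order over cells where pu is True.
    shared maps (id1, id2), id1 < id2, to the length of their common edge;
    exposed maps each id to its perimeter (4 * size) minus its shared edges.
    """
    ids = {}
    for r, row in enumerate(pu):
        for c, is_pu in enumerate(row):
            if is_pu:
                ids[(r, c)] = len(ids) + 1

    shared: Dict[Tuple[int, int], float] = {}
    exposed = {i: 4 * size for i in ids.values()}
    for (r, c), i in ids.items():
        # Right and lower neighbours only, so each pair is counted once and i < j.
        # Diagonal neighbours touch at a single point, so they share no edge.
        for neighbour in ((r, c + 1), (r + 1, c)):
            j = ids.get(neighbour)
            if j is not None:
                shared[(i, j)] = size
                exposed[i] -= size
                exposed[j] -= size
    return shared, exposed


shared, exposed = grid_boundaries([[True, True], [False, True]], size=1000.0)
print(shared)   # {(1, 2): 1000.0, (2, 3): 1000.0}
print(exposed)  # {1: 3000.0, 2: 2000.0, 3: 3000.0}
```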




Icons made by Freepik from www.flaticon.com