PermafrostDiscoveryGateway / viz-staging

PDG Visualization staging pipeline
Apache License 2.0

PDG Staging

Divides vector files into tiled vector files according to a specified OGC Two Dimensional Tile Matrix Set in preparation for processing into other formats in the PDG workflow.

Figure: PDG staging summary

Install

Requires Python 3.9 and libspatialindex (or libspatialindex-dev).

  1. Install libspatialindex or libspatialindex-dev by following the library's installation instructions.
  2. Make sure that Python version 3.9 is installed (try which python3.9).
  3. Install pdgstaging from GitHub repo using pip: pip install git+https://github.com/PermafrostDiscoveryGateway/viz-staging.git
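Before installing, a quick preflight check can confirm both prerequisites from Python. This is a minimal sketch, and check_prerequisites is a hypothetical helper, not part of pdgstaging. It assumes that libspatialindex provides a C API library named spatialindex_c, which is the name passed to ctypes.util.find_library here. It only reports what it finds and does not install anything.

```python
import ctypes.util
import sys


def check_prerequisites():
    """Return (python_ok, lib_path) for the two staging prerequisites."""
    # The README requires Python 3.9 specifically.
    python_ok = sys.version_info[:2] == (3, 9)
    # Assumption: libspatialindex provides a C API library named
    # "spatialindex_c"; find_library returns None when it is not found.
    lib_path = ctypes.util.find_library("spatialindex_c")
    return python_ok, lib_path


if __name__ == "__main__":
    ok, path = check_prerequisites()
    print("Python 3.9:", "yes" if ok else "no, found " + sys.version.split()[0])
    print("libspatialindex:", path or "not found")
```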

Usage

  1. Create a config JSON file for the staging job. See the docs for details, run help(pdgstaging.ConfigManager) for all configuration options, and check pdgstaging.ConfigManager.defaults for the default config values.
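Because the config is plain JSON, it can also be written from Python. The helper below, write_config (a hypothetical name, not part of pdgstaging), writes only the options you override and rejects any key that is absent from a defaults mapping, so a misspelled option fails loudly instead of being silently ignored. It assumes that pdgstaging.ConfigManager.defaults is a dict keyed by option name and that options left out of the file fall back to their defaults.

```python
import json
from pathlib import Path


def write_config(path, overrides, defaults):
    """Write `overrides` to a JSON config file after checking that every key
    is a known option (i.e. present in `defaults`)."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        # Fail before writing anything so no half-valid config is left behind.
        raise KeyError("unknown config option(s): " + ", ".join(unknown))
    path = Path(path)
    path.write_text(json.dumps(overrides, indent=2))
    return path
```

For example, write_config('/path/to/config.json', {'dir_input': '/path/to/input'}, pdgstaging.ConfigManager.defaults), where dir_input is only a placeholder key; take the real option names from help(pdgstaging.ConfigManager).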

From the command line:

In python:

import pdgstaging
stager = pdgstaging.TileStager('/path/to/config.json')
stager.stage_all()

# OR, to stage only one file
stager.stage('path/to/input/file.shp')
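Calling stage() per file also makes it easy to stage a chosen subset of inputs rather than everything. This is a minimal sketch, assuming that stage() accepts one input path at a time, as in the example above, and that the wanted files can be selected with a glob pattern. stage_matching is a hypothetical helper, not part of pdgstaging.

```python
import glob


def stage_matching(stager, pattern):
    """Call stager.stage(path) for each file matching `pattern`.

    `pattern` may use ** for recursive matching; files are staged in sorted
    order so repeated runs process them in the same sequence.
    Returns the list of staged paths."""
    paths = sorted(glob.glob(pattern, recursive=True))
    for path in paths:
        stager.stage(path)
    return paths
```

For example, stage_matching(pdgstaging.TileStager('/path/to/config.json'), '/path/to/input/**/*.shp') stages every shapefile under the input directory in sorted order.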

Vector file staging for the PDG tiling pipeline

This repository contains code that prepares vector data (e.g. shapefiles, geopackages) for subsequent steps in the PDG tiling pipeline (such as viz-3dtiles and viz-raster). The staging step creates output vector files that conform to a specified OGC Two Dimensional Tile Matrix Set ("TMS"). Specifically, for each input file, the staging process:

  1. Simplifies polygons and re-projects them to the Coordinate Reference System ("CRS") used by the desired TMS.
  2. Assigns area, centroid, and other properties to each polygon.
  3. Identifies polygons that are duplicated across tiles.
  4. Saves polygons to one file for each tile in the specified level of the TMS.
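Step 2 computes properties such as area and centroid in the TMS CRS. pdgstaging presumably relies on a geometry library for this. The sketch below only illustrates the underlying shoelace formulas for a single polygon ring without holes, assuming the coordinates have already been re-projected. ring_area_centroid is a hypothetical name.

```python
def ring_area_centroid(coords):
    """Area and centroid of a simple polygon ring [(x, y), ...] via the
    shoelace formula. Works for either winding order; holes are ignored."""
    if coords[0] == coords[-1]:
        coords = coords[:-1]  # drop the closing vertex if present
    a = cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # twice the signed area of one triangle fan slice
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area: positive counter-clockwise, negative clockwise
    if a == 0:
        raise ValueError("degenerate ring has zero area")
    # Dividing by the signed area keeps the centroid correct for both windings.
    return abs(a), (cx / (6 * a), cy / (6 * a))
```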

A polygon is assigned to a tile file if it lies within the tile or intersects the tile's bounding box (i.e. if it is at least partially within that tile). As a result, polygons that overlap two or more tiles are duplicated in the output. This lets subsequent rasterization steps measure the full area of polygons that are only partially within a tile; otherwise some area would be lost. Duplicated polygons are labeled as such so they can be removed either during staging or at a later step in the PDG visualization pipeline, as determined by the configuration file.
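Finding every tile a polygon may belong to amounts to converting its bounding box into a range of tile columns and rows. This sketch assumes a geographic quad TMS laid out like OGC WorldCRS84Quad (2 × 1 tiles at level 0, each level splitting every tile in four, row 0 at the northern edge); the TMS actually used is set in the config, and tiles_for_bbox is a hypothetical name. Because it works on the bounding box, it returns candidate tiles; an exact geometry intersection test would then drop tiles that the box reaches but the polygon itself does not.

```python
import math


def tiles_for_bbox(west, south, east, north, z):
    """Return (col, row) for every level-z tile whose extent intersects the
    lon/lat bounding box. Boxes crossing the antimeridian are not handled."""
    if not (-180 <= west <= east <= 180 and -90 <= south <= north <= 90):
        raise ValueError("bbox must be lon/lat degrees, west <= east, south <= north")
    size = 180.0 / 2 ** z  # tile edge length in degrees
    max_col, max_row = 2 ** (z + 1) - 1, 2 ** z - 1

    def col(lon):
        return min(math.floor((lon + 180.0) / size), max_col)

    def row(lat):
        return min(math.floor((90.0 - lat) / size), max_row)

    # A box edge lying exactly on a tile boundary also picks up the neighbour,
    # matching an "intersects" test that counts touching as intersecting.
    return [(c, r)
            for r in range(row(north), row(south) + 1)
            for c in range(col(west), col(east) + 1)]
```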

In addition, each polygon is linked to exactly one tile through its centroid: the centroid_tile property identifies the tile within which the polygon's centroid falls. (In the rare event that a polygon's centroid falls exactly on a tile boundary, the polygon is assigned to the southern/eastern tile.)
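The boundary rule falls out naturally when tile indices are computed with floor(). Under an assumed geographic quad layout (2 × 1 tiles at level 0, rows counted from the north, as in OGC WorldCRS84Quad), a centroid exactly on a tile edge lands in the higher column (east) and the higher row (south). centroid_tile and centroid_within_tile below are illustrative functions that mirror the property names; they are not pdgstaging code.

```python
import math


def centroid_tile(lon, lat, z):
    """(col, row) of the level-z tile containing the point (lon, lat)."""
    size = 180.0 / 2 ** z  # tile edge length in degrees
    # floor() sends a point exactly on a vertical tile edge to the eastern
    # tile (higher col) and one exactly on a horizontal edge to the southern
    # tile (higher row, since row 0 is the northernmost row).
    col = math.floor((lon + 180.0) / size)
    row = math.floor((90.0 - lat) / size)
    # Clamp points on the world's east and south edges into the last col/row.
    return min(col, 2 ** (z + 1) - 1), min(row, 2 ** z - 1)


def centroid_within_tile(lon, lat, z, tile):
    """True when the centroid lies in `tile`, the (col, row) of the output file."""
    return centroid_tile(lon, lat, z) == tile
```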

The centroid_within_tile property is True when the polygon's centroid is within the same tile as the output file. To avoid using duplicated polygons in subsequent tiling steps (e.g. when generating 3D tiles), first filter out all polygons where centroid_within_tile is False.
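A minimal filter for that step, assuming the staged tile has been loaded as GeoJSON-style feature dicts and the property keeps its default name; drop_duplicates is a hypothetical helper. With geopandas, the same filter is a boolean mask such as gdf[gdf["centroid_within_tile"]].

```python
def drop_duplicates(features, flag="centroid_within_tile"):
    """Return only the features whose centroid lies in the file's own tile.

    `features` is an iterable of GeoJSON-style dicts with a "properties"
    mapping. Pass a different `flag` if the property was renamed in the
    config. Features missing the flag are dropped rather than kept."""
    return [f for f in features if f["properties"].get(flag)]
```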

The area and centroid of each polygon are calculated in the CRS of the TMS. When this is a geographic CRS rather than a projected one, the resulting values are approximations: areas are expressed in square degrees, and the ground area represented by one square degree shrinks toward the poles.
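To see the effect, compare a 1° × 1° cell at the equator with one at 70°N. Measured directly in a geographic CRS, both have an area of exactly one square degree, yet on a sphere of mean Earth radius the northern cell covers only about a third of the ground area. The spherical formula R² · Δλ · (sin φ_north - sin φ_south) is exact for a latitude/longitude rectangle on a sphere; treating the Earth as a sphere is itself an assumption of this sketch, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; treats the Earth as a sphere


def cell_area_deg2(west, south, east, north):
    """Planar area of a lon/lat rectangle in square degrees, i.e. what an
    area calculation performed directly in a geographic CRS yields."""
    return (east - west) * (north - south)


def cell_area_m2(west, south, east, north):
    """Area of the same rectangle on a sphere, in square metres:
    R^2 * dlon * (sin(north) - sin(south)), with dlon in radians."""
    dlon = math.radians(east - west)
    return EARTH_RADIUS_M ** 2 * dlon * (
        math.sin(math.radians(north)) - math.sin(math.radians(south)))


if __name__ == "__main__":
    for lat in (0, 70):
        print(f"{lat}N: {cell_area_deg2(0, lat, 1, lat + 1):.0f} deg^2, "
              f"{cell_area_m2(0, lat, 1, lat + 1) / 1e6:,.0f} km^2")
```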

Polygon properties

After staging, each polygon is assigned the new properties listed below. The names of these properties are configurable, but the default names are used here.

Summary fields

The staging process will also output a summary CSV file with one row for each tile created from each file. The fields in the CSV are:

Assumptions