geotrellis / geotrellis-pointcloud

GeoTrellis PointCloud library to work with any pointcloud data on Spark
Apache License 2.0

Document EPT catalog generation #52

Closed pomadchin closed 4 years ago

pomadchin commented 4 years ago

Document EPT catalog generation and how to update a catalog from a set of las / laz files. It should probably be part of the README.md file.

We can use the connormanning/entwine docker image for the EPT catalog generation description (see the entwine repo).

Catalog builder args description

jpolchlo commented 4 years ago

Structure of EPT Hierarchies

An EPT is based on an octree structure. The entire point set has a volumetric extent which serves as the root of the tree. Each cell of the tree has a corresponding LAZ file in the hierarchy. The following 2-d illustration demonstrates the structure of the tree using a quadtree.

+-------------+
 \             \
  \     0,0     \   Depth: 0
   \             \
    +-------------+
+------+------+
 \ 0, 0 \ 1, 0 \
  +------+------+   Depth: 1
   \ 0, 1 \ 1, 1 \
    +------+------+
         .
         .
         .
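
The same subdivision carries over to three dimensions, where each cell splits into eight children. The sketch below is only an illustration: the function names are made up for this comment, and the `D-X-Y-Z.laz` file naming reflects my reading of the EPT specification rather than anything in geotrellis-pointcloud.

```python
from itertools import product

def child_keys(d, x, y, z):
    """Return the eight child keys of octree cell (d, x, y, z)."""
    return [(d + 1, 2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx, dy, dz in product((0, 1), repeat=3)]

def node_bounds(root, key):
    """Bounds of cell `key`, given root bounds (xmin, ymin, zmin, xmax, ymax, zmax)."""
    d, x, y, z = key
    n = 2 ** d  # number of cells along each axis at this depth
    xmin, ymin, zmin, xmax, ymax, zmax = root
    sx, sy, sz = (xmax - xmin) / n, (ymax - ymin) / n, (zmax - zmin) / n
    return (xmin + x * sx, ymin + y * sy, zmin + z * sz,
            xmin + (x + 1) * sx, ymin + (y + 1) * sy, zmin + (z + 1) * sz)

def tile_name(key):
    """File name for a cell, assuming the D-X-Y-Z.laz convention."""
    return "{}-{}-{}-{}.laz".format(*key)
```

For example, `node_bounds(root, (1, 1, 0, 1))` picks out the half of the root extent with larger x, smaller y and larger z.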

Each tree cell also has a logical grid associated with it that uniformly subdivides its spatial extent. When building the LAZ files, Entwine endeavors to include about one point per logical grid cell. The logical grid partitions each axis into span equal intervals. The choice of span dictates the "cell size" of the various hierarchy levels, and it also sets the rough size of each constituent LAZ file. All points will be added to a LAZ file somewhere in the hierarchy, with dense areas being represented by a deeper tree.
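
To make the relationship between span and "cell size" concrete, here is a small arithmetic sketch. It assumes a cubic root extent of width `root_width` that is halved along each axis at every depth; the function name and the sample numbers are illustrative only.

```python
def cell_size(root_width, span, depth):
    """Edge length of one logical grid cell at the given tree depth.

    A tree cell at `depth` is root_width / 2**depth wide, and its
    logical grid splits that width into `span` equal intervals.
    """
    return root_width / (span * 2 ** depth)

# Example: a 4096-unit root extent with the default span of 256.
for depth in range(4):
    print(depth, cell_size(4096.0, 256, depth))
```

Each additional depth halves the cell size, so the approximate point spacing represented by a level is fixed once the bounds and span are chosen.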

Related Entwine documentation: https://entwine.io/entwine-point-tile.html

Creating EPT Hierarchies

Generating a new EPT hierarchy is generally a straightforward affair with little in the way of tuning. By and large, how the EPT will be used determines the desired grid. An example should help.

Let's imagine that we want to convert the point data into DEM rasters on demand. Then the grid parameters for the rasters should guide the layout. Assuming that the DEM rasters will be requested via some tiled queries in the WebMercator projection in a standard power-of-two layout, one would find the smallest layout tile (i.e., in the highest possible zoom level) that entirely contains the point set to be encoded. The bounds of that tile can be passed to

entwine build --bounds "[<xmin>,<ymin>,<zmin>,<xmax>,<ymax>,<zmax>]"

where the limit values will be filled in appropriately (the Z limits should be left unmodified). The --span parameter in this case should match the size of the request tiles. The default value of --span 256 will likely work well.
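
As a rough sketch of the tile search described above: the following assumes the point set's bounding box is already in WebMercator (EPSG:3857) meters, lies inside the world square, and that tiles follow the usual power-of-two scheme with row 0 at the top. All function names are hypothetical.

```python
import math

# Half-width of the WebMercator world square, in meters.
ORIGIN_SHIFT = math.pi * 6378137.0

def tile_index(x, y, zoom):
    """Column and row of the tile containing (x, y) at `zoom`."""
    n = 2 ** zoom
    size = 2 * ORIGIN_SHIFT / n
    col = min(int((x + ORIGIN_SHIFT) / size), n - 1)
    row = min(int((ORIGIN_SHIFT - y) / size), n - 1)
    return col, row

def smallest_enclosing_tile(xmin, ymin, xmax, ymax, max_zoom=24):
    """Deepest (zoom, col, row) whose single tile covers the whole box."""
    best = (0, 0, 0)
    for zoom in range(1, max_zoom + 1):
        upper_left = tile_index(xmin, ymax, zoom)
        lower_right = tile_index(xmax, ymin, zoom)
        if upper_left != lower_right:
            break  # tiles nest, so deeper zooms cannot fit either
        best = (zoom, upper_left[0], upper_left[1])
    return best

def tile_bounds(zoom, col, row):
    """(xmin, ymin, xmax, ymax) of a tile in WebMercator meters."""
    size = 2 * ORIGIN_SHIFT / 2 ** zoom
    xmin = -ORIGIN_SHIFT + col * size
    ymax = ORIGIN_SHIFT - row * size
    return xmin, ymax - size, xmin + size, ymax
```

Calling `tile_bounds(*smallest_enclosing_tile(xmin, ymin, xmax, ymax))` gives the X and Y limits to pass to `--bounds`. Note that a point set straddling a tile boundary at a coarse zoom level ends up with a much larger enclosing tile than its own extent would suggest.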

It is possible to use different span parameters to trade off different performance characteristics. With a smaller span, a read at a given zoom level touches a larger number of smaller files, but fewer points (and bytes) are streamed in total when the query region is a small subset of the whole extent. This difference may become important when balancing the cost of queries and data transfer over S3, for example. However, each new file access may also carry a time penalty, so some benchmarking may be required to find the balance point between cost and performance.

In general, the minimal command for creating an EPT hierarchy from a directory is summarized as

entwine build --bounds "[<xmin>,<ymin>,<zmin>,<xmax>,<ymax>,<zmax>]" \
              --span <value> \
              --input <directory> \
              --output <location>

One may, of course, use additional features of Entwine for this operation; issue entwine build --help for more details.
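
If the build is driven from a script, the argument list can be assembled programmatically. This is a minimal sketch that uses only the four flags shown above; the helper name, the sample bounds and the paths are placeholders, not part of Entwine or geotrellis-pointcloud.

```python
import shlex

def entwine_build_args(bounds, span, input_dir, output):
    """Assemble the argument list for `entwine build`.

    `bounds` is (xmin, ymin, zmin, xmax, ymax, zmax).
    """
    bounds_str = "[" + ",".join(repr(float(v)) for v in bounds) + "]"
    return ["entwine", "build",
            "--bounds", bounds_str,
            "--span", str(span),
            "--input", input_dir,
            "--output", output]

# Placeholder values for illustration only.
args = entwine_build_args((0, 0, -100, 4096, 4096, 3996), 256,
                          "/data/laz", "/data/ept")
print(shlex.join(args))
```

The resulting list can be handed to `subprocess.run(args, check=True)` on a machine where the `entwine` executable is available.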

Related Entwine documentation: https://entwine.io/configuration.html#build