geotrellis / geotrellis-pointcloud

GeoTrellis PointCloud library to work with any pointcloud data on Spark
Apache License 2.0
26 stars 10 forks source link

Investigate IDWRasterSource #64

Closed jpolchlo closed 4 years ago

jpolchlo commented 4 years ago

We've recently put in some notable efforts to provide a DEMRasterSource implementation based on a Delaunay triangulation. This approach is well-warranted when the input data are somehow incomplete, say, when the lidar scanner misbehaves over water, or when there are "shadows" behind objects during off-nadir scans. The triangulation-based approach excels when the data present with such holes, and it will fill these regions gracefully.

But if possible, the data will be fixed in order to avoid these cases. Either multiple scans from different directions will be overlaid, or points might be imputed (say for water features). And when the point set inputs are well-behaved, the cost of triangulation is not justified. Moreover, we might start to see undesirable artifacts, as in the following: whole

The point samples comprise an irregular-shaped region with non-convex boundary, and the triangulation fills in the non-convexities with triangles that don't represent the underlying point sample or the actual elevation map. These triangles can be filtered out, but it imposes an additional cost, and some spurious triangles near tight corners of the boundary might slip through.

The previous image represents a region with well-behaved lidar sampling, free from substantial holes. In this case, we might opt for an inverse distance-weighted (IDW) interpolation of the point set. This raster-based method should easily outperform the Delaunay method in terms of time, and given proper inputs, will generate satisfactory results.

A prototype IDW rasterization: idw

We can see a few differences:

  1. The boundary tracks the boundary of the point set with only minimal dilation.
  2. The result gives a weighted average of all points and is, therefore, less susceptible to noise.

Point 2 requires a bit of clarification. In the triangulation version, each pixel is set to the height of the elevation geometry at the sample point. There is no averaging. We have observed that in treed areas, the lidar scans produce a mix of points in the canopy and along the ground. The resulting height field appears to be very noisy in these regions (because we are doing a 2.5-d triangulation, rather than a true 3d surface reconstruction). As a result, in these noisy regions, we could sample and get a value anywhere from the ground to the top of the canopy.

The IDW version will average the different elevations in these areas. Which may not be what is desired, but is at least a great deal more stable. Hence, there are features in the IDW rendering that are not at all obvious from the triangulation version. (We can improve the triangulation version, but at the cost of performance, and the triangulation version already requires 1–2 orders of magnitude longer runtimes than IDW.)

Questions arise from this having to do with the nature of the application and what is best. Triangulation is warranted if substantial gaps in the data are expected, but it comes at a significant price.