Applied-GeoSolutions / lidar2dems

Utilities using PDAL and PCL to create DTMs, DSMs, and CHMs from lidar data

Wicked huge voxel files #35

Open bhbraswell opened 7 years ago

bhbraswell commented 7 years ago

This isn't a problem on our compute cluster, but voxel files of 12-50 GB cause a severe slowdown when we try to do demonstration runs on a smaller system (a 16 GB laptop).

For example, the first step of the automated logging scripts is to read in the voxel file and compute a summary across vertical levels, and this read does not complete on a maxed-out (8 GB) Linux VM on my MacBook.
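The thread doesn't describe the voxel file's on-disk format, so the sketch below assumes a hypothetical headerless float32 file stored level by level. Under that assumption, a per-level summary only needs one running total per level. The file can therefore be streamed in fixed-size chunks, and peak memory is bounded by `chunk_cells` rather than by the 12-50 GB file size. `per_level_means` and its parameters are illustrative names, not part of lidar2dems.

```python
import array
import os
import tempfile


def per_level_means(path, n_levels, cells_per_level, chunk_cells=1_000_000):
    """Return the mean value of each vertical level of a voxel file.

    Hypothetical layout: a headerless file of float32 values in native byte
    order, stored level by level (every cell of level 0, then level 1, ...).
    At most `chunk_cells` values are held in memory at once.
    """
    itemsize = array.array("f").itemsize  # 4 bytes for float32 on common platforms
    if os.path.getsize(path) != n_levels * cells_per_level * itemsize:
        raise ValueError("file size does not match n_levels * cells_per_level")
    means = []
    with open(path, "rb") as f:
        for _ in range(n_levels):
            total = 0.0
            remaining = cells_per_level
            while remaining > 0:
                n = min(chunk_cells, remaining)
                chunk = array.array("f")
                chunk.fromfile(f, n)  # raises EOFError if the file is short
                total += sum(chunk)
                remaining -= n
            means.append(total / cells_per_level)
    return means


if __name__ == "__main__":
    # Tiny demo: 3 levels x 4 cells, read 3 values at a time.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "voxels.bin")
        with open(path, "wb") as f:
            array.array("f", [1, 2, 3, 4, 10, 10, 10, 10, 0, 0, 0, 8]).tofile(f)
        print(per_level_means(path, n_levels=3, cells_per_level=4, chunk_cells=3))
        # prints [2.5, 10.0, 2.0]
```

If the real files are rasters with one band per level, a raster library's block or windowed reads would serve the same purpose: memory depends on the window size, not on the file size.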

I think I've heard from @F-Sullivan that there is a natural way to divide (or re-divide) these files spatially into tiles, but I don't understand the implications yet.

F-Sullivan commented 7 years ago

LAS files are typically delivered by vendors as tiles, usually 500 m x 500 m or 1 km x 1 km. When we wrote lidar2dems, we had problems classifying returns along tile boundaries, so we reduced the number of boundaries by joining tiles into larger regions for classification (and for all the steps that followed). As a result, the LAS files were much larger and took on the shapes of our region shapefiles, which represented different cover types (I think we called them "site shapefiles"). These large LAS files were then carried through the rest of the DEM processing.
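Joining tiles into a site means that a site's LAS file holds every vendor tile the site touches, which is why the files grow so large. The sketch below lists the grid tiles that a site's bounding box intersects or touches. It assumes the vendor tiles form a regular grid anchored at a known origin in projected metres, and it uses the bounding box as a simplified stand-in for the actual site shapefile polygon. The function name and parameters are hypothetical.

```python
import math


def tiles_covering(bbox, tile_size, origin=(0.0, 0.0)):
    """Return (col, row) indices of every grid tile that intersects bbox.

    bbox is (xmin, ymin, xmax, ymax) in the same projected units (e.g. metres)
    as tile_size; tiles are assumed aligned to a grid anchored at origin.
    A bbox edge lying exactly on a tile boundary counts as touching that tile.
    """
    xmin, ymin, xmax, ymax = bbox
    ox, oy = origin
    c0 = math.floor((xmin - ox) / tile_size)
    c1 = math.floor((xmax - ox) / tile_size)
    r0 = math.floor((ymin - oy) / tile_size)
    r1 = math.floor((ymax - oy) / tile_size)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


if __name__ == "__main__":
    site = (1200, 300, 2600, 900)           # hypothetical site extent in metres
    print(tiles_covering(site, 1000))       # 1 km tiles: [(1, 0), (2, 0)]
    print(len(tiles_covering(site, 500)))   # 500 m tiles: 8
```

The smaller the vendor tiles, the more of them a single site spans, and the more data ends up in that site's merged file.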

If we changed the order a bit, we should be able to classify each of the sites, combine all of the sites into a single LAS file, and then tile that large classified LAS file, which should produce more manageable tiles. I'm not sure what this would do to processing times, and it might also introduce new issues with missing data along tile boundaries. At the very least, it should be possible to recombine and retile the dataset at some point for easier management and data sharing.
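One common way to limit edge effects when retiling is to pad each tile with a buffer of points from its neighbours, process the padded tile, and then keep only the points in the tile's core. This is a general technique, not something the thread confirms lidar2dems does. The sketch below assigns (x, y, z) points to grid tiles and duplicates points within `buffer` metres of an edge into the adjacent tiles, flagging which copy is the core one. It holds everything in memory, so it only illustrates the assignment logic; names and parameters are hypothetical.

```python
import math
from collections import defaultdict


def retile_with_buffer(points, tile_size, buffer, origin=(0.0, 0.0)):
    """Split (x, y, z) points into grid tiles, each padded by `buffer` units.

    Each point goes to its own "core" tile and to any neighbouring tile whose
    padded extent contains it. Returns {(col, row): [(x, y, z, is_core), ...]}.
    """
    if not 0 <= buffer < tile_size:
        raise ValueError("buffer must be in [0, tile_size)")
    ox, oy = origin
    tiles = defaultdict(list)
    for x, y, z in points:
        col = math.floor((x - ox) / tile_size)
        row = math.floor((y - oy) / tile_size)
        # buffer < tile_size, so only the 3x3 neighbourhood can contain the point
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                c, r = col + dc, row + dr
                x0 = ox + c * tile_size - buffer
                x1 = ox + (c + 1) * tile_size + buffer
                y0 = oy + r * tile_size - buffer
                y1 = oy + (r + 1) * tile_size + buffer
                if x0 <= x < x1 and y0 <= y < y1:
                    tiles[(c, r)].append((x, y, z, dc == 0 and dr == 0))
    return dict(tiles)


if __name__ == "__main__":
    pts = [(990, 500, 1.0), (500, 500, 2.0), (1500, 1500, 3.0)]
    for key, members in sorted(retile_with_buffer(pts, 1000, 50).items()):
        print(key, members)
```

After classifying each padded tile, dropping the points whose `is_core` flag is false leaves every point in exactly one output tile, while classification near the edges still sees neighbouring returns.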

A schematic would show the processing steps a bit more clearly...

In reply to Bobby Braswell (Rob), Thu, Jul 13, 2017 at 12:39 PM.