PDAL / PDAL

PDAL is the Point Data Abstraction Library: GDAL for point cloud data.
https://pdal.io

[Feature request] writers.ept for generating Entwine tiles #2387

Closed: sunapi386 closed this issue 5 years ago

sunapi386 commented 5 years ago

I really like the EPT file structure, it's fast for responding to queries. Thanks for all the work there!

Small issue: there is a readers.ept but no corresponding writers.ept. Currently I generate an intermediate .laz file (I found this to give the smallest file size) and then use the Docker image to generate the EPT files.

docker run -it -v ~/entwine:/entwine connormanning/entwine build \
                        -i /entwine/sources/file.laz \
                        -o /entwine/file
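
The intermediate file itself comes from a small PDAL pipeline, roughly like this sketch (readers.bf and the paths here are placeholders for my custom reader and data):

[
    {
        "type": "readers.bf",
        "filename": "/entwine/sources/capture.bfjson"
    },
    {
        "type": "writers.las",
        "filename": "/entwine/sources/file.laz",
        "compression": "laszip"
    }
]

which runs with pdal pipeline pipeline.json.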

I'd potentially work on this writer if there isn't already an implementation out there. Right now it's a minor nuisance to generate the intermediate .laz, but I suppose I can deal with it. Is this on a branch somewhere I'm unaware of?

connormanning commented 5 years ago

Can you describe your workflow a bit? What's the process that leads to these intermediate LAZ files?

There is an EPT writer in the works, but it does not write EPT datasets; rather, it would be used to add new dimensions to existing EPT data (similar to the Greyhound writer workflow).

sunapi386 commented 5 years ago

Yes, of course. So I've written a custom PDAL reader to consume binary-serialized C-struct point cloud data (with some custom-defined columns in the struct).

I convert this to EPT format so I get two benefits:

  1. Potree can already consume from an EPT data source, so I can make use of its nice interface to, say, visually select points or regions I'm interested in.
  2. Querying said data in a "geospatial frame" fashion, which is faster than deserializing the raw data and scanning for the relevant readings. The lidar data is transformed into UTM global coordinates, with the center of the lidar data as the capture source; a geospatial query then returns all the point cloud data (points) captured at that UTM coordinate (see the sketch below).
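
A sketch of such a bounded query using PDAL's readers.ept; the EPT endpoint and the bounds values are made-up placeholders:

[
    {
        "type": "readers.ept",
        "filename": "https://example.com/data/ept.json",
        "bounds": "([550000, 551000], [4180000, 4181000])"
    },
    {
        "type": "writers.las",
        "filename": "query.laz"
    }
]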

Anyway, that's my use case. As to why the intermediate LAZ files: I convert to this format because it retains my custom data columns, and Entwine can consume it. But I would really rather just write EPT directly, since LAS/LAZ is bad at handling large amounts of point cloud data (on the order of 100 GB to 1 TB).

connormanning commented 5 years ago

Are there major drawbacks to simply using Entwine itself? If I understand correctly, you've written a PDAL plugin which reads your data format. So if you've created a PDAL plugin that can read .abc files, and you have Entwine built, you could simply run entwine build -i something.abc -o ~/output.
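
For a shared-library plugin, PDAL discovers the driver through the PDAL_DRIVER_PATH environment variable, so the session might look something like this (the plugin directory and driver name are assumptions):

export PDAL_DRIVER_PATH=/path/to/plugin/dir
pdal --drivers | grep readers.bf     # confirm PDAL sees the plugin
entwine build -i something.abc -o ~/output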

I'm just trying to figure out whether there's an improved workflow that's already supported between these projects. I tend to think a good workflow already exists that doesn't require PDAL to generate EPT. While adding that support is possible, I think it would come with its own drawbacks.

sunapi386 commented 5 years ago

you've written a PDAL plugin which reads your data format

Yes.

you could simply run entwine build -i something.abc -o ~/output

Oh neat. I didn't know Entwine could make use of PDAL plugins. I'll try this out and report back. Although I am reading a JSON file that points to multiple files, so it may not be straightforward.

Also, a heads up: I built & installed Entwine and got this error:

~/w/e/build (master)> entwine
entwine: error while loading shared libraries: libentwine.so.2: cannot open shared object file: No such file or directory

I made sure the files existed (and the install output said everything was installed correctly):

Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/libentwine.so.2.0.0
-- Installing: /usr/local/lib/libentwine.so.2
-- Installing: /usr/local/lib/libentwine.so

This was fixed by sudo /sbin/ldconfig -v.
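
The loader cache was simply stale after installing to /usr/local/lib. For a one-off run without refreshing the cache, pointing the loader at the install directory should also work:

LD_LIBRARY_PATH=/usr/local/lib entwine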

sunapi386 commented 5 years ago

I tend to think there's a good workflow out there already that doesn't require PDAL to generate EPT.

While it's not strictly necessary, PDAL has a bunch of useful filters built in, so I'd imagine you would want to use PDAL for some sort of data cleaning / processing before creating the EPT files.

While possible, I think adding support for this would come with its own drawbacks.

What kind of drawbacks do you have in mind?

An update on the attempt to entwine build -i something.abc -o ~/output (.abc -> .bfjson):

entwine build -i ~/entwine/sources/files.bfjson -o ~/entwine/tester
Scanning input
SRS could not be determined
Encountered an error: No points found!
Exiting.

My files.bfjson is just a JSON file that specifies the files I want to read, e.g.:

$ cat files.in.json 
{
  "lidar":"vls128_lidar",
  "rtk":"rtk_msgs",
  "affine":"affines.json",
  "dumpFrames": true
}

I specified the extension I'd like to handle in StaticPluginInfo, but my plugin doesn't seem to be selected to read the associated file.
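
For reference, the registration looks roughly like the following sketch; readers.bf, BfReader, and the strings are stand-ins for my actual plugin:

static StaticPluginInfo const s_info
{
    "readers.bf",                                 // hypothetical driver name
    "Reader for serialized C-struct point data",  // description
    "http://example.com/bf-reader",               // placeholder doc link
    { "bfjson" }                                  // extension used for driver inference
};

CREATE_SHARED_STAGE(BfReader, s_info)

Despite the registered extension, translation still fails: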

pdal translate --input=/home/jsun/files.bfjson --output=output.laz  --developer-debug -v 8 
(PDAL Debug) Debugging...
terminate called after throwing an instance of 'pdal::pdal_error'
  what():  Cannot determine reader for input file: /home/jsun/files.bfjson
fish: “pdal translate --input=/home/js…” terminated by signal SIGABRT (Abort)
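
Forcing the reader explicitly may sidestep the extension inference entirely; readers.bf is again the stand-in driver name:

pdal translate --reader readers.bf --input=/home/jsun/files.bfjson --output=output.laz
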
connormanning commented 5 years ago

so I'd imagine you would need to use PDAL to do some sort of data cleaning / processing before creating the EPT files

You can apply a full PDAL pipeline within Entwine itself with a pipeline configuration key, although this was not documented for Entwine 2.0 (in fact it still isn't, but will be for 2.1); it's currently pseudo-private API and should be considered experimental. Note that this pipeline is applied per input file, not to the entire input set in aggregate the way PDAL applies it.
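
A minimal sketch of such a configuration, treating the pipeline key as the experimental option described above (the stage choices are arbitrary examples):

{
    "input": "~/entwine/sources/file.laz",
    "output": "~/entwine/file",
    "pipeline": [
        { "type": "filters.outlier" },
        { "type": "filters.range", "limits": "Classification![7:7]" }
    ]
}

which would be run with something like entwine build config.json.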

What kind of drawbacks do you have in mind?

Well, there are two possible approaches, each with its own set of issues.

  1. Dynamically link Entwine (which has PDAL as a dependency) into PDAL itself. However, PDAL's data model is very different from Entwine's: PDAL uses a single-thread/single-pipeline model, while Entwine performs many PDAL pipeline executions in parallel during operation. This causes lots of quirks, like poor performance: do we optimize for Entwine's native data model or for PDAL's? Or fork the code? Entwine's input execution also reads into custom PointTable datatypes to control the memory model, a big no-no within PDAL itself, which would need to be swapped out or would introduce a needless copy. Configuration would also be quirky: you'd have to expose only a subset of Entwine's configuration options, since many make no sense in this context, like reprojection, which shouldn't be handled by a writer but rather upstream in a pipeline.

  2. Create PDAL's own octree builder/EPT writer. This would be some subset of Entwine's functionality, optimized for PDAL's own codebase. That doesn't seem like useful work to me when a purpose-built project already exists (one that was split out of PDAL for many of the reasons discussed above).

To me, Entwine provides the best workflow, since it's purpose-built for this application. Because it depends on PDAL, we can expose PDAL goodies like pipeline stages and dynamic plugins while still retaining control of the execution model and a purpose-built configuration, rather than spreading those concerns throughout the more generic and powerful PDAL.

I'd be happy to help get you set up with a working Entwine workflow using your dynamic plugin and potentially pipelining within Entwine - let's move to a PM in PDAL's gitter or over email to do that. If it ends up being a deficient workflow, then we can revisit with more info.

connormanning commented 5 years ago

Closing for now - the current recommended workflow for generating EPT datasets is to use Entwine. Feel free to open an issue there for supporting your workflow.

sunapi386 commented 5 years ago

Thank you for the feedback! My current flow is to use PDAL to generate LAZ and then use Entwine to consume the LAZ and generate EPT. I'll re-evaluate the workflow when Entwine 2.1 comes out, to remove the intermediate LAZ artifact. But for now I can make do with this pipeline unless I run into performance/scalability issues.

patxg commented 4 years ago

Hey @connormanning, love the work. I'm new to Entwine and trying to display my own JSON files. I was wondering if we could chat? Thank you!