geotrellis / geotrellis-pointcloud

GeoTrellis PointCloud library to work with any point cloud data on Spark
Apache License 2.0

Write Point Clouds in a PDAL-supported format, such as LAZ, into HDFS and S3 #9

Open romulogoncalves opened 6 years ago

romulogoncalves commented 6 years ago

Currently it is only possible to store data as a GeoTrellis layer. To save as a LAZ file, what we do is either save the LAZ file into a local directory that is a mount of an S3 bucket, or save it into a temporary file and then upload it to HDFS.
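
For illustration, a minimal sketch of that temp-file workaround, assuming a Hadoop client on the classpath; the namenode address and paths are placeholders, and the PDAL write step itself is elided:

```scala
import java.net.URI
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Write the LAZ file to a local temporary directory first...
val localOut = Files.createTempDirectory("pdal-out").resolve("out.laz")

// ... run the PDAL write pipeline here, targeting localOut ...

// ...then copy the finished file to HDFS via the Hadoop FileSystem API.
val hdfs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration())
hdfs.copyFromLocalFile(new Path(localOut.toUri), new Path("/data/pointclouds/out.laz"))
```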

Would it be possible to save the results after a pipeline execution directly into HDFS or S3? Are there plans to have such functionality? Or should such functionality come from PDAL?

I would imagine something like this:

```scala
val pipelineExpr = Read("local") ~ HagFilter() ~ LasWrite("s3a://out.laz")
```
pomadchin commented 6 years ago

The current Pipeline is a PDAL pipeline and translates into PDAL JSON. There are two solutions: 1. move it into PDAL, or 2. implement something on our side, but I'm not sure PDAL exposes what would be needed.
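
For context, a DSL expression like the one above would have to boil down to a PDAL JSON pipeline whose last stage is a writer, roughly like this (a sketch against pdal-java; the input, filter choice, and output are placeholders, and the exact lifecycle method names have varied across pdal-java versions):

```scala
import io.pdal.Pipeline

// What the DSL roughly translates to: plain PDAL JSON with a writers.las
// stage at the end. writers.las only understands local filenames, which is
// exactly why s3a:// output needs support on the PDAL (or wrapper) side.
val json =
  """{
    |  "pipeline": [
    |    "/tmp/input.laz",
    |    { "type": "filters.hag" },
    |    { "type": "writers.las", "filename": "/tmp/out.laz", "compression": "laszip" }
    |  ]
    |}""".stripMargin

val pipeline = Pipeline(json)
pipeline.initialise() // older pdal-java spelling; newer versions differ
pipeline.execute()
pipeline.dispose()
```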

romulogoncalves commented 6 years ago

Yes, I thought the same. Such a feature would be very welcome.

For now we are just exploring geotrellis, geotrellis-pointcloud, and PDAL, and once we start using them for our projects we might contribute back.

I think having it implemented in PDAL would be the right option. It should also allow the user to save data into multiple files, since large files are hard to handle in Spark if streaming is not available.
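
To sketch what the multiple-files idea could look like on the Spark side: `pointsRdd` and `writeLaz` are hypothetical stand-ins and the s3a URI is a placeholder; only the Hadoop FileSystem copy is a real API, and it covers both hdfs:// and s3a:// destinations.

```scala
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.TaskContext

// Each partition writes its own LAZ file locally and uploads it, so no
// single oversized file has to be produced or streamed by one task.
pointsRdd.foreachPartition { points =>
  val part  = TaskContext.getPartitionId()
  val local = Files.createTempFile(s"part-$part-", ".laz")
  writeLaz(points, local) // hypothetical PDAL-backed writer for one partition
  val dst = new Path(s"s3a://bucket/out/part-$part.laz")
  val fs  = FileSystem.get(dst.toUri, new Configuration())
  fs.copyFromLocalFile(new Path(local.toUri), dst)
}
```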

pomadchin commented 6 years ago

We can write outputs to the local fs and then copy the result into HDFS or S3. That sounds like a DSL enhancement, and it could probably even be implemented in terms of PDAL-Scala.
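
A rough sketch of how that wrapper could look, where `runPipeline` is a hypothetical stand-in for the actual PDAL-Scala execution call and `pipelineJsonFor` builds the pipeline JSON for a given output path: intercept a remote output URI, let PDAL write to a local temp file, then copy the file out over the Hadoop FileSystem API.

```scala
import java.net.URI
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def executeWithRemoteOutput(pipelineJsonFor: String => String, output: URI): Unit =
  if (output.getScheme == null || output.getScheme == "file")
    runPipeline(pipelineJsonFor(output.getPath)) // local write, nothing to copy
  else {
    val tmp = Files.createTempFile("pdal-out-", ".laz")
    runPipeline(pipelineJsonFor(tmp.toString))   // PDAL writes locally first...
    val fs = FileSystem.get(output, new Configuration())
    fs.copyFromLocalFile(new Path(tmp.toUri), new Path(output)) // ...then copy out
  }
```

The same copy path works for hdfs:// and s3a:// outputs, so the DSL would only need to recognize the scheme of the write target.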