geotrellis / geotrellis-pointcloud

GeoTrellis PointCloud library to work with any pointcloud data on Spark
Apache License 2.0

Reading large files with the new FileStreamRecordReader gives **No codec found** #13

Closed romulogoncalves closed 6 years ago

romulogoncalves commented 6 years ago

When reading point clouds with the new streaming approach from #11, we see the following warning:

No codec found for hdfs://ecolidar0.eecolidar-nlesc.surf-hosted.nl:9000/user/hadoop/ahn3/large_data/C_25GN2.laz, reading without compression.

It seems the file is not being read as a compressed file, which would lead to corrupted data. In #11 we tested by reading the metadata to trigger the creation of the RDD. The metadata was correct, so we assumed things were working, because the RDD had to be populated before we could retrieve the metadata. However, we did not check whether the data content itself is corrupted.

It is strange that the codec is not found, because if we misspell it, geotrellis-pointcloud reports an error.

We read the file like this:

val pipelineExpr = LasRead("local", compression = Option("lazperf"))
val rdd_laz = HadoopPointCloudRDD(laz_path, options = HadoopPointCloudRDD.Options(pipeline = pipelineExpr, tmpDir = tmpDir_str, dimTypes = Option(List("X", "Y", "Z", "Classification"))))

pomadchin commented 6 years ago

It's not an issue; it's a log message that I can move to debug level. It is about gz compression: https://github.com/geotrellis/geotrellis-pointcloud/blob/master/src/main/scala/geotrellis/pointcloud/spark/io/hadoop/formats/FileStreamRecordReader.scala#L41
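
For illustration, here is a minimal sketch of how such a message can arise. It assumes the reader looks up a Hadoop compression codec by file extension through Hadoop's `CompressionCodecFactory`; the linked line is not reproduced here, so that is an assumption. The object name `CodecLookupSketch` and the helper `hadoopCodecFor` are hypothetical, and the file names are shortened examples. Hadoop has no codec registered for `.laz`, so the lookup returns nothing and the bytes are passed through unchanged; the LAZ decompression itself would then be left to the `lazperf` setting of the pipeline.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.CompressionCodecFactory

// Hypothetical helper object, not part of geotrellis-pointcloud.
object CodecLookupSketch {
  // Uses the default Hadoop configuration, which registers gzip among others.
  private val factory = new CompressionCodecFactory(new Configuration())

  // getCodec matches on the file name suffix and returns null when nothing matches,
  // so the result is wrapped in an Option.
  def hadoopCodecFor(file: String): Option[String] =
    Option(factory.getCodec(new Path(file))).map(_.getClass.getName)

  def main(args: Array[String]): Unit =
    Seq("C_25GN2.laz", "C_25GN2.laz.gz").foreach { name =>
      hadoopCodecFor(name) match {
        case Some(codec) => println(s"$name: Hadoop codec $codec")
        case None        => println(s"$name: no Hadoop codec, stream is read as-is")
      }
    }
}
```

Under these assumptions, a `.laz` file yields no Hadoop codec while a `.gz` file resolves to the gzip codec, which would explain why the message appears even though the LAZ data is read correctly.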

pomadchin commented 6 years ago

Fixed in https://github.com/geotrellis/geotrellis-pointcloud/commit/b91f6bcabb6675d26f1e060097c81b4e42c70716

pomadchin commented 6 years ago

Published as https://bintray.com/azavea/geotrellis/geotrellis-pointcloud/0.2-b91f6bc