PDAL / python

PDAL's Python Support

Allow setting metadata dict on a PDAL writer stage to enable metadata forwarding #147

Open chambbj opened 1 year ago

chambbj commented 1 year ago

When working with Python, it is conceivable that users will read data via PDAL, operate on it using custom Python code, and then want to write the result via PDAL. Currently, we can set certain metadata fields like scale and offset manually, but other metadata that would otherwise be forwarded in a traditional PDAL pipeline (at least with forward="all") cannot be set so easily. PDAL and the Python bindings need to provide a mechanism that would allow us to forward all metadata by passing in the dict returned by pipeline.metadata when creating the writer stage.

hobu commented 1 year ago

What other metadata do you need to set here? The only other thing I can think of is any VLRs, but it is probably better to ferry those yourself so you get what you want.

If we were to do anything on this, I wonder if adding a settable metadata node to pdal::Stage pipeline members is the way to do it, not by patching something into the python bindings here.

leavauchier commented 10 months ago

Hello, I'm not sure if this would help, but I have one use case for the proposed mechanism: you have a LAS reader, you do some specific processing on your data in Python, and you want to write an output LAS file with the same srs, las_version, dataformat_id, extra_dimensions, etc. When everything can't be done in a single pipeline, all this info has to be fetched and parsed field by field, with the risk that some metadata is forgotten, so it would be handy to be able to forward the whole metadata dict directly.
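
Until such a mechanism exists, the field-by-field fetching described above can at least be gathered in one place. The following is a minimal sketch, not an API of the bindings: the helper name `writer_options_from_metadata` and the `FORWARDED_FIELDS` whitelist are hypothetical, and it assumes the dict returned by pipeline.metadata is laid out as `{"metadata": {"readers.las": {...}}}` with the spatial reference WKT under `srs.wkt`. The sample values are placeholders.

```python
# Hypothetical whitelist: reader metadata keys that share a name with
# writers.las options. Extend or trim it for your own data.
FORWARDED_FIELDS = (
    "scale_x", "scale_y", "scale_z",
    "offset_x", "offset_y", "offset_z",
    "minor_version", "dataformat_id",
)


def writer_options_from_metadata(metadata, reader_key="readers.las"):
    """Collect writers.las options from a pipeline.metadata-style dict.

    Assumes the layout {"metadata": {reader_key: {...}}}; keys that are
    missing are skipped rather than guessed.
    """
    reader_md = metadata.get("metadata", {}).get(reader_key, {})
    options = {
        name: reader_md[name] for name in FORWARDED_FIELDS if name in reader_md
    }
    # Assumption: the spatial reference is stored as WKT under srs.wkt.
    srs = reader_md.get("srs") or {}
    wkt = srs.get("wkt")
    if wkt:
        options["a_srs"] = wkt
    return options


# Placeholder metadata standing in for a real pipeline.metadata result.
sample_metadata = {
    "metadata": {
        "readers.las": {
            "scale_x": 0.01, "scale_y": 0.01, "scale_z": 0.01,
            "offset_x": 0.0, "offset_y": 0.0, "offset_z": 0.0,
            "minor_version": 4,
            "dataformat_id": 6,
            "count": 100,
            "srs": {"wkt": "EPSG:2154"},
        }
    }
}

options = writer_options_from_metadata(sample_metadata)
```

The resulting dict could then be passed as options to the writer stage. This only covers header fields that have a matching writer option; VLRs are not handled here, and extra dimensions are governed separately by the writer's `extra_dims` option.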

diego-gris commented 7 months ago

@hobu, I agree with @leavauchier. I've noticed that if I capture the output of a Reader as a numpy array and later try to write it to a .las file by passing the numpy array to the Pipeline constructor, the output file is missing CRS information. It would be great if we could also pass the metadata to the writer after processing the arrays in Python, so that we can write files the same way we do when writing them directly after reading in the same pipeline.
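
For the missing CRS specifically, one possible workaround is to set the writer's `a_srs` option explicitly when building the pipeline that receives the array. This is a sketch under stated assumptions: the helper names `las_writer_spec` and `write_array_with_crs` are hypothetical, and `srs` is whatever spatial reference string was captured earlier (for example from the reader's metadata before the arrays were extracted).

```python
import json


def las_writer_spec(filename, srs=None):
    """Return a pipeline JSON string holding a single writers.las stage."""
    stage = {"type": "writers.las", "filename": filename}
    if srs:
        # a_srs assigns the spatial reference to the written file.
        stage["a_srs"] = srs
    return json.dumps({"pipeline": [stage]})


def write_array_with_crs(array, filename, srs):
    """Write a numpy structured array to LAS with an explicit CRS."""
    # Imported here so the spec builder above works without PDAL installed.
    import pdal

    pipeline = pdal.Pipeline(las_writer_spec(filename, srs), arrays=[array])
    return pipeline.execute()
```

This restores the CRS only; any other header fields would still have to be carried over as separate writer options.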