PDAL / python

PDAL's Python Support

Pipeline schema not filled in when executed in streaming mode #136

Closed: wonder-sk closed this issue 1 year ago

wonder-sk commented 1 year ago

If I execute a pipeline in streaming mode, the schema dictionary ends up empty. When the same pipeline is executed in standard mode, the schema is populated properly.

>>> import pdal
>>> r=pdal.Reader('merged.laz')
>>> p=pdal.Pipeline([r])
>>> p.execute_streaming()
6116033
>>> p.schema
{'schema': {}}
>>> p.execute()
6116033
>>> p.schema
{'schema': {'dimensions': [{'name': 'X', 'size': 8, 'type': 'floating'}, {'name': 'Y', 'size': 8, 'type': 'floating'}, {'name': 'Z', 'size': 8, 'type': 'floating'}, {'name': 'Intensity', 'size': 2, 'type': 'unsigned'}, {'name': 'ReturnNumber', 'size': 1, 'type': 'unsigned'}, {'name': 'NumberOfReturns', 'size': 1, 'type': 'unsigned'}, {'name': 'ScanDirectionFlag', 'size': 1, 'type': 'unsigned'}, {'name': 'EdgeOfFlightLine', 'size': 1, 'type': 'unsigned'}, {'name': 'Classification', 'size': 1, 'type': 'unsigned'}, {'name': 'ScanAngleRank', 'size': 4, 'type': 'floating'}, {'name': 'UserData', 'size': 1, 'type': 'unsigned'}, {'name': 'PointSourceId', 'size': 2, 'type': 'unsigned'}, {'name': 'GpsTime', 'size': 8, 'type': 'floating'}, {'name': 'Red', 'size': 2, 'type': 'unsigned'}, {'name': 'Green', 'size': 2, 'type': 'unsigned'}, {'name': 'Blue', 'size': 2, 'type': 'unsigned'}]}}
hobu commented 1 year ago

I think this is a bug. If I use a pipeline with additional stages, the .schema member is populated correctly.

hobu commented 1 year ago

This doesn't work because the PipelineExecutor maintains its own streaming PointTable, which cannot provide its metadata during the .metadata() call. Only the standard-mode PointTable is available for that.

I think the PipelineExecutor should be split in two: one for streaming mode and one for standard mode.
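A possible workaround for the empty schema, sketched under assumptions this thread does not confirm: if each chunk yielded by p.iterator() is a NumPy structured array whose fields mirror the PDAL dimensions, a schema-like dict can be rebuilt from the chunk's dtype. The helper schema_from_dtype is hypothetical, and the mapping from NumPy kind codes to PDAL's floating/unsigned/signed type names is an assumption modeled on the standard-mode output shown above.

```python
import numpy as np

# Hypothetical mapping from NumPy dtype kind codes to PDAL type names.
# Assumption: PDAL reports "floating", "unsigned" and "signed" base types.
_KIND_TO_TYPE = {"f": "floating", "u": "unsigned", "i": "signed"}


def schema_from_dtype(dtype: np.dtype) -> dict:
    """Rebuild a dict shaped like p.schema from a structured NumPy dtype.

    Hypothetical helper: assumes each field of the dtype is one PDAL dimension.
    """
    dims = []
    for name in dtype.names:
        # dtype.fields maps each field name to (field_dtype, offset).
        field_dtype = dtype.fields[name][0]
        dims.append({
            "name": name,
            "size": field_dtype.itemsize,
            # Fall back to the raw NumPy kind code for anything unmapped.
            "type": _KIND_TO_TYPE.get(field_dtype.kind, field_dtype.kind),
        })
    return {"schema": {"dimensions": dims}}
```

In a streaming loop, calling schema_from_dtype(chunk.dtype) on the first chunk would give a dict comparable to what p.schema returns after p.execute(), without relying on the pipeline's own schema.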

wonder-sk commented 1 year ago

Not sure if this is an instance of the same problem or a separate issue:

import pdal
p = pdal.Reader("data.laz") | pdal.Filter.hexbin()
for x in p.iterator(chunk_size=100000):
    print("hello")
print(p.metadata)

The last line gives me RuntimeError: Pipeline has not been executed!

hobu commented 1 year ago

Definitely related, and it comes down to how the Python bindings use streaming vs. non-streaming mode.

gsakkis commented 1 year ago

PipelineIterator has extra properties, including metadata; there's a test for it. So the example above should be (untested):

import pdal
p = pdal.Reader("data.laz") | pdal.Filter.hexbin()
it = p.iterator(chunk_size=100000)
for x in it:
    print("hello")
print(it.metadata)

hobu commented 1 year ago

OK, this is not a bug, and we're not going to change anything about it. In streaming mode, you need to get your metadata from the iterator, not the pipeline.
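Following that advice, a small hypothetical helper can wrap the streaming loop and return the metadata from the iterator rather than from the pipeline. It relies only on p.iterator(chunk_size=...) and the iterator's metadata property mentioned above; the name stream_with_metadata is made up, and reading metadata after the loop finishes is a conservative choice, not something confirmed in this thread.

```python
def stream_with_metadata(pipeline, chunk_size=100000):
    """Run `pipeline` in streaming mode and return (point_count, metadata).

    Hypothetical helper. Metadata is read from the iterator, because in
    streaming mode the pipeline itself reports "Pipeline has not been executed!".
    """
    it = pipeline.iterator(chunk_size=chunk_size)
    count = 0
    for chunk in it:
        # Each chunk is an array of points; len() gives its point count.
        count += len(chunk)
    # Assumption: metadata is read only after every chunk has been consumed.
    return count, it.metadata
```

For the hexbin example earlier in the thread, this would be called as n, md = stream_with_metadata(pdal.Reader("data.laz") | pdal.Filter.hexbin()).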