Reading metadata without loading the point cloud into memory

digital-idiot commented 3 years ago

Is there a way to programmatically read the metadata without loading the point cloud into memory? The documentation isn't very clear on this. Basically I found two ways to do this:

1. Read the point cloud, then call get_metadata on the Pipeline object.
2. Execute pdal info --metadata <file_path> as a system command and capture the output.

In the first case the entire point cloud is loaded into memory, which is obviously unnecessary. The second approach seems a bit hacky. For comparison, in GDAL it is not necessary to execute gdalinfo from the program just to get the metadata; all the metadata is accessible by just opening the file, without loading the actual data.

abellgithub commented 3 years ago

Sorry, this is not currently supported in the Python interface.

You can certainly run pdal info --metadata and parse the returned JSON for yourself.

guillochon commented 2 years ago

In case anyone finds this useful, here's a code snippet that will load the metadata from that command. The replace is there because double quotes come back double-escaped in the output of check_output, and that breaks json.loads:

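import json
from subprocess import check_output

# `path` is the point cloud file to inspect; it must be defined beforehand.
# Passing the arguments as a list avoids shell quoting problems with the path.
txt = check_output(["pdal", "info", path, "--metadata"]).replace(
    b'\\\\"', b'\\"'  # collapse double-escaped quotes so json.loads accepts them
)
metadata = json.loads(txt)["metadata"]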

It would be cool to add this to the Python interface directly...

abellgithub commented 2 years ago

I forgot to mention earlier: I think you can get this information by executing a pipeline that contains only a reader and setting the --count option to 0.
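A rough sketch of what that suggestion might look like from Python, for anyone who wants to try it. This is untested against any particular PDAL release and rests on several assumptions: that the reader accepts a count option of 0, that the reader can be inferred from the filename, and that the installed bindings expose pdal.Pipeline(json), execute() and a metadata property (which may be a JSON string or a dict depending on the version). The helper names metadata_pipeline_json and read_metadata are made up for illustration.

```python
import json


def metadata_pipeline_json(path):
    """Build a reader-only pipeline that asks the reader for zero points."""
    # No "type" key: PDAL is assumed to pick the reader from the file extension.
    return json.dumps({"pipeline": [{"filename": path, "count": 0}]})


def read_metadata(path):
    """Execute the reader-only pipeline and return its metadata as a dict."""
    import pdal  # imported here so the JSON helper works without PDAL installed

    pipeline = pdal.Pipeline(metadata_pipeline_json(path))
    pipeline.execute()
    meta = pipeline.metadata  # assumption: a JSON string or a dict, by version
    return json.loads(meta) if isinstance(meta, str) else meta
```

Calling read_metadata("some_file.las") would then return the metadata if the assumptions above hold. Whether this avoids reading the point data entirely depends on how the reader handles a zero count.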

PDAL / python

Reading metadata without loading the point cloud into memory #92