PDAL / python

PDAL's Python Support

Add GeoDataFrame support to Pipeline #173

Open jf-geo opened 2 months ago

jf-geo commented 2 months ago

GeoDataFrame support added, enabled when GeoPandas is available.

GeoPandas import mirrors existing Pandas import.

Pipeline.get_geodataframe method added. Users can get an array from an executed pipeline as a GeoDataFrame instead of a Pandas DataFrame. Optional arguments included for specifying XY vs XYZ point geometries and for providing CRS information to the GeoDataFrame constructor.

Pipeline(dataframes) modified so that columns named "geometry" will be dropped before conversion to structured arrays.
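The "geometry" handling described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `to_structured_array` is a hypothetical helper name, and a plain Pandas DataFrame with string placeholders stands in for a real GeoDataFrame so the example runs without GeoPandas.

```python
import numpy as np
import pandas as pd


def to_structured_array(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: drop a "geometry" column, then convert to a structured array."""
    # errors="ignore" lets DataFrames without a geometry column pass through unchanged.
    plain = df.drop(columns=["geometry"], errors="ignore")
    # index=False keeps the DataFrame index out of the resulting fields.
    return np.asarray(plain.to_records(index=False))


df = pd.DataFrame(
    {
        "X": [1.0, 2.0],
        "Y": [3.0, 4.0],
        "Z": [5.0, 6.0],
        # Placeholder strings; a GeoDataFrame would hold geometry objects here.
        "geometry": ["POINT Z (1 3 5)", "POINT Z (2 4 6)"],
    }
)

arr = to_structured_array(df)
print(arr.dtype.names)
```

Only the numeric dimension columns survive the conversion; the coordinates are still available through X, Y and Z.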

hobu commented 2 months ago

Can you please add a test for this?

I'm not quite familiar with the performance consequences of GeoDataFrame relative to our StructuredArray. Does this PR work as expected for you?

jf-geo commented 2 months ago

Test added.

The PR works as expected.

Using GeoDataFrames as an input for pipelines performs the same as using Pandas DataFrames.

Getting a GeoDataFrame from an executed pipeline is slower than getting a StructuredArray or DataFrame, because a GeoSeries has to be created from the XY[Z] dimensions. Currently the fastest way of doing this is via geopandas.points_from_xy, whose performance depends on the user's version of shapely and on whether they have pygeos installed.
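For readers unfamiliar with that step, here is a minimal sketch of building the geometry column with geopandas.points_from_xy. It is not the implementation in this PR: `array_to_geodataframe` and its `xyz` and `crs` parameters are hypothetical names, and a hand-built NumPy structured array stands in for a pipeline's output.

```python
import geopandas as gpd
import numpy as np
import pandas as pd


def array_to_geodataframe(arr: np.ndarray, xyz: bool = False, crs=None) -> gpd.GeoDataFrame:
    """Hypothetical helper: build point geometries from the X, Y[, Z] fields."""
    df = pd.DataFrame(arr)
    # Pass Z only when 3D points are requested; None yields 2D points.
    z = df["Z"] if xyz else None
    geometry = gpd.points_from_xy(df["X"], df["Y"], z)
    return gpd.GeoDataFrame(df, geometry=geometry, crs=crs)


# Stand-in for an array returned by an executed pipeline.
arr = np.array(
    [(1.0, 3.0, 5.0), (2.0, 4.0, 6.0)],
    dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")],
)

gdf = array_to_geodataframe(arr, xyz=True, crs="EPSG:4326")
print(gdf.head())
```

The extra cost compared with a plain DataFrame comes from the points_from_xy call, which creates one geometry per point.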