geoarrow / geoarrow-rs

GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
http://geoarrow.org/geoarrow-rs/
Apache License 2.0

Use geo-traits for parsing WKT #791

Closed kylebarron closed 5 days ago

kylebarron commented 5 days ago

Change list

Testing from Python, this looks marginally faster than the previous streaming implementation:

import pyarrow as pa
import geopandas as gpd
import shapely
from geoarrow.rust.core import from_wkt

# Load the building outlines and serialize the geometries to WKT strings.
path = "/Users/kyle/Downloads/nz-building-outlines.parquet"
gdf = gpd.read_parquet(path)
wkt_arr = shapely.to_wkt(gdf.geometry.array)
# Benchmark on the first 100,000 geometries as a pyarrow string array.
pa_wkt_arr = pa.array(wkt_arr[:100_000])
%timeit _ = from_wkt(pa_wkt_arr)

Main branch:

242 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This branch:

234 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
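
The timings here come from IPython's `%timeit` magic, which is not available in a plain Python script. A minimal sketch of a comparable measurement using only the standard library follows. The `bench` helper and `placeholder_workload` are hypothetical names, and the placeholder stands in for the `from_wkt(pa_wkt_arr)` call so that the snippet runs without the dataset.

```python
import statistics
import timeit


def bench(fn, repeat=7, number=1):
    """Time fn() over `repeat` runs of `number` loops; return (mean, std) in seconds per loop."""
    # timeit.repeat returns the total seconds for `number` calls, once per run.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    # Population standard deviation across the runs.
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


def placeholder_workload():
    # Hypothetical stand-in; in the benchmark above this would be
    # `from_wkt(pa_wkt_arr)`.
    return sum(i * i for i in range(10_000))


mean, std = bench(placeholder_workload)
print(f"{mean * 1e3:.3g} ms ± {std * 1e3:.3g} ms per loop (7 runs, 1 loop each)")
```

With one loop per run, each sample includes any one-off warm-up cost, so differences of a few milliseconds between runs should be read with some caution.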

Even if it isn't meaningfully faster, this approach is preferable because it reuses the geo-traits conversion mechanism, which is how most data conversion to geoarrow arrays is (or should be) handled.
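
As a rough illustration of why a shared conversion mechanism is attractive, here is a Python analogy only, not the Rust geo-traits API. `PointLike`, `WktPoint`, and `PointBuilder` are hypothetical names invented for this sketch. The idea is that the builder depends only on an interface, so any parser whose output implements that interface can reuse the same code path.

```python
from typing import Protocol


class PointLike(Protocol):
    """Hypothetical analogue of a point trait: read-only coordinate access."""

    def x(self) -> float: ...
    def y(self) -> float: ...


class WktPoint:
    """Hypothetical parsed-WKT point; stands in for a parser's own type."""

    def __init__(self, text):
        # Only handles the form "POINT (x y)"; enough for illustration.
        inner = text.strip()[len("POINT"):].strip().strip("()")
        xs, ys = inner.split()
        self._x, self._y = float(xs), float(ys)

    def x(self):
        return self._x

    def y(self):
        return self._y


class PointBuilder:
    """Hypothetical builder that writes into flat coordinate buffers."""

    def __init__(self):
        self.xs = []
        self.ys = []

    def push_point(self, point: PointLike):
        # Only the PointLike interface is used, so any source format that
        # implements it can share this one conversion path.
        self.xs.append(point.x())
        self.ys.append(point.y())


builder = PointBuilder()
for text in ["POINT (1 2)", "POINT (3.5 -4)"]:
    builder.push_point(WktPoint(text))
print(builder.xs, builder.ys)
```

Supporting another input format would then only require a new type with `x()` and `y()` methods; the builder itself would stay unchanged.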

For reference, Shapely/GEOS is still about 18% faster here (192 ms vs. 234 ms):

%timeit shapely.from_wkt(pa_wkt_arr)
192 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
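
For clarity on how the relative figures follow from the reported means, a small sketch of the arithmetic is given below. The helper name `pct_less_time` is made up for illustration; the three inputs are the mean timings quoted in this comment.

```python
# Mean timings reported in this comment, in milliseconds.
main_ms = 242.0
branch_ms = 234.0
shapely_ms = 192.0


def pct_less_time(slower, faster):
    """Percentage reduction in wall time going from `slower` to `faster`."""
    return (slower - faster) / slower * 100


# This branch relative to main: roughly 3% less time.
branch_vs_main = pct_less_time(main_ms, branch_ms)
# Shapely relative to this branch: roughly 18% less time.
shapely_vs_branch = pct_less_time(branch_ms, shapely_ms)
# The same gap expressed the other way: this branch takes roughly 22% longer.
branch_over_shapely = (branch_ms / shapely_ms - 1) * 100

print(f"{branch_vs_main:.1f}% {shapely_vs_branch:.1f}% {branch_over_shapely:.1f}%")
```

The 8 ms gap between main and this branch is within about two standard deviations of the branch measurement (± 4.68 ms), which is consistent with describing the speedup as marginal.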

The overhead of the wkt crate probably comes from parsing into an intermediate representation?