Depend on wkt directly (note that this means we now have two copies of wkt in our dependency tree, since geos apparently depends on wkt 0.10.3).
Implement the geo-traits traits on the wkt types.
Add an initial ParseWKT trait that parses a WKT string array without using our streaming builder. I think it'll be better in the long run to minimize the use of the streaming builder.
Testing from Python, it looks like this may be marginally faster than the previous streaming implementation:
import pyarrow as pa
import geopandas as gpd
import shapely
from geoarrow.rust.core import from_wkt
path = "/Users/kyle/Downloads/nz-building-outlines.parquet"
gdf = gpd.read_parquet(path)
wkt_arr = shapely.to_wkt(gdf.geometry.array)
pa_wkt_arr = pa.array(wkt_arr[:100_000])
%timeit _ = from_wkt(pa_wkt_arr)
Main branch:
242 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This branch:
234 ms ± 4.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Even if it's not meaningfully faster, it's preferable because it reuses the geo-traits conversion mechanism, which is how most conversion to geoarrow arrays is, or should be, handled.
For reference, Shapely/GEOS still takes about 18% less time here (192 ms vs. 234 ms):
%timeit shapely.from_wkt(pa_wkt_arr)
192 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
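To make the comparison explicit, here is a small sketch of the arithmetic behind these numbers. It only uses the mean timings reported above; the helper name `percent_less_time` is made up for illustration, and "faster" is read here as "less wall-clock time relative to the slower run".

```python
def percent_less_time(baseline_ms: float, candidate_ms: float) -> float:
    """Return how much less time the candidate takes, as a percent of the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# Mean timings copied from the %timeit output above (milliseconds).
main_ms = 242.0      # main branch, streaming builder
branch_ms = 234.0    # this branch, ParseWKT
shapely_ms = 192.0   # shapely.from_wkt

branch_vs_main = percent_less_time(main_ms, branch_ms)
shapely_vs_branch = percent_less_time(branch_ms, shapely_ms)

print(f"this branch vs. main: {branch_vs_main:.1f}% less time")
print(f"shapely vs. this branch: {shapely_vs_branch:.1f}% less time")
```

The gap between main and this branch is small compared to the gap to Shapely, and with a reported spread of ±4.68 ms on this branch it should probably be treated as roughly equal rather than a clear win.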
My guess is that the overhead of the wkt crate is in parsing to an intermediate representation?
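As a toy illustration of what "parsing to an intermediate representation" means, the sketch below parses each string into a temporary object and then copies its coordinates into flat buffers. This is not the wkt crate or the geoarrow-rs code: the `Point` class and both functions are hypothetical, and it only handles non-empty 2D `POINT (x y)` strings.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical intermediate representation of one parsed WKT point."""
    x: float
    y: float


def parse_point(text: str) -> Point:
    """Parse a non-empty 2D 'POINT (x y)' string into an intermediate Point."""
    body = text.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError(f"not a WKT point: {text!r}")
    # Take the text between the outermost parentheses and split on whitespace.
    inner = body[body.index("(") + 1 : body.rindex(")")]
    x, y = inner.split()
    return Point(float(x), float(y))


def build_via_intermediate(wkt_strings):
    """Parse each string to a Point, then copy it into flat coordinate buffers."""
    xs, ys = array("d"), array("d")
    for text in wkt_strings:
        point = parse_point(text)  # one temporary object per geometry
        xs.append(point.x)         # second step: copy into columnar storage
        ys.append(point.y)
    return xs, ys


xs, ys = build_via_intermediate(["POINT (1 2)", "POINT (3.5 -4)"])
```

Every geometry is materialized once as an object and then copied a second time into the output buffers. A parser that writes coordinates straight into the buffers skips the temporary object, which may be part of why GEOS comes out ahead, though that is only a guess until it's profiled.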