geoarrow / geoarrow-python

Python implementation of the GeoArrow specification
http://geoarrow.org/geoarrow-python/
Apache License 2.0

perf: Optimize pandas `GeoArrowExtensionArray.copy()` #30

Closed by paleolimbot 11 months ago

paleolimbot commented 11 months ago

Credit to @martinfleis for the epic traceback that highlighted this and @jorisvandenbossche for the concat_arrays() trick!

import geoarrow.pandas as gapd
import geoarrow.pyarrow as ga
import numpy as np

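# Build one million random points from separate x and y coordinate buffers.
# The first argument (None) is the validity buffer, i.e. no nulls.
pts_array = ga.point().from_geobuffers(
    None, np.random.random(int(1e6)), np.random.random(int(1e6))
)

# Wrap the Arrow array in the pandas extension array whose copy() is timed below.
pts_pd = gapd.GeoArrowExtensionArray(pts_array)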

%timeit pts_pd.copy()
#> Before this PR:
#> 37.8 s ± 188 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#> After this PR:
#> 630 µs ± 22.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

codecov[bot] commented 11 months ago

Codecov Report

Merging #30 (1508a79) into main (1aa84b9) will increase coverage by 0.03%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #30      +/-   ##
==========================================
+ Coverage   94.56%   94.59%   +0.03%     
==========================================
  Files          10       10              
  Lines        1250     1257       +7     
==========================================
+ Hits         1182     1189       +7     
  Misses         68       68              

Files                                         Coverage Δ
geoarrow-pandas/src/geoarrow/pandas/lib.py    93.65% <100.00%> (+0.14%)

kylebarron commented 11 months ago

%timeit pts_pd.copy()
#> Before this PR:
#> 37.8 s ± 188 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#> After this PR:
#> 630 µs ± 22.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

that's a good speedup 😉