JuliaData / DBFTables.jl

Read and write DBF (dBase) tabular data in Julia

Improve column read speed by removing type instability #33

Closed: asinghvi17 closed this 8 months ago

asinghvi17 commented 8 months ago

This removes a type instability in column reads, which speeds up the column re-materialization that has to go through Tables.jl from about 160 ms to about 10 ms (13 ms in the benchmark below).
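For readers unfamiliar with the term, here is a minimal sketch of what a type instability on column access can look like and how a function barrier removes it. The `UntypedTable` type and the `sum_*` functions are hypothetical and exist only for illustration; they are not the DBFTables.jl code or the change made in this PR.

```julia
# Hypothetical illustration only; none of these names come from DBFTables.jl.
struct UntypedTable
    columns::Vector{Any}   # the element type of each column is hidden from the compiler
end

# Type-unstable: `col` is inferred as `Any`, so each element access and
# addition inside the loop is dynamically dispatched.
function sum_unstable(t::UntypedTable, i::Int)
    col = t.columns[i]
    s = 0.0
    for x in col
        s += x
    end
    return s
end

# Function barrier: the kernel is specialized on the concrete column type,
# so the loop body is fully inferred.
function sum_kernel(col::AbstractVector{T}) where {T<:Real}
    s = zero(float(T))
    for x in col
        s += x
    end
    return s
end

# Only one dynamic dispatch happens here, at the call to the kernel.
sum_stable(t::UntypedTable, i::Int) = sum_kernel(t.columns[i])

t = UntypedTable(Any[rand(10^6), rand(1:10, 10^6)])
@assert sum_unstable(t, 1) ≈ sum_stable(t, 1)
```

Running `@code_warntype sum_unstable(t, 1)` should show `col` inferred as `Any`, while `@code_warntype sum_kernel(t.columns[1])` should be fully concrete.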

Benchmark:

using Shapefile, GeometryOps, DataFrames, BenchmarkTools

shp_file = "/Users/anshul/Downloads/ne_10m_admin_0_countries (1)/ne_10m_admin_0_countries.shp"
table = Shapefile.Table(shp_file)
df = DataFrame(table) # convert everything including the DBFTable to a DataFrame

_scale_by_5(x) = x .* 5
@benchmark GeometryOps.transform($_scale_by_5, $table) # 160 ms before, 13 ms after
@benchmark GeometryOps.transform($_scale_by_5, $df)    # 7 ms

rafaqz commented 8 months ago

@asinghvi17 maybe post your benchmarks here for reviewers? I can't actually merge this.

@visr @joshday this is an order-of-magnitude improvement in column read speed; let's get it merged and the version bumped.