JuliaData / IndexedTables.jl

Flexible tables with ordered indices
https://juliadb.org
MIT License

terrible compilation overhead #81

Open shashi opened 7 years ago

shashi commented 7 years ago
julia> using JuliaDB

julia> @time x=TextParse.readcsv("fail.csv"); # CSV parsing is not the problem
  0.626648 seconds (365.80 k allocations: 20.111 MiB, 9.01% gc time)

julia> @time x=load_table("fail.csv"); # This constructs an indexed table with 314 value columns, keyed by 1:n
460.840912 seconds (166.24 M allocations: 9.823 GiB, 0.30% gc time)

julia> @time x[1];
322.713544 seconds (113.49 M allocations: 6.647 GiB, 0.28% gc time)

julia> @time x[1];
  0.000136 seconds (318 allocations: 12.984 KiB)

Offending file: http://www.sharecsv.com/s/fdae928a89db09db887faa55273d20ce/fail.csv

bkamins commented 6 years ago

I just wanted to add that on Julia 0.6.2, if you add another 1000 columns to the source file, loading fails with a StackOverflowError (we have similar issues in https://github.com/JuliaData/DataFrames.jl/issues/1335#issuecomment-355136747):

julia> @time x=loadndsparse("fail.csv");
Metadata for 0 / 1 files can be loaded from cache.
Reading 1 csv files totalling 26.183 KiB in 1 batches...
ERROR: StackOverflowError:

CC @nalimilan