rvignolo-julius opened this issue 2 years ago (status: Open)
Hi,
I have approximately 6.5M rows in a database. When loading the data in chunks, I noticed the following:
```julia
using CSV
using DataFrames
using BenchmarkTools

@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=2, limit=1_000_00, header=true, ntasks=1);
# 77.412 ms (1661 allocations: 27.48 MiB)
@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=1_000_002, limit=1_000_00, header=true, ntasks=1);
# 13.741 s (165783320 allocations: 2.56 GiB)
@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=2_000_002, limit=1_000_00, header=true, ntasks=1);
# 27.617 s (333119407 allocations: 5.11 GiB)
@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=3_000_002, limit=1_000_00, header=true, ntasks=1);
# 41.784 s (500520198 allocations: 7.66 GiB)
@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=4_000_002, limit=1_000_00, header=true, ntasks=1);
# 56.294 s (667717786 allocations: 10.22 GiB)
@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=5_000_002, limit=1_000_00, header=true, ntasks=1);
# 69.886 s (835307531 allocations: 12.77 GiB)
@btime CSV.read("data.csv", DataFrames.DataFrame; skipto=6_000_002, limit=1_000_00, header=true, ntasks=1);
# 83.741 s (1002749189 allocations: 15.33 GiB)
```
The runtime grows far more than expected: each call scales roughly linearly with `skipto` (about 14 s per additional million rows skipped), as if the parser still has to scan every row before the requested one, and the allocation counts grow the same way. Is there any other approach I could take? Are there unwanted allocations?
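Would something like `CSV.Chunks` be the intended way to do this, since it streams the file in a single pass instead of re-scanning everything before `skipto` on every call? A minimal sketch of what I have in mind (`ntasks=7` is just a guess aimed at chunks of roughly 1M rows for this file):

```julia
using CSV
using DataFrames

# Stream the file in a single pass: CSV.Chunks splits the input into
# ntasks pieces and yields them one at a time, so nothing is re-parsed.
# With ~6.5M rows, ntasks=7 should give chunks of roughly 1M rows
# (chunk boundaries are byte-based, so exact row counts will vary).
for chunk in CSV.Chunks("data.csv"; ntasks=7)
    df = DataFrame(chunk)  # materialize only the current chunk
    # ... process df here; it can be garbage-collected before the next chunk
end
```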
Thank you for the amazing work!
Hi, any ideas regarding what could be happening? Thanks!