Closed kojix2 closed 5 years ago
Use Arrow::Table#raw_records:
t.select_columns(:chr, :pos, :pval).raw_records
I tried the benchmark again.
require "arrow"
require "benchmark"

Benchmark.bm 12 do |r|
  r.report "RedArrow" do
    t = Arrow::Table.load("uc.csv")
    t.select_columns(:chr, :pos, :pval).raw_records
    # puts [chr, pos, pval] == correct_array
  end
  # (the FastestCSV and CSV reports are not shown here)
end
                   user     system      total        real
RedArrow      11.680318   2.015998  13.696316 (  1.622647)
FastestCSV    18.993417   0.357121  19.350538 ( 19.361992)
CSV          112.687737   0.262550 112.950287 (112.994485)
Now Red Arrow is faster than fastest-csv. The result is very impressive.
RedArrow doesn't consume much memory. Any column can be converted to a Ruby array when needed. It will definitely be useful.
Thank you very much!
Oh, I have to add transpose. It looks a bit messy.
chr, pos, pval = t.select_columns(:chr, :pos, :pval).raw_records.transpose
This does not affect performance, though.
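To illustrate why transpose is needed: raw_records returns one inner array per row, while the goal is one array per column. A minimal sketch, using hypothetical sample rows in place of the real raw_records output (the values are made up for illustration):

```ruby
# Hypothetical rows, shaped like the array of arrays that raw_records returns:
# one inner array per row, one element per selected column.
rows = [
  ["chr1", 100, 0.05],
  ["chr1", 200, 0.50],
  ["chr2", 300, 0.01],
]

# transpose converts the row-major records into one array per column;
# multiple assignment then binds each column to its own variable.
chr, pos, pval = rows.transpose

p chr
p pos
p pval
```

Note that Array#transpose raises IndexError if the rows have different lengths, which should not happen for records taken from a table with a fixed set of columns.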
                   user     system      total        real
RedArrow      12.714012   3.208401  15.922413 (  2.861562)
FastestCSV    18.009612   0.315168  18.324780 ( 18.335624)
CSV          110.762741   0.387075 111.149816 (111.192139)
Hello.
I have a question about RedArrow (Ruby Binding).
My goal is to read the CSV file with RedArrow and create Ruby arrays from its columns. But it is not as fast as I expected.
Here is my benchmark.
uc.csv is a 1 GB CSV file. (I downloaded a TSV file from the International Inflammatory Bowel Disease Genetics Consortium, renamed some columns, and converted it to CSV.)
Result
Fastest-csv is the fastest, and RedArrow is the slowest.
I am not familiar with RedArrow. So I may have made an elementary mistake. Any suggestions are welcome. Thank you.
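For reference, the goal described above (turning CSV columns into Ruby arrays) can be sketched with Ruby's standard csv library. This is not the original benchmark code: the inline sample data is hypothetical, and only the column names chr, pos, and pval are taken from the thread.

```ruby
require "csv"

# Hypothetical sample standing in for uc.csv; the real file is about 1 GB.
data = <<~ROWS
  chr,pos,pval
  chr1,100,0.05
  chr2,200,0.5
ROWS

# headers: true parses the first line as column names;
# converters: :numeric turns numeric-looking fields into Integer or Float.
table = CSV.parse(data, headers: true, converters: :numeric)

# Indexing a CSV::Table with a header name returns that column as an Array.
chr  = table["chr"]
pos  = table["pos"]
pval = table["pval"]

p chr
p pos
p pval
```

This is the slowest of the three approaches in the timings reported in this thread, so it is mainly useful as a correctness baseline.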