Closed by spencerkent 5 months ago
Interesting. ~100x slower is certainly surprising. I think the {:list, :string}
dtype needs a fair bit of tricky memory allocation which might account for the difference, but I wouldn't expect it to be that stark.
By chance, are you able to test this same thing in Polars? Python Polars would be easiest.
@billylanchantin we had some past discussions about this. The main issue I think is that we always traverse the given data, even if a dtype is provided, and we should have a mechanism to say: "this is the dtype, don't try to check or cast it", and we send the data as is, without inferring or casting. The big question is: which one do we want as default? To cast or not to cast? We can probably do this in time for the upcoming v0.9, it is not a big change.
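A hypothetical sketch of what such an opt-out could look like from the caller's side; the `check: false` option is illustrative only and not an existing Explorer API, though `dtypes` is:

```elixir
# Hypothetical: the caller asserts the data already matches the given
# dtypes, so Explorer would skip the Elixir-side traversal that infers
# and casts each value before handing the data to Rust.
df =
  Explorer.DataFrame.new(
    %{id: [1, 2, 3], tags: [["a"], ["b"], ["c", "d"]]},
    dtypes: [id: {:s, 64}, tags: {:list, :string}],
    check: false
  )
```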
The improvements from #863 should also help here.
@josevalim Ah gotcha, thanks. Yeah I see #863 has the todo:
- [ ] Remove the type checking Elixir-side
When you say "it is not a big change", do you mean finishing #863 isn't big? Or adding a separate mechanism to skip the type check isn't big?
I am not sure how hard the PR is; I meant removing the casting and making the inference on the Elixir side optional.
I pushed a commit that removes the casting: https://github.com/elixir-explorer/explorer/commit/c5c2eb7b215be0d33ef6e28fbc06c02209c75ccb
And this PR removes the inference: https://github.com/elixir-explorer/explorer/pull/923 - @spencerkent, can you please try the PR and let us know how it impacts the numbers? Altogether, those changes should reduce at least two traversals from the Elixir side.
Hi all, thanks for the quick follow-up! I was out of commission this weekend, but I had a chance to test PR #923 this morning, and it looks like it makes a large improvement:
Operating System: Linux
CPU Information: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Number of Available Cores: 16
Available memory: 19.53 GB
Elixir 1.16.0
Erlang 26.2.1
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 2 min 8 s
Benchmarking no_list_column ...
Benchmarking with_list_column ...
Calculating statistics...
Formatting results...
Name                       ips        average  deviation         median         99th %
no_list_column          366.50        2.73 ms    ±54.29%        2.42 ms        7.29 ms
with_list_column         60.24       16.60 ms    ±30.71%       15.37 ms       28.83 ms
Comparison:
no_list_column 366.50
with_list_column 60.24 - 6.08x slower +13.87 ms
Memory usage statistics:
Name                Memory usage
no_list_column           449.64 KB
with_list_column         558.02 KB - 1.24x memory usage +108.38 KB
**All measurements for memory usage were the same**
So, still slower, but much better :)
Hi @spencerkent, feel free to try main; we merged that PR and a couple of other improvements by @philss. We have one more in the pipeline. :)
Closing this based on the latest PRs. Maybe there are some list-specific optimizations we could look into, but there should be a sizable improvement right now. Thank you!
I'm noticing that creation of a new DataFrame that has a list column is veeeerrry sloooow (in Explorer 0.8.2), so I thought I'd show a little benchmark and see if there are any ideas on how to make it faster :)
Starting with two objects that support Table.Reader; one of them has no list columns, the other does (synthetic data):

These objects are essentially the same size:
The comparison:
Output from Benchee:
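The setup and comparison code did not survive extraction; below is a minimal sketch of an equivalent Benchee run, with column names and sizes chosen for illustration rather than copied from the original:

```elixir
Mix.install([{:explorer, "~> 0.8.2"}, {:benchee, "~> 1.3"}])

n = 10_000

# Two maps of columns of the same length; only the second has a list column.
no_list = %{id: Enum.to_list(1..n), value: Enum.map(1..n, &(&1 * 1.0))}
with_list = %{id: Enum.to_list(1..n), tags: Enum.map(1..n, fn _ -> ["a", "b"] end)}

Benchee.run(%{
  "no_list_column" => fn -> Explorer.DataFrame.new(no_list) end,
  "with_list_column" => fn -> Explorer.DataFrame.new(with_list) end
})
```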