Open alamb opened 3 years ago
Comment from Andrew Lamb(alamb) @ 2021-04-26T12:32:40.877+0000:
Migrated to github: https://github.com/apache/arrow-rs/issues/89
Hi, I'd like to explore this ticket, but I wonder how and where the benchmark should be run, and also what test workload each operator should be running against?
Thanks @OscarTHZhang
I think part of this ticket would be to define a reasonable test workload
Here are some examples of benches that might serve as inspiration:
SortPreservingMerge: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/merge.rs
Maybe the first thing to do is to take stock of the current coverage and propose some additions?
Hi @alamb,
Here are some questions on my mind:
I think we can divide the micro-bench into 2 types (as described above)
For all the aggregations, if we are going to implement them all, we can simply write targeted SQL benchmarks. For operators that operate at column and table granularity, with output still in the form of columns and tables, we can set up single-operator benches, e.g. for merge, join, and filter.
How does this sound? Anything missing?
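To make the "single operator bench" idea concrete, here is a std-only sketch (plain `Vec<i64>`s stand in for Arrow `RecordBatch`es, and a hand-rolled two-way merge stands in for `SortPreservingMerge`; the real benchmarks would use Criterion and DataFusion's actual operators). The key point is that the input is built once, outside the timed region, so the measurement covers only the operator:

```rust
use std::time::Instant;

// Hypothetical stand-in for an operator: merge of two pre-sorted "batches".
fn merge_sorted(left: &[i64], right: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}

fn main() {
    // Setup (data generation) happens outside the timed region, so the
    // measurement reflects the operator, not the cost of producing input.
    let left: Vec<i64> = (0..1_000_000).map(|x| x * 2).collect();
    let right: Vec<i64> = (0..1_000_000).map(|x| x * 2 + 1).collect();

    let start = Instant::now();
    let merged = merge_sorted(&left, &right);
    let elapsed = start.elapsed();

    assert_eq!(merged.len(), 2_000_000);
    println!("merged {} rows in {:?}", merged.len(), elapsed);
}
```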
Hi @OscarTHZhang Thanks for commenting on this ticket.
> I think we can divide the micro-bench into 2 types (as described above)
I think the core goal for the ticket is to ensure the vast majority of the time is spent doing the operation rather than reading data.
It might make sense to go through existing benchmarks and try to see what coverage we already have
End-to-end benchmarks: https://github.com/apache/arrow-datafusion/tree/master/benchmarks
More micro-level benchmarks: https://github.com/apache/arrow-datafusion/tree/master/datafusion/core/benches
There are already some benchmarks that appear to be Targeted SQL that you describe, for example https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/sql_planner.rs and https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/aggregate_query_sql.rs
There are also some benchmarks for operators that are used as part of other operations, such as https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/merge.rs
Not sure how strong the suggestion of using Criterion was, but I recently discovered Divan. It may be worth evaluating.
(I have no affiliation; am just an aspiring OSS contributor browsing the good-first-issues 🙈)
https://github.com/bheisler/iai could be a good fit for benchmarking those ExecutionPlan implementations that do little or no I/O. It reports not wall-clock durations, but rather exact counts or estimates of low-level metrics:
```
bench_fibonacci_short
Instructions: 1735
L1 Accesses: 2364
L2 Accesses: 1
RAM Accesses: 1
Estimated Cycles: 2404
```
I’m not sure if there are any caveats around using it to measure async-style Rust code, though.
@alamb the way this issue title is phrased, it seems the right way to address it is to extend the micro-benchmarks you shared here under datafusion/core/benches. Is that correct?
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-9551
We should implement criterion microbenchmarks for each operator so that we can test the impact of code changes on performance and catch regressions.
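As a rough illustration of what such a microbenchmark separates out, here is a std-only sketch (Criterion would replace the hand-rolled repetition loop below with proper warm-up, sampling, and statistical analysis, and the toy `filter_even` stands in for a real operator):

```rust
use std::time::{Duration, Instant};

// Toy "operator": filter rows by a predicate.
fn filter_even(input: &[i64]) -> Vec<i64> {
    input.iter().copied().filter(|x| x % 2 == 0).collect()
}

fn main() {
    let input: Vec<i64> = (0..1_000_000).collect();

    // Repeat the measurement and keep the minimum: a crude version of
    // what Criterion automates with warm-up, outlier detection,
    // and regression reporting across runs.
    let mut best = Duration::MAX;
    let mut rows = 0;
    for _ in 0..10 {
        let start = Instant::now();
        let out = filter_even(&input);
        best = best.min(start.elapsed());
        rows = out.len();
    }
    println!("filtered to {} rows, best of 10: {:?}", rows, best);
}
```

Running repeated samples and tracking the best (or a median) is what makes regressions detectable across code changes, which is the point of this ticket.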