Open alamb opened 3 years ago
Comment from Andrew Lamb(alamb) @ 2021-04-26T12:32:40.877+0000:
Migrated to github: https://github.com/apache/arrow-rs/issues/89
Hi, I'd like to explore this ticket, but I wonder how and where the benchmark should be run, and also what test workload each operator should be running against?
Thanks @OscarTHZhang
I think part of this ticket would be to define a reasonable test workload
Here are some examples of benches that might serve as inspiration:
SortPreservingMerge: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/merge.rs
Maybe the first thing to do is to take stock of the current coverage and propose some additions?
Hi @alamb,
Here are some questions on my mind:
I think we can divide the micro-bench into 2 types (as described above)
For all the aggregations, if we are going to implement them all, we can simply write targeted SQL benchmarks. For operators that operate at column and table granularity, with output still in the form of columns and tables, we can set up single-operator benches, e.g. for merge, join, and filter.
How does this sound? Anything missing?
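To make the "single operator bench" idea concrete, here is a std-only sketch (plain `Vec<i64>`s stand in for Arrow `RecordBatch`es, and a hand-rolled two-way merge stands in for `SortPreservingMerge`; the real benchmarks would use Criterion and DataFusion's actual operators). The key point is that the input is built once, outside the timed region, so the measurement covers only the operator:

```rust
use std::time::Instant;

// Hypothetical stand-in for an operator: merge of two pre-sorted "batches".
fn merge_sorted(left: &[i64], right: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}

fn main() {
    // Setup (data generation) happens outside the timed region, so the
    // measurement reflects the operator, not the cost of producing input.
    let left: Vec<i64> = (0..1_000_000).map(|x| x * 2).collect();
    let right: Vec<i64> = (0..1_000_000).map(|x| x * 2 + 1).collect();

    let start = Instant::now();
    let merged = merge_sorted(&left, &right);
    let elapsed = start.elapsed();

    assert_eq!(merged.len(), 2_000_000);
    println!("merged {} rows in {:?}", merged.len(), elapsed);
}
```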
Hi @OscarTHZhang Thanks for commenting on this ticket.
> I think we can divide the micro-bench into 2 types (as described above)
I think the core goal for the ticket is to ensure the vast majority of the time is spent doing the operation rather than reading data.
It might make sense to go through existing benchmarks and try to see what coverage we already have
End-to-end benchmarks: https://github.com/apache/arrow-datafusion/tree/master/benchmarks
More micro-level benchmarks: https://github.com/apache/arrow-datafusion/tree/master/datafusion/core/benches
There are already some benchmarks that appear to be Targeted SQL that you describe, for example https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/sql_planner.rs and https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/aggregate_query_sql.rs
There are also some benchmarks for operators that are used as part of other operations, such as https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/benches/merge.rs
Not sure how strong the suggestion of using Criterion was, but I recently discovered Divan. It may be worth evaluating.
(I have no affiliation; am just an aspiring OSS contributor browsing the good-first-issues 🙈)
https://github.com/bheisler/iai could be a good fit for benchmarking those ExecutionPlan implementations that do little or no I/O. It reports not wall-clock durations, but rather exact counts or estimates of low-level metrics:
```
bench_fibonacci_short
Instructions: 1735
L1 Accesses: 2364
L2 Accesses: 1
RAM Accesses: 1
Estimated Cycles: 2404
```
I’m not sure if there are any caveats around using it to measure async-style Rust code, though.
@alamb the way this issue title is phrased, it seems the right way to address it is to extend the micro-benchmarks you shared here under datafusion/core/benches. Is that correct?
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-9551
We should implement criterion microbenchmarks for each operator so that we can test the impact of code changes on performance and catch regressions.
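As a rough illustration of what such a microbenchmark separates out, here is a std-only sketch (Criterion would replace the hand-rolled repetition loop below with proper warm-up, sampling, and statistical analysis, and the toy `filter_even` stands in for a real operator):

```rust
use std::time::{Duration, Instant};

// Toy "operator": filter rows by a predicate.
fn filter_even(input: &[i64]) -> Vec<i64> {
    input.iter().copied().filter(|x| x % 2 == 0).collect()
}

fn main() {
    let input: Vec<i64> = (0..1_000_000).collect();

    // Repeat the measurement and keep the minimum: a crude version of
    // what Criterion automates with warm-up, outlier detection,
    // and regression reporting across runs.
    let mut best = Duration::MAX;
    let mut rows = 0;
    for _ in 0..10 {
        let start = Instant::now();
        let out = filter_even(&input);
        best = best.min(start.elapsed());
        rows = out.len();
    }
    println!("filtered to {} rows, best of 10: {:?}", rows, best);
}
```

Running repeated samples and tracking the best (or a median) is what makes regressions detectable across code changes, which is the point of this ticket.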