Closed danakj closed 1 year ago
Ok I think we need to do this. Here's the results of the chunks_exact benchmark before and after making take_while() constexpr by working with a FnMut concept instead of with FnMutBox.
On clang 17.
Before:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 456.20 | 2,192,023.89 | 0.1% | 0.24 | `common_prefix_naive`
| 2,026.82 | 493,383.67 | 0.1% | 0.11 | `common_prefix_zip`
| 2,834.25 | 352,827.42 | 0.0% | 0.15 | `common_prefix_chunks_exact`
| 2,566.90 | 389,574.32 | 0.2% | 0.14 | `common_prefix_no_shortcircuit`
| 1,635.27 | 611,518.43 | 0.1% | 0.09 | `common_prefix_take_while`
After:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 258.79 | 3,864,091.89 | 0.0% | 0.13 | `common_prefix_naive`
| 1,871.31 | 534,385.99 | 0.0% | 0.10 | `common_prefix_zip`
| 1,319.52 | 757,849.37 | 0.0% | 0.07 | `common_prefix_chunks_exact`
| 1,175.77 | 850,506.34 | 0.0% | 0.06 | `common_prefix_no_shortcircuit`
| 246.81 | 4,051,774.51 | 1.2% | 0.01 | `common_prefix_take_while`
We get something faster than just a for loop in the end with the change.
It's painful to have to heap allocate to use capturing lambdas.
If each Iterator method that returns an Iterator is marked with lifetime_bound, will Clang warn if we construct a lambda with a reference in it, put it into an Iterator, chain it into another Iterator, and leave the scope?
Is there.. something else we could do here? A clang-tidy?
See TODOs in max_by_key/min_by_key, they have to write more complex logic instead of
.map(to_tuple).max_by(cmp_tuple).map(from_tuple)
.