chromium / subspace

A concept-centered standard library for C++20, enabling safer and more reliable products and a more modern feel for C++ code.; Also home of Subdoc the code-documentation generator.
https://suslib.cc
Apache License 2.0
89 stars 15 forks source link

Consider dropping FnMutBoxs from Iterators #272

Closed danakj closed 1 year ago

danakj commented 1 year ago

It's painful to have to heap allocate to use capturing lambdas.

If each Iterator method that returns an Iterator is marked with lifetime_bound, will Clang warn if we construct a lambda with a reference in it, put it into an Iterator, chain it into another Iterator, and leave the scope?

Is there.. something else we could do here? A clang-tidy?

See TODOs in max_by_key/min_by_key, they have to write more complex logic instead of .map(to_tuple).max_by(cmp_tuple).map(from_tuple).

danakj commented 1 year ago

Ok I think we need to do this. Here's the results of the chunks_exact benchmark before and after making take_while() constexpr by working with a FnMut concept instead of with FnMutBox.

On clang 17.

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              456.20 |        2,192,023.89 |    0.1% |      0.24 | `common_prefix_naive`
|            2,026.82 |          493,383.67 |    0.1% |      0.11 | `common_prefix_zip`
|            2,834.25 |          352,827.42 |    0.0% |      0.15 | `common_prefix_chunks_exact`
|            2,566.90 |          389,574.32 |    0.2% |      0.14 | `common_prefix_no_shortcircuit`
|            1,635.27 |          611,518.43 |    0.1% |      0.09 | `common_prefix_take_while`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              258.79 |        3,864,091.89 |    0.0% |      0.13 | `common_prefix_naive`
|            1,871.31 |          534,385.99 |    0.0% |      0.10 | `common_prefix_zip`
|            1,319.52 |          757,849.37 |    0.0% |      0.07 | `common_prefix_chunks_exact`
|            1,175.77 |          850,506.34 |    0.0% |      0.06 | `common_prefix_no_shortcircuit`
|              246.81 |        4,051,774.51 |    1.2% |      0.01 | `common_prefix_take_while`

We get something faster than just a for loop in the end with the change.