[pipeline] Use conj! as the default pipeline rf

camsaul / toucan2

Successor library to Toucan with a modern and more-extensible API, more consistent behavior, and support for different backends including non-JDBC databases and non-HoneySQL queries. Currently in active beta.

Eclipse Public License 1.0

81 stars 11 forks source link

[pipeline] Use conj! as the default pipeline rf #191

Open alexander-yakushev opened 1 month ago

alexander-yakushev commented 1 month ago

Another small but non-invasive improvement. This works best when many rows are processed, but shouldn't bring too much of an overhead if one/few rows are selected.

camsaul commented 1 month ago

Not sure this is really "non-invasive" if everything using query-reducible and select-reducible and the like has to be updated to use transduce instead of reduce (as evidenced by the tests you had to update)... I'm fairly certain this is going to require us to make changes in upstream Metabase code which makes it a breaking change. I'm honestly not sure the minor (?) performance benefits we get here are worth making the code harder to use correctly. (What are the performance benefits, btw? I would still love to see some benchmarks for these PRs)

alexander-yakushev commented 1 month ago

I understand what you are saying. However, the tests that I've changed, for example, extend the method pipeline/transduce-execute-with-connection. I'm not sure there is a hard rule regarding this, but I would expect anything with the word "transduce" in the name to call the 1-arity of the reducing function. Otherwise, it's violating the "transduce contract," so to speak.

EDIT: this may be relevant https://clojure.org/reference/transducers#_creating_transducible_processes.

A completing process must call the completion operation on the final accumulated value exactly once.

It is possible to rewrite this PR without having to resort to 1-arity if the compatibility here is crucial.

Regarding the benchmarks: I did benchmarks for multiple changes at once. Each change is not very impressive number-wise on its own, but together they chip away quite a bit. I'll do separate benchmarks for this PR and post them.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.64%. Comparing base (3729d20) to head (dd09a66).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #191 +/- ## ========================================== + Coverage 83.58% 83.64% +0.05% ========================================== Files 37 37 Lines 2498 2507 +9 Branches 212 212 ========================================== + Hits 2088 2097 +9 Misses 198 198 Partials 212 212 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

alexander-yakushev commented 1 month ago

I did benchmarks of this against the current master and the allocations are 20% lower but timing results are inconclusive. Let's wait with this PR if favor of the others, and revisit it once other are merged.