camsaul / toucan2

Successor library to Toucan with a modern and more-extensible API, more consistent behavior, and support for different backends including non-JDBC databases and non-HoneySQL queries. Currently in active beta.
Eclipse Public License 1.0

Fix an N^2 in batch hydration #172

Open bshepherdson opened 3 months ago

bshepherdson commented 3 months ago

This hadn't been noticed previously because N is generally small, e.g. 50 or 60 at most. Working with a Metabase dashboard that loaded 10,000 fields (100 each from 100 tables) in a single select + hydrate, I found this was consuming about 3 seconds.

The slow path was effectively a very slow conj: taking the acc vector and then, for each of the annotated-instances, doing:

(recur (vec (concat acc nil [(first hydrated-instances)]))
       ...)

which is pouring the whole vector into a lazy sequence and back into a vector, once for each instance in the list.
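To make the cost concrete, here is a minimal sketch of the two accumulation styles. The function names `slow-append-all` and `fast-append-all` are hypothetical and are not taken from toucan2; only the `(vec (concat acc nil [...]))` shape comes from the snippet above.

```clojure
;; Hypothetical helpers for illustration; not toucan2's actual code.

(defn slow-append-all
  "Rebuilds the whole accumulator on every step, as in the snippet above:
  O(n) work per element, so O(n^2) overall."
  [xs]
  (loop [acc [], xs (seq xs)]
    (if xs
      (recur (vec (concat acc nil [(first xs)])) (next xs))
      acc)))

(defn fast-append-all
  "Appends with conj, which is effectively constant time on vectors:
  O(n) overall."
  [xs]
  (loop [acc [], xs (seq xs)]
    (if xs
      (recur (conj acc (first xs)) (next xs))
      acc)))
```

Both functions return the same vector; only the amount of copying differs, which is why the problem stays invisible at N = 50 and shows up at N = 10,000.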

The new version uses a custom transducer to do the same process in O(n) time (and without any seq overhead, as a bonus).
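As a rough illustration of the idea, and not the code in this PR, a stateful transducer can walk the annotated instances once and emit either the original instance or the next hydrated one. The name `merge-hydrated-xform` and the `:needs-hydration?` / `:instance` keys are assumptions made for this sketch.

```clojure
;; Sketch only: the map shape and all names here are assumed, not toucan2's.

(defn merge-hydrated-xform
  "Returns a stateful transducer over annotated instances. Entries flagged
  :needs-hydration? are replaced by the next item from `hydrated`; all
  other entries pass their :instance through unchanged."
  [hydrated]
  (fn [rf]
    (let [remaining (volatile! (seq hydrated))]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result {:keys [needs-hydration? instance]}]
         (if needs-hydration?
           (let [h (first @remaining)]
             (vswap! remaining next)
             (rf result h))
           (rf result instance)))))))

(defn merge-hydrated
  "Single O(n) pass; no intermediate lazy seqs are created."
  [annotated hydrated]
  (into [] (merge-hydrated-xform hydrated) annotated))
```

Because `into` drives the transducer directly against a transient vector, each element is appended once rather than the accumulator being rebuilt per step.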

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.58%. Comparing base (3729d20) to head (4748350).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #172   +/-   ##
=======================================
  Coverage   83.58%   83.58%
=======================================
  Files          37       37
  Lines        2498     2498
  Branches      212      212
=======================================
  Hits         2088     2088
  Misses        198      198
  Partials      212      212
```


alexander-yakushev commented 2 months ago

Bump. I've rewritten the merging of original instances with batch-hydrated instances again. I'm now building an index that serves two purposes: it indicates which instances should go into the batch to be hydrated and which should not, and it enumerates the instances to be hydrated, so that on a second pass over this index we can extract the hydrated instances by their position in the batch.
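One way to picture the index approach is the sketch below. It is an assumption-laden illustration rather than the actual implementation: `build-index`, `hydrate-with-index`, and the `needs-hydration?` and `hydrate-batch` arguments are all hypothetical names.

```clojure
;; Sketch only: all names and the calling convention are assumed.

(defn build-index
  "Returns a vector parallel to `instances`: nil where an instance is left
  alone, otherwise that instance's position in the batch to be hydrated."
  [needs-hydration? instances]
  (first
   (reduce (fn [[index n] instance]
             (if (needs-hydration? instance)
               [(conj index n) (inc n)]
               [(conj index nil) n]))
           [[] 0]
           instances)))

(defn hydrate-with-index
  "First pass: build the index and collect the batch. Second pass: replace
  each indexed instance with the hydrated one at the same batch position."
  [needs-hydration? hydrate-batch instances]
  (let [instances (vec instances)
        index     (build-index needs-hydration? instances)
        batch     (vec (for [[instance pos] (map vector instances index)
                             :when pos]
                         instance))
        hydrated  (vec (hydrate-batch batch))]
    (mapv (fn [instance pos]
            (if pos (nth hydrated pos) instance))
          instances
          index)))
```

For four instances where only the second and fourth need hydration, the index would be `[nil 0 nil 1]`, and the second pass looks up positions 0 and 1 in the hydrated batch.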

The tests are now fixed.