Closed catevita closed 5 years ago
ptal @feuerste @nnmm
ptal @feuerste @nnmm
a number of changes of the pr in #219 got lost in the merge to master hence i'd leave for comparison the changes against the branch and then switch after the review to master
The main question I have is whether we want an mpsc of points vs an mpsc of batches. This PR seems to be targetting an mpsc of batches, and I'd like to understand better why, since the decision to go with the batches options seems to be making this change a bit complex. Is it for performance reasons?
yes, I'd say so. every time you want to read a single point a piece of file is gonna be cached and you get the rest of the batch nearly for free. the batch iterator has been now implemented since a couple of prs. which complexity are you thinking of? This PR actually decouples the evaluation of mutable fuctions from the reading of the points, as mutable function do not generally go along with threads.
I think you're right that it makes sense to have an mpsc/channel of batches. It seems possible however to keep the BatchIterator and PointStream like before, but just make the callback in the PointStream a closure that sends the batch over the channel back to the main thread. The actual callback could then be executed with try_for_each
on rx.iter()
like in this demo (except with batches instead of points, so you wouldn't call push_point_and_callback()
there, just the user callback). That seems a lot simpler, but if I'm missing anything, let me know :)
in theory yes, it would work, but it allows misuse of the batch iterator. that's why I want to scrap passing the mut function completely.
splitted batch iterator functionality into
this is the pre step that allows finally for a parallelization of the query