Closed jeswr closed 2 years ago
@RubenVerborgh When I run it locally node 10 coveralls check is complaining about a comment in line 38 - so not sure how to resolve that one...
Rather than close
, how about we call the passed map
function with (item, iterator)
as arguments?
Then limit
can close the iterator itself, and we don't need an exception for it.
This is ready for review @RubenVerborgh - given that this is now much cleaner than #58 I suggest that we merge the branch more-or-less in this state and worry about possible performance tweaks in the order of 10-20% at a later date.
Excellent. Thanks, @jeswr; I can work with this!
@jeswr Thanks for your patience. Gave this a deep review.
This is almost good to go, except the readable
getter, which uses a trick that will get us into trouble.
The readable
mechanism is quite intricate, and is also used to fire the readable
event at the correct times.
So we really need something that sets this.readable = true
when the source becomes readable, similar to how TransformIterator
has source.on('readable', destinationFillBuffer)
.
As a fun challenge (and looping in @joachimvh), I gave up on the typing of private readonly _mappings: MapFunction<any, any, MappingIterator<S, D>>[];
which should really be private readonly _mappings: MapChain<S, D>
where MapChain<S, D>
is recursively defined as [MapFunction<S, D>] | [MapFunction<S, X>, ...MapChain<X, D>]
with some infer X
magic. However, I couldn't do it. Not a necessity though 🙂
Sorry for the haphazard comments, @jeswr! They may look random, but it's actually three separate waves of working on this, and some crucial thinking work in between.
However, this is the conclusion I've come to:
.map
optimization.
readable
right.iterator
argument in the map
callback. When you compress multiple mappings into one, this argument becomes meaningless/powerless: calling .end
on these intermediate iterators becomes hard to define.iterator
argument in the map
function, as this excludes certain optimizations in the future.
null
now means "don't emit this item". I don't think that's a great solution though in general, to have such special return values. Or we could throw a special kind of error. But still not my favorite.take
iterator; I propose we just inline the solution of #58 (but not have a separate HeadIterator
, that seems overkill).I know some of the above go against my earlier suggestions, but reviewing the resulting code in depth brought new insights.
I haven't fully context switched back to this issue so apologies if any of these q's/notes are obvious or obviously incorrect.
For reference here is the original state of the PR when it was opened which I will also refer to a few times below.
Firstly I just wanted to make a clarification that I suspect answers this comment. The source
is what you refer to as root
in this description. The mappingRoot
(previously called upstream
) would be the iterator constituting root.map(f1).map(f2)
if we are considering the iterator for root.map(f1).map(f2).map(f3)
as our 'current iterator'.
The original PR also did not contain constructor overloads so I'll need a little more time to understand what has been done there before I can comment on whether there is indeed a bug now.
This is almost good to go, except the readable getter, which uses a trick that will get us into trouble.
I'm still not convinced that this was a problem when I started this PR. The logic flow I was following is that when .readable
is set to true via a getter, and in turn set true on the source
(i.e. root
) as is done in my original version of the PR, the consequence of this is that the root
becomes readable and emits a readable
event. Then as a result of this line the iterator for root.map(f1)
would emit readable, and in turn root.map(f1).map(f2)
would emit readable and finally root.map(f1).map(f2).map(f3)
would emit readable.
This ensures that readable
events are fired when required, whilst also ensuring that an iterator is readable
if and only if (at least while neither iterator is closed) the source
is readable which is to be expected when doing synchronous transformations.
Since the setter
has since been removed from the original PR there will indeed be a bug around readable events.
As for the issue around end
ing, this was the reason for checking whether the state is CLOSED
or not for readability. This means that if the root
has end
ed then root.map(f1)
, root.map(f1).map(f2)
and root.map(f1).map(f2).map(f3)
can never be readable again - and also this line ensures that each of the downstream iterators have the end event called on them. Furthermore, similarly to the readable
case, if the iterator corresponding to root.map(f1).map(f2)
were end
ed then only it and root.map(f1).map(f2).map(f3)
would be ended.
We should not expose the iterator argument in the map function, as this excludes certain optimizations in the future
What kind of optimisations does this exclude? Importantly, can they be achieved if we just pass a close
callback rather than the full iterator (this was the set-up when the PR was originally opened). A close callback would also avoid the need for a custom take
iterator.
I haven't fully context switched back to this issue so apologies if any of these q's/notes are obvious or obviously incorrect.
Yeah, sorry that I had to leave this one for a while. I think it's the last thing I worked on before I caught covid, so I've never been able to complete my original review. Back on things now though!
The original PR also did not contain constructor overloads
The only thing this does is allowing people to pass in a single mapping function, so it not needing to be a one-element array. The array constructor is considered private, given that it is an internal optimization; so I essentially split the two cases (external versus internal construction).
I'm still not convinced that this was a problem when I started this PR. The logic flow I was following is that when
.readable
is set to true via a getter, and in turn set true on thesource
(i.e.root
) as is done in my original version of the PR, the consequence of this is that theroot
becomes readable and emits areadable
event.
Right.
I think what happened is that I started adjusting the unit tests a bit, and noticed weird failures. So the unit tests now are correct, but I believe they don't work with the original PR because of readable
events (some iterators were hanging).
My memory is that I fixed the readable
to the point the tests were running, but I knew there would be other problems, hence the comment I wrote. For instance, given my (this._state < CLOSED) &&
edit, I presume that readable
events were being fired at times where this was not possible. So even though my fix worked, I could think of other scenarios that would go wrong.
The main challenge with AsyncIterator is correctness; it's sometimes quite hard to reason about edge cases. All tests might work, and suddenly 1 out of a 100 times a 1 million stream will stall and it will be impossible to figure out why. So that's why I really suggest sticking to the existing readable
mechanism, that has all the state checks etc., and has all the corresponding order-based unit tests attached to it. Or, if overwriting the mechanism, then a whole bunch of new readable
tests will be necessary (and they will surface a couple of bugs).
Since the setter has since been removed from the original PR there will indeed be a bug around readable events.
Yes, but having the setter might give rise to a whole other range of problems. The readable
setter was actually supposed to be private, but TypeScript doesn't allow such an asymmetry. Messing with the state of other iterators can be tricky.
I think if you put the setter back, you will see test failures.
What kind of optimisations does this exclude? Importantly, can they be achieved if we just pass a
close
callback rather than the full iterator (this was the set-up when the PR was originally opened).
Whether we pass an iterator or a close
method tied to an iterator, it means that there is always the concept of an iterator at that stage. I.e., the map function then has knowledge about the execution. If we take a shortcut, where actually one iterator performs the work of 3 others in a single go, then things get confusing. So I don't think we should pass anything.
A close callback would also avoid the need for a custom
take
iterator.
Indeed, but at a cost. Perhaps it does make sense to have an UntilIterator
or so, which takes a function as an argument that, when true
, stops iteration. Then we have generalized stopping.
The main question is how we want to approach this PR now, I realize it's very inconvenient that, due to circumstances, there were 2 months between your and my initial work on this, and today. It's not easy to get the context back from then.
Do you still have the bandwidth to work on this? Just because you were volunteering back then, does not mean you still have the time today, and I want to be respectful of your contributions.
A way forward that I see at the moment is, given the current unit tests (which are a great harness to have), to re-envision the implementation a bit. I.e., if this were myself working on it, I would start with the existing code from the main
branch, and make it work with the unit tests from this branch by incrementally copying in the code from this branch. I'd get a stable version without the sequential map optimization, merge it, and then add the optimization as needed.
Let me know what you think and what's feasible for you. In any case, thanks a lot for working on this.
Do you still have the bandwidth to work on this?
Yep - I can have a go today branching off main
, use the current test harness, and then follow the incremental approach suggested. The responses above appear to provide some very useful explanations that I can use to help inform that design. If after that there are still some conceptual problems then it might be faster for you to take over rather than having to review several more iterations of my code - but will see how I go today before making that call.
thanks a lot for working on this.
No worries! I always enjoy working on OS code and I've learned a lot from the work and reviews. Thanks for all your work on the reviews!
The responses above appear to provide some very useful explanations that I can use to help inform that design. If after that there are still some conceptual problems then it might be faster for you to take over rather than having to review several more iterations of my code - but will see how I go today before making that call.
You're absolutely nailing it in #75, so let's finish things there 👍👍
An alternative to #58 which uses callbacks to close the iterator rather than creating a new class.