Closed RubenVerborgh closed 4 years ago
This appears to be a very painfull one to debug...
Some findings in the meantime:
autoStart: false
, but this may potentially not work well together with attaching/detaching data listeners).Is there an alternative streamToAsyncIterable
implementation available?
Is there an alternative streamToAsyncIterable implementation available?
Buffering would be one obvious alternative (which will fix the problem). But I don't think we want that, as we would lose lazy streaming.
To clarify, I don't think streamToAsyncIterable
is incorrect, but it only causes the problem.
This is confirmed by checking how many times the yieldValue
callback is invoked for the "friend names" test. Sometimes it's 296 (all friends), but most of the time it's a random value between 0-296. This seems to indicate that the data event is being emitted without a listener being attached.
Does Comunica listen for listener unsubscribes and react immediately?
Does Comunica listen for listener unsubscribes and react immediately?
No, it does not. But AFAICS comunica should not listen to unsubscribes, as asynciterator should handle those, right?
asynciterator should handle those
And it seems to be covered indeed: https://github.com/RubenVerborgh/AsyncIterator/blob/v3.0.1/asynciterator.ts#L517-L537
There's actually a possible race condition:
while (this.listenerCount('data') !== 0 && (item = this.read()) !== null)
this.emit('data', item);
What is the listener is removed when read
is called?
Not sure what to do with the read item then though…
FYI just tried changing it into
// Reads the next item
function readNext(resolve, reject) {
if (pendingError)
return reject(pendingError);
if (readable.done)
return resolve({ value: undefined, done: true });
// Try to see if an item is already available
const item = readable.read();
if (item !== null)
return resolve({ value: item, done: false });
// Attach stream listeners
readable.once('data', yieldValue);
readable.once('end', finish);
readable.once('error', finish);
but does not help.
…and a Heisenbug, too. Inserting logging makes it more likely for the error not to occur.
I'll try to see if I can reproduce it without Comunica and LDflex.
…and a Heisenbug, too. Inserting logging makes it more likely for the error not to occur.
And even worse, when trying the scenario outside of Jest (to get rid of the whole ceremony Jest inserts around console.log
), it consistently succeeds.
And even worse, when trying the scenario outside of Jest (to get rid of the whole ceremony Jest inserts around console.log), it consistently succeeds.
That's not good :-(
At least we know how to reproduce it in the browser, so that's something: https://github.com/solid/query-ldflex/issues/45#issuecomment-690124637
In any case, the whole listener reattachment code has too many moving pieces; we’re literally switching from flow to non-flow mode hundreds of times per second. That’s bound to fail. So I will change that regardless, and maybe that fixes it.
Sounds good.
One solution might be to create an async variant on .read()
, but will always return the next data element (unless end is reached), as flow-mode isn't really needed in this case.
Implemented streamToAsyncIterable
without data
listeners (https://github.com/LDflex/LDflex-Comunica/commit/cb0e118e9e4c4908fcf796212e33ca1bbbc8e463), and the problem persists. This might be helpful for debugging.
So you can npm link @ldflex/comunica
to the latest master if you want.
Interesting finding: with this change, the test "main profile fields" also starts failing sometimes on my end. This seems to indicate that the problem also occurs for queries with just a single triple pattern, while I used to think it occured only on queries with 2+ triple patterns.
Another finding: Contrary to what I assumed before, no intermediary results are lost. Instead, at some point in the stream data and end events seem to halt.
For example, if we have a stream that should go from 1->10, we could have something like this: 1,2,3,4,5,6,7, FREEZE
Yeah, so it's definitely a race condition somewhere. Hence, we should also be careful with some of the assumptions above, in particular:
Can not reproduce when calling Comunica directly (with a regular on-data listener).
Is there any possibility of doing a git bisect?
Is there any possibility of doing a git bisect?
Should be possible, but unlikely to be useful I think. I'm quite sure the problem is caused by the asynciterator update, which is part of a large refactoring commit. Currently doing some fine-grained debugging to find out the exact location in the asynciterators where it fails.
Ok, good news and bad news.
Good news: I've identified the problem: https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-bgp-single/lib/ActorQueryOperationBgpSingle.ts#L33-L42 Removing that block makes the LDflex tests always pass again.
Bad news: Removing this block is (currently) not an option, since it breaks other tests.
The way forward: This problem is related to these issues:
Comunica's current way of handling metadata internally causes these problems, and needs to be refactored. Unfortunately this will require quite a bit of time, as I don't see a possible quick fix here.
Excellent job on finding the problem. That must've been hard!
In addition to identifying it, did we also learn something about how to spot such problems more easily in the future?
Excellent job on finding the problem. That must've been hard!
Yes, this definitely goes in my top 10 of most frustrating bugs.
In addition to identifying it, did we also learn something about how to spot such problems more easily in the future?
Things would have been simplified a lot if we'd have the ability to print the state of our asynciterators. This would allow us to identify where in the iterator pipeline things are blocked.
For example:
const it = new AsyncIterator(...);
it.debug = 'My-Iterator-1';
setTimeout(() => it.dumpState(), 1000);
Output:
My-Iterator-1
STATE: READABLE
BUFFER: [1, 2, 3]
DESTINATION: ...
Maybe some kind of global flag to enable this on all iterators would also be useful.
Yes, I do something similar when debugging them myself. I think it would be good to ship a debug build of asynciterator with such logging functionality built in.
All of the tests seem to pass on my end in the newly released Comunica version (1.17.0).
@rubensworks The upgrade of our Comunica dependencies introduces unpredictable timeouts in
test/system/profile-test.js
. Could you have a look?