LDflex / Query-Solid

Simple access to data in Solid pods through LDflex expressions
https://solid.github.io/query-ldflex/
MIT License
66 stars 15 forks source link

Upgrade asynciterator to v3.x #71

Closed RubenVerborgh closed 4 years ago

RubenVerborgh commented 4 years ago

@rubensworks The upgrade of our Comunica dependencies introduces unpredictable timeouts in test/system/profile-test.js. Could you have a look?

rubensworks commented 4 years ago

This appears to be a very painfull one to debug...

Some findings in the meantime:

RubenVerborgh commented 4 years ago

Is there an alternative streamToAsyncIterable implementation available?

rubensworks commented 4 years ago

Is there an alternative streamToAsyncIterable implementation available?

Buffering would be one obvious alternative (which will fix the problem). But I don't think we want that, as we would lose lazy streaming.

To clarify, I don't think streamToAsyncIterable is incorrect, but it only causes the problem.

This is confirmed by checking how many times the yieldValue callback is invoked for the "friend names" test. Sometimes it's 296 (all friends), but most of the time it's a random value between 0-296. This seems to indicate that the data event is being emitted without a listener being attached.

RubenVerborgh commented 4 years ago

Does Comunica listen for listener unsubscribes and react immediately?

rubensworks commented 4 years ago

Does Comunica listen for listener unsubscribes and react immediately?

No, it does not. But AFAICS comunica should not listen to unsubscribes, as asynciterator should handle those, right?

RubenVerborgh commented 4 years ago

asynciterator should handle those

And it seems to be covered indeed: https://github.com/RubenVerborgh/AsyncIterator/blob/v3.0.1/asynciterator.ts#L517-L537

RubenVerborgh commented 4 years ago

There's actually a possible race condition:

  while (this.listenerCount('data') !== 0 && (item = this.read()) !== null)
    this.emit('data', item);

What is the listener is removed when read is called? Not sure what to do with the read item then though…

RubenVerborgh commented 4 years ago

FYI just tried changing it into

    // Reads the next item
    function readNext(resolve, reject) {
      if (pendingError)
        return reject(pendingError);
      if (readable.done)
        return resolve({ value: undefined, done: true });

      // Try to see if an item is already available
      const item = readable.read();
      if (item !== null)
        return resolve({ value: item, done: false });

      // Attach stream listeners
      readable.once('data', yieldValue);
      readable.once('end', finish);
      readable.once('error', finish);

but does not help.

RubenVerborgh commented 4 years ago

…and a Heisenbug, too. Inserting logging makes it more likely for the error not to occur.

rubensworks commented 4 years ago

I'll try to see if I can reproduce it without Comunica and LDflex.

RubenVerborgh commented 4 years ago

…and a Heisenbug, too. Inserting logging makes it more likely for the error not to occur.

And even worse, when trying the scenario outside of Jest (to get rid of the whole ceremony Jest inserts around console.log), it consistently succeeds.

rubensworks commented 4 years ago

And even worse, when trying the scenario outside of Jest (to get rid of the whole ceremony Jest inserts around console.log), it consistently succeeds.

That's not good :-(

At least we know how to reproduce it in the browser, so that's something: https://github.com/solid/query-ldflex/issues/45#issuecomment-690124637

RubenVerborgh commented 4 years ago

In any case, the whole listener reattachment code has too many moving pieces; we’re literally switching from flow to non-flow mode hundreds of times per second. That’s bound to fail. So I will change that regardless, and maybe that fixes it.

rubensworks commented 4 years ago

Sounds good.

One solution might be to create an async variant on .read(), but will always return the next data element (unless end is reached), as flow-mode isn't really needed in this case.

RubenVerborgh commented 4 years ago

Implemented streamToAsyncIterable without data listeners (https://github.com/LDflex/LDflex-Comunica/commit/cb0e118e9e4c4908fcf796212e33ca1bbbc8e463), and the problem persists. This might be helpful for debugging.

So you can npm link @ldflex/comunica to the latest master if you want.

rubensworks commented 4 years ago

Interesting finding: with this change, the test "main profile fields" also starts failing sometimes on my end. This seems to indicate that the problem also occurs for queries with just a single triple pattern, while I used to think it occured only on queries with 2+ triple patterns.

rubensworks commented 4 years ago

Another finding: Contrary to what I assumed before, no intermediary results are lost. Instead, at some point in the stream data and end events seem to halt.

For example, if we have a stream that should go from 1->10, we could have something like this: 1,2,3,4,5,6,7, FREEZE

RubenVerborgh commented 4 years ago

Yeah, so it's definitely a race condition somewhere. Hence, we should also be careful with some of the assumptions above, in particular:

Can not reproduce when calling Comunica directly (with a regular on-data listener).

Is there any possibility of doing a git bisect?

rubensworks commented 4 years ago

Is there any possibility of doing a git bisect?

Should be possible, but unlikely to be useful I think. I'm quite sure the problem is caused by the asynciterator update, which is part of a large refactoring commit. Currently doing some fine-grained debugging to find out the exact location in the asynciterators where it fails.

rubensworks commented 4 years ago

Ok, good news and bad news.

Good news: I've identified the problem: https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-bgp-single/lib/ActorQueryOperationBgpSingle.ts#L33-L42 Removing that block makes the LDflex tests always pass again.

Bad news: Removing this block is (currently) not an option, since it breaks other tests.

The way forward: This problem is related to these issues:

Comunica's current way of handling metadata internally causes these problems, and needs to be refactored. Unfortunately this will require quite a bit of time, as I don't see a possible quick fix here.

RubenVerborgh commented 4 years ago

Excellent job on finding the problem. That must've been hard!

In addition to identifying it, did we also learn something about how to spot such problems more easily in the future?

rubensworks commented 4 years ago

Excellent job on finding the problem. That must've been hard!

Yes, this definitely goes in my top 10 of most frustrating bugs.

In addition to identifying it, did we also learn something about how to spot such problems more easily in the future?

Things would have been simplified a lot if we'd have the ability to print the state of our asynciterators. This would allow us to identify where in the iterator pipeline things are blocked.

For example:

const it = new AsyncIterator(...);
it.debug = 'My-Iterator-1';
setTimeout(() => it.dumpState(), 1000);

Output:

My-Iterator-1
  STATE: READABLE
  BUFFER: [1, 2, 3]
  DESTINATION: ...

Maybe some kind of global flag to enable this on all iterators would also be useful.

RubenVerborgh commented 4 years ago

Yes, I do something similar when debugging them myself. I think it would be good to ship a debug build of asynciterator with such logging functionality built in.

rubensworks commented 4 years ago

All of the tests seem to pass on my end in the newly released Comunica version (1.17.0).