comunica / comunica-feature-link-traversal

📬 Comunica packages for link traversal-based query execution

Process ends unexpectedly on certain path queries #95

Closed (jeswr closed this 1 year ago)

jeswr commented 1 year ago

Running the following on the latest version of the link-traversal package causes the process to end without logging anything. In contrast, an empty array is produced when using the latest version of @comunica/query-sparql.

import { QueryEngine } from '@comunica/query-sparql-link-traversal';

const query = `
PREFIX : <http://example.org/>

CONSTRUCT {
  :s :p ?rules .
} WHERE {
  <http://example.org/> (:p1|^:p2)/:p3 ?rules .
}
`

async function main() {
  const engine = new QueryEngine();
  const quads = await engine.queryQuads(query, { lenient: true, sources: [ 'http://example.org/' ] });
  console.log(await quads.toArray())
}

main();
rubensworks commented 1 year ago

Could you follow the issue templates for future issues? Otherwise our auto-labeling and project pipeline doesn't catch it properly.

rubensworks commented 1 year ago

Is there a public source over which this could be reproduced?

jeswr commented 1 year ago

Invalid/empty resources such as http://example.org/ actually do reproduce the problem, and the process will terminate if you simplify the path (for instance by removing the last part of the sequence).

I'll also look into putting up a minimal document that reproduces this where there is at least one result.
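
For reference, here is a sketch (not from the issue itself) of the simplified-path variant described above, assuming the same setup as the original snippet but with the trailing /:p3 step removed from the sequence path; this version terminates normally.

import { QueryEngine } from '@comunica/query-sparql-link-traversal';

// Same setup as the original report, but the path is simplified to (:p1|^:p2).
const simplifiedQuery = `
PREFIX : <http://example.org/>

CONSTRUCT {
  :s :p ?rules .
} WHERE {
  <http://example.org/> (:p1|^:p2) ?rules .
}
`;

async function main() {
  const engine = new QueryEngine();
  const quads = await engine.queryQuads(simplifiedQuery, { lenient: true, sources: [ 'http://example.org/' ] });
  // With the simpler path the stream ends, and (presumably) an empty array is logged.
  console.log(await quads.toArray());
}

main();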

jeswr commented 1 year ago

Here is an example against your WebID

import { QueryEngine } from '@comunica/query-sparql-link-traversal';

const query = `
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://example.org/>

CONSTRUCT {
  <https://www.rubensworks.net/#me> :knowsSomeoneWithName ?name .
} WHERE {
  <https://www.rubensworks.net/#me> (foaf:knows|^foaf:knows)/foaf:name ?name .
}
`

async function main() {
  const engine = new QueryEngine();
  const quads = await engine.queryQuads(query, { lenient: true });
  console.log(await quads.toArray())
}

main();

It will run for about a minute (presumably traversing) before eventually stopping without outputting the quads array. If you listen to the data and end events, you will see that a bunch of results are produced; it is just the end event that is never emitted.
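
As a sketch of that event-based check (not part of the original report), one can listen to the quad stream directly instead of awaiting toArray(); query here is the same CONSTRUCT query as in the snippet above.

import { QueryEngine } from '@comunica/query-sparql-link-traversal';

async function debugEvents(query: string) {
  const engine = new QueryEngine();
  const quads = await engine.queryQuads(query, { lenient: true });
  quads.on('data', quad => console.log('data:', quad.object.value));
  quads.on('error', error => console.log('error:', error));
  // In the failing case a number of 'data' events arrive,
  // but this 'end' event is never emitted.
  quads.on('end', () => console.log('end'));
}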

rubensworks commented 1 year ago

Thanks!

rubensworks commented 1 year ago

Can't reproduce the problem on my end, it seems. It always goes out of memory, even after increasing max memory. So it looks like it just keeps on running on my end.

rubensworks commented 1 year ago

Ok, can reproduce the problem with just http://example.org/ as source.

rubensworks commented 1 year ago

It's definitely related to the streaming source, as setting aggregateStore to false fixes the problem.

rubensworks commented 1 year ago

Just looked into this (a bit), and it looks like the problem is that the 'end' event is not emitted upwards to the top-level bindingsStream when lower-level streams are destroyed. Modifying this line https://github.com/RubenVerborgh/AsyncIterator/blob/main/asynciterator.ts#L112 to newState >= exports.ENDED fixes the problem. But this may not be the solution we want.

jeswr commented 1 year ago

The lack of an end event seems to be expected behavior (see https://github.com/RubenVerborgh/AsyncIterator/blob/3171521759194b3ce463f989d8253e35978ef690/asynciterator.ts#L74).

I also ran a test against Node streams; they likewise do not emit an end event (though they do emit a close event).

import { Readable } from 'stream';
const iterator = new Readable();

iterator.on('data', () => { console.log('data') });
iterator.on('end', () => { console.log('end') });
iterator.on('error', () => { console.log('error') });
iterator.on('close', () => { console.log('close') });

iterator.push('a');
iterator.push(null);

gives

data
end
close

whereas

import { Readable } from 'stream';
const iterator = new Readable();

iterator.on('data', () => { console.log('data') });
iterator.on('end', () => { console.log('end') });
iterator.on('error', () => { console.log('error') });
iterator.on('close', () => { console.log('close') });

iterator.push('a');
iterator.destroy();

gives

data
close
jeswr commented 1 year ago

I think the greater issue here is that there is an error event that does not get forwarded somewhere (probably one created by a destruction in https://github.com/comunica/comunica/issues/1164); hence my suggestion here that an error be thrown in the next major version of asynciterator if an error event is emitted without any error listeners.
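
For context (an illustration of the requested behaviour, not code from the issue): a plain Node EventEmitter already throws when 'error' is emitted without any error listener, which is the kind of fail-fast behaviour being suggested for asynciterator here.

import { EventEmitter } from 'events';

const emitter = new EventEmitter();

// No 'error' listener is attached, so emit() throws the error
// synchronously instead of silently swallowing it.
try {
  emitter.emit('error', new Error('boom'));
} catch (error) {
  console.log('emit threw:', (error as Error).message);
}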

jeswr commented 1 year ago

For this particular bug, I've narrowed the problem down to the fact that the stream returned by StreamingStore#match never emits an end or error event.

If you add the following to the end of the match method in the StreamingStore, then the results will end, and you will observe that no data, end, or error events are emitted in the http://example.org/ case:

stream.on('data', () => { console.log('data') });
stream.on('end', () => { console.log('end called in streaming store') });
stream.on('error', () => { console.log('error called in streaming store') });

// Replace the returned stream with one that ends immediately.
const stream2 = new readable_stream_1.Readable({ objectMode: true });
stream2.push(null);
return stream2;

In turn this seems to be because #end is never called on the streaming store.
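
As an illustration of that symptom (using the standalone rdf-streaming-store package as a stand-in; the StreamingStore used inside Comunica may differ), match() streams stay open until end() is called on the store:

import { StreamingStore } from 'rdf-streaming-store';

const store = new StreamingStore();

// match() returns a stream that stays open, since more quads may still
// be imported into the store later.
const matches = store.match();
matches.on('data', quad => console.log('match:', quad));
matches.on('error', error => console.log('error:', error));
// This only fires once the store itself has been ended.
matches.on('end', () => console.log('match stream ended'));

// Without this call the match stream never emits 'end',
// which matches the symptom described above.
store.end();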

rubensworks commented 1 year ago

I've narrowed the problem down to the fact that the stream returned by StreamingStore#match never emits an end or error event.

I don't think this is the cause of the bug. It's simply due to the circular dependency with the MediatedLinked...Iterator, which will be closed together with the streams created by StreamingStore#match.

rubensworks commented 1 year ago

One possible solution would be to make AsyncIterator also emit a close event, like Node streams do.

rubensworks commented 1 year ago

Problem is fixed (not released yet). A close event was not needed after all. The semantics of destroy are compatible with what we need in Comunica.