Move away from SPARQL.js (and potentially N3)

dwhitney commented 5 years ago

Is your feature request related to a problem? Please describe.

SPARQL.js is a major performance bottleneck! I've found that in the majority of my queries, parsing the query takes the most time. For example this query takes just over 100ms to parse, and about 6ms to process the data. I realize it's not a trivial query, but 100ms!

What I like about this library is that I can cache small data sets locally in the browser and query large data sets remotely via SERVICE clauses (very cool!), but if it takes 100ms to parse the query, I may as well just do everything remotely, because caching isn't saving me any time.

Describe the solution you'd like

I did some research on JS parsers and found a library under pretty active development called Chevrotain that pretty much blows jison out of the water from a performance perspective. (jison is what SPARQL.js uses). Then I found another library that already has a SPARQL and Turtle parser called millan. Just some quick bench marking on the query I linked to above showed a 100% performance improvement on the first parse and then 500% performance improvement upon subsequent parses.

Milan has a bunch of "junk" in their library that I don't think is necessary for sparql-engine, so I think I'd maybe take that library as inspiration rather than simply using it directly. Also Milan is running an older version of Chevrotain, and I can see a bunch of places that performance improvements could be made to their current parser.

What do you think about this idea:

abstract out the parser and make the PlanBuilder take a parser param of type string => Algebra.RootNode.
create another project called sparql-engine-sparqljs-parser that provides an implementation with the current functionality
create another project called sparql-engine-chevrotain-parser that provides an implementation with a Chevrotain parser.

This would let people choose which they liked better -- maybe they are already using one or the other.

Also down the road N3 could probably be replaced with a Chevrotain parser as well. I have had issues with N3 in the past, and I find the source code very difficult to read, but it's fine for now.

Let me know your thoughts and I am happy to do this when I get to it.

mielvds commented 5 years ago

If jison is the problem, why don't you suggest SPARQL.js to replace it by Chevrotain?

dwhitney commented 5 years ago

I'd prefer an abstraction that allows a user to choose rather than prescribing to users how it ought to be done. Maybe a user already has a parser in their project that they can reuse rather than add a new dependency with equivalent functionality.

On Tue, May 21, 2019, 3:32 AM mielvds notifications@github.com wrote:

If jison is the problem, why don't you suggest SPARQL.js to replace it by Chevrotain?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Callidon/sparql-engine/issues/22?email_source=notifications&email_token=AAAIFINXCJ2RNISOH4L3K6LPWOQR5A5CNFSM4HJTTVI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODV3AQRQ#issuecomment-494274630, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAIFIO4GZCDXJHYCC4KBSTPWOQR5ANCNFSM4HJTTVIQ .

RubenVerborgh commented 5 years ago

May I suggest to also report issues in the original repositories? This can help us fix the problem for more people, and I also can give you some background on the decisions made back then and what could help to improve things in other libraries.

SPARQL.js is a major performance bottleneck! I've found that in the majority of my queries, parsing the query takes the most time. For example this query takes just over 100ms to parse, and about 6ms to process the data. I realize it's not a trivial query, but 100ms!

Yes, the underlying jison library was the best at the time, but it was never really good. An even worse case is with syntactically incorrect queries, which can get you very extreme times.

Back when I developed SPARQL.js, there was no single fully compliant SPARQL library for JavaScript. People were using the rdfstore.js parser, which was incomplete and buggy.

So my main goals with SPARQL.js were to: 1) set up a testing framework to assess compliance with the SPARQL 1.1 spec 2) have a fast path to a compatible parser

Because of 2), I resorted to a parser generator framework (jison) rather than hand-rolling my own as I did with N3. It was the workflow that could deliver the parser the fastest way, not necessarily the most performant parser.

Now jison had many issues, some of which I fixed upstream, but the author wasn't very responsive with accepting pull requests, so I new that this was not a sustainable route. But at least we had a spec-compatible parser now, with acceptable performance for the kinds of queries we needed.

I haven't checked performance of your query; maybe it is one specific fixable thing causing trouble, maybe it's more.

I did some research on JS parsers and found a library under pretty active development called Chevrotain that pretty much blows jison out of the water from a performance perspective.

One would hope indeed!

From my perspective, I would invite you to reuse the testing framework of SPARQL.js. That was arguably what took most time, and then you'll immediately know how compatible your parser is. If you want, I'm also happy to have you as a contributor on SPARQL.js itself; there's no need for keeping an old library around if we have an equally compatible but faster one. Please let me know.

Milan has a bunch of "junk" in their library that I don't think is necessary for sparql-engine,

Proper imports and a sufficiently smart packer should get you rid of that.

Also down the road N3 could probably be replaced with a Chevrotain parser as well.

Will it be as fast, and will it be streaming? N3 is an extremely optimized hand-rolled parser, with a performance that will be hard to beat by parser generators.

I have had issues with N3 in the past

Then please report them 🙂 I'm known for my fast response times.

All of the above aside, it's great that there are different options for dealing with RDF and SPARQL in JavaScript. However, we might want to recall that we're a very small community. Shall we try to work together whenever possible and meaningful? I don't think we need many different SPARQL parsers, that will only break up the community and lower general adoption. Can we think collaboration first, and if that does not work out, working in parallel to a common interoperable spec and format (RDF/JS and rdf-ext)? This will help us all.

Callidon / sparql-engine

Move away from SPARQL.js (and potentially N3) #22