LinkedDataFragments / Client.Java

A Triple Pattern Fragments client for Java (Jena)
MIT License

How fast is querying? #4

Closed rom1504 closed 9 years ago

rom1504 commented 9 years ago

I ran some tests to find out how fast querying with this library is:

I ran the query ask { <http://dbpedia.org/resource/Bill_Clinton> <http://dbpedia.org/ontology/child> ?o } 10 times, and it took 3513 ms here. That means about 350 ms per query.

Using a regular SPARQL endpoint (http://dbpedia.org/sparql), I can run the same query 100 times in 3504 ms, which means about 35 ms per query.

So the regular endpoint is 10 times faster.

I know there are some smart caching mechanisms in Virtuoso, so running the same query again and again may not be an ideal benchmark, but LinkedDataFragments still doesn't seem very fast.
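For reference, here is a minimal sketch of how such a measurement could be structured. It is not the code used for the numbers above: the class `QueryTimer`, its methods, and the sleeping placeholder workload are all hypothetical, and the actual ASK query execution would have to be plugged in as the `Runnable`. A few untimed warm-up runs are included so that one-time start-up costs do not count towards the mean.

```java
import java.util.concurrent.TimeUnit;

public class QueryTimer {

    /** Mean time per run in milliseconds, given a total time and a run count. */
    public static double meanMillis(long totalMillis, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        return (double) totalMillis / runs;
    }

    /** Runs the task `warmup` times untimed, then `runs` times timed; returns total elapsed ms. */
    public static long timeRuns(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): replace with the real ASK query execution.
        Runnable placeholderQuery = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        long total = timeRuns(placeholderQuery, 2, 10);
        System.out.printf("total=%d ms, mean=%.1f ms%n", total, meanMillis(total, 10));

        // The totals reported in this thread, converted to per-query means.
        System.out.printf("TPF client: %.1f ms per query%n", meanMillis(3513, 10));
        System.out.printf("SPARQL endpoint: %.2f ms per query%n", meanMillis(3504, 100));
    }
}
```

Varying the query between runs, rather than repeating the same one, would reduce the effect of server-side caching on the comparison.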

Is this expected? Does it come from this library, or is it due to the Linked Data Fragments server?

I was hoping LinkedDataFragments would be faster than a regular SPARQL endpoint.

rom1504 commented 9 years ago

http://linkeddatafragments.org/publications/ldow2014.pdf says it should take a few ms per query. I guess that's under ideal conditions, but 350 ms still seems too much.

Could it come from this library? (Maybe Jena?)

Edit: Oh, I misread the numbers; in the article, the queries are answered in seconds. I guess that's true for complicated queries.

RubenVerborgh commented 9 years ago

Hi @rom1504,

Querying with Triple Pattern Fragments is definitely not faster than querying a SPARQL endpoint. This is logical: the interface is much simpler, so in order to perform the same task (i.e., execute the same query), a client has to perform much more work. With SPARQL endpoints, the server performs all the work.

And this is precisely the problem of SPARQL endpoints: the interface is so powerful that endpoints go down often. The majority of SPARQL endpoints are down for more than 1.5 days per month.

The goal of Triple Pattern Fragments is to minimize server cost and thus maximize server availability, at the cost of slower queries and increased bandwidth, as is analyzed in detail in our ISWC2014 paper, the demo of which won the ISWC2014 Best Demo Award for running DBpedia on a Raspberry Pi. The keywords are “low cost” and “high availability”.

The Jena implementation reuses the Jena query infrastructure, which was not designed for parallelized requests. So while a lot of SPARQL features are supported, it is not the optimal approach. A better algorithm is implemented from scratch in our JavaScript client.

So yes, if you're lucky, you can solve a SPARQL query in 300ms on a SPARQL endpoint, roughly 95% of the time. With Triple Pattern Fragments, the same query might take you 3 seconds (for all results to arrive; the first results will be faster), but that's 3 seconds now, tonight, tomorrow, or 99.999% of the time. One solution allows you to build fast applications; the other gives you reliable applications. I prefer reliability; that's why I designed Triple Pattern Fragments.
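As a rough check of how these percentages relate to downtime, the arithmetic can be sketched as follows. The 30-day month and the class `Availability` are assumptions made for illustration only. Under that assumption, 1.5 days of downtime per month is consistent with the "roughly 95%" figure, while 99.999% leaves well under a minute of downtime per month.

```java
public class Availability {

    // Assumption for illustration: a month is 30 days long.
    static final double DAYS_PER_MONTH = 30.0;

    /** Availability in percent for a given number of days down per month. */
    public static double availabilityPercent(double downDaysPerMonth) {
        return 100.0 * (1.0 - downDaysPerMonth / DAYS_PER_MONTH);
    }

    /** Seconds of downtime per month implied by an availability percentage. */
    public static double downSecondsPerMonth(double availabilityPercent) {
        double secondsPerMonth = DAYS_PER_MONTH * 24 * 60 * 60;
        return secondsPerMonth * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        // 1.5 days down out of 30 is 5% downtime, so about 95% availability.
        System.out.printf("%.1f%% available%n", availabilityPercent(1.5));
        // 99.999% of a 30-day month leaves about 26 seconds of downtime.
        System.out.printf("%.1f s down per month%n", downSecondsPerMonth(99.999));
    }
}
```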

Best,

Ruben

rom1504 commented 9 years ago

Hi, thanks for the explanation. I'll read your paper to understand all this better.

I'm building a question answering engine on DBpedia which at some point runs a lot of simple (one-triple) ASK queries. So I thought Triple Pattern Fragments might be especially well suited to this. I currently have my own Virtuoso instance loaded with DBpedia, and that works, but it uses a lot of memory on a server; using Triple Pattern Fragments might help decrease the memory usage.
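To illustrate why one-triple ASK queries look like a good fit: such a query corresponds to a single triple pattern, so in principle a client could answer it by requesting one fragment and checking whether it contains any matching triple. The sketch below only builds the request URL. The base URL is a placeholder, and the query parameter names `subject` and `predicate` are an assumption; a real client should take them from the hypermedia controls the server returns rather than hard-coding them.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FragmentUrl {

    /**
     * Builds a fragment URL for the pattern { subject predicate ?o }.
     * The parameter names are assumed, not taken from any specific server.
     */
    public static String forPattern(String base, String subject, String predicate) {
        return base + "?subject=" + encode(subject) + "&predicate=" + encode(predicate);
    }

    private static String encode(String value) {
        // Requires Java 10 or later for the Charset overload.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder base URL (assumption); substitute the fragments server in use.
        String url = forPattern(
                "http://example.org/dbpedia",
                "http://dbpedia.org/resource/Bill_Clinton",
                "http://dbpedia.org/ontology/child");
        System.out.println(url);
    }
}
```

The ASK result would then be true if the returned fragment contains at least one data triple matching the pattern.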

Best, Romain

akuckartz commented 9 years ago

My 10 cents: another type of LDF fragment that supports "typical" queries and reduces query time would make a lot of sense. Defining "typical" is not trivial, but I suppose a rather small number of SPARQL query examples would be enough to derive a set of the most commonly used query patterns.

RubenVerborgh commented 9 years ago

@akuckartz True—and Triple Pattern Fragments were indeed chosen to match that “typical” definition. From that point onwards, it gets more complicated though. What is the next reasonable step? We have some ideas in mind, and are experimenting with some things, but it is definitely not trivial.

akuckartz commented 9 years ago

@RubenVerborgh Do you know if there are any research results available regarding "typical" SPARQL-queries?

Another approach might be to analyse "typical" web APIs. Most of them support a set of hardwired queries, which are likely the most interesting ones. Due to its popularity, the interface of Elasticsearch might be particularly relevant; see http://www.elastic.co/guide/en/elasticsearch/reference/current/search.html and http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

RubenVerborgh commented 9 years ago

That research would be the domain of the USEWOD workshop series, with some potentially interesting papers for instance in 2014.

Regarding Elasticsearch, an extension for full-text search is on my list. This would make it possible to solve some interesting FILTERs, which currently require downloading everything.

akuckartz commented 9 years ago

@RubenVerborgh :+1:

mielvds commented 9 years ago

In this respect, an interesting question is how much the restrictions of SPARQL endpoints shape queries. In what way do the performance, limits, and availability of SPARQL endpoints determine the queries asked by applications today?