LinkedDataFragments / Client.js

[DEPRECATED] A JavaScript client for Triple Pattern Fragments interfaces.
http://linkeddatafragments.org/

Chains of binary filters are not executed #27

Closed migalkin closed 7 years ago

migalkin commented 7 years ago

Hello, still trying out some queries against the data in the LDF server using the Client. I'm trying to evaluate the following query:

SELECT ?page ?president ?x WHERE { ?x <http://data.nytimes.com/elements/topicPage> ?page . 
?x <http://www.w3.org/2002/07/owl#sameAs> ?president . FILTER ((?president=
<http://dbpedia.org/resource/Alec_Brook-Krasny>) || (?president=
<http://dbpedia.org/resource/Barack_Obama>) || (?president=
<http://dbpedia.org/resource/Bill_Clements>) || (?president=
<http://dbpedia.org/resource/Davis_Filfred>) || (?president=
<http://dbpedia.org/resource/Dwight_D._Eisenhower>) || (?president=
<http://dbpedia.org/resource/James_Pinckney_Henderson>) || (?president=
<http://dbpedia.org/resource/Kenneth_Maryboy>) || (?president=
<http://dbpedia.org/resource/Peter_MacDonald_%28Navajo_leader%29>) || (?president=
<http://dbpedia.org/resource/Richard_Coke>) || (?president=
<http://dbpedia.org/resource/Ulysses_S._Grant_presidential_administration_scandals>))} LIMIT
 100000 OFFSET 0

The Client throws the error:

The query is not yet supported Invalid number of arguments for ||: 10 (expected: 2).

UnsupportedQueryError
    at new SparqlIterator (/ldf_rest/node_modules/ldf-client/lib/sparql/SparqlIterator.js:76:15)
    at /ldf_rest/routes/index.js:23:17
    at Layer.handle [as handle_request] (/ldf_rest/node_modules/express/lib/router/layer.js:95:5)
    at next (/ldf_rest/node_modules/express/lib/router/route.js:131:13)
    at Route.dispatch (/ldf_rest/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/ldf_rest/node_modules/express/lib/router/layer.js:95:5)
    at /ldf_rest/node_modules/express/lib/router/index.js:277:22
    at param (/ldf_rest/node_modules/express/lib/router/index.js:349:14)
    at param (/ldf_rest/node_modules/express/lib/router/index.js:365:14)
    at param (/ldf_rest/node_modules/express/lib/router/index.js:365:14)

So we can't submit more than 2 arguments in the FILTER clause?

RubenVerborgh commented 7 years ago

This seems to be a problem with the parser. The parser somehow sees this as one OR function with 10 parameters, whereas there are actually 9 OR functions with 2 parameters each.
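To make the distinction concrete, here is a minimal sketch of how a single n-ary `||` node could be rewritten into a chain of binary `||` nodes. The expression shape (`type`, `operator`, `args`) and the helper names `toBinaryChain` and `countOperators` are assumptions for illustration only; this is not the actual fix applied in the parser.

```javascript
// Hypothetical expression shape, modeled on what a SPARQL parser might emit:
// { type: 'operation', operator: '||', args: [ ... ] }
function toBinaryChain(expr) {
  // Leave terms (variables, IRIs, literals as plain strings) untouched
  if (!expr || expr.type !== 'operation') return expr;
  // Normalize nested arguments first
  const args = expr.args.map(toBinaryChain);
  // Only the associative logical operators are folded
  if ((expr.operator !== '||' && expr.operator !== '&&') || args.length <= 2)
    return { type: 'operation', operator: expr.operator, args };
  // Left-fold: ((a || b) || c) || d ...
  return args.reduce((left, right) =>
    ({ type: 'operation', operator: expr.operator, args: [left, right] }));
}

// Count how many operation nodes use the given operator
function countOperators(expr, operator) {
  if (!expr || expr.type !== 'operation') return 0;
  const own = expr.operator === operator ? 1 : 0;
  return own + expr.args.reduce((sum, arg) => sum + countOperators(arg, operator), 0);
}

// Ten equality tests joined by one flat OR (placeholder IRIs)
const equals = iri => ({ type: 'operation', operator: '=', args: ['?president', iri] });
const flat = {
  type: 'operation', operator: '||',
  args: Array.from({ length: 10 }, (_, i) => equals('http://example.org/resource/' + i)),
};

const chained = toBinaryChain(flat);
console.log(countOperators(chained, '||')); // 9
```

Ten operands folded this way yield nine binary `||` nodes, which matches the "expected: 2" arity the client checks for.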

RubenVerborgh commented 7 years ago

Fixed by https://github.com/RubenVerborgh/SPARQL.js/commit/2789d3f27eae364ce73ec03b7a7659dd30c2f126, which will be installed by 84027fcd21fd8ae90fb7c0aef9b21f8c21b0fd43.

migalkin commented 7 years ago

I'm using the npm version of the client, so how can I use the develop branch of the client in my app if it's not yet on npm?

RubenVerborgh commented 7 years ago

What you can do in package.json is:

"ldf-client": "LinkedDataFragments/Client.js#develop"

to use the develop branch of this repo.

migalkin commented 7 years ago

The problem occurred again, on the fedbench CD, LD and LS queries. We encounter two types of errors:

exception on query: SELECT ?film ?genre ?x WHERE {
        ?x <http://data.linkedmdb.org/resource/movie/genre> ?genre . 
        ?x <http://www.w3.org/2002/07/owl#sameAs> ?film . FILTER ((?film=<http://dbpedia.org/resource/Remember_Me%2C_My_Love>) || (?film=<http://dbpedia.org/resource/But_Forever_in_My_Mind>) || (?film=<http://dbpedia.org/resource/Ecco_fatto>) || (?film=<http://dbpedia.org/resource/L%27ultimo_bacio>) || (?film=<http://dbpedia.org/resource/Seven_Pounds>) || (?film=<http://dbpedia.org/resource/The_Pursuit_of_Happyness>))

} LIMIT 10000 OFFSET 0 
Code: ''

The second type involves queries with colons (:) in URIs:

exception on query: SELECT ?mass ?cas ?keggDrug WHERE {
?keggDrug <http://bio2rdf.org/ns/bio2rdf#xRef> ?cas . 
?keggDrug <http://bio2rdf.org/ns/bio2rdf#mass> ?mass
FILTER ((?mass > '5')) . FILTER ((?cas=<http://bio2rdf.org/cas:198153-51-4>) || (?cas=<http://bio2rdf.org/cas:105857-23-6>) || (?cas=<http://bio2rdf.org/cas:74899-72-2>) || (?cas=<http://bio2rdf.org/cas:99210-65-8>) || (?cas=<http://bio2rdf.org/cas:77907-69-8>) || (?cas=<http://bio2rdf.org/cas:145155-23-3>) || (?cas=<http://bio2rdf.org/cas:140608-64-6>) || (?cas=<http://bio2rdf.org/cas:59865-13-3>) || (?cas=<http://bio2rdf.org/cas:214745-43-4>) || (?cas=<http://bio2rdf.org/cas:83150-76-9>))
} LIMIT 10000 OFFSET 0 
Code: ''

In both cases the error code is empty, so we can't pinpoint where the error occurs. Our only guess for the second error type is the colons in the URIs, but serdi did not treat them as an error during HDT parsing.

RubenVerborgh commented 7 years ago

Does the underlying Client.js implementation output anything? The strings "exception on query" and "Code:" are not ones I recognize from the Client.js software, so they must come from your software. Is there any extra output we can rely on? (There should be.)

migalkin commented 7 years ago

The only error is produced by the KEGG endpoint:

[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] DEBUG TriplePatternIterator 159657 59569 {"?keggDrug":"http://bio2rdf.org/dr:D03842","?id":"http://bio2rdf.org/pubchem:17397928"} dr title ?title. 1
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] DEBUG TriplePatternIterator 159658 59570 {"?keggDrug":"http://bio2rdf.org/dr:D03843"} dr xRef ?id. 5
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] WARNING TriplePatternIterator Unexpected "<http://bio2rdf.org/cas:" on line 38.
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: Unexpected "<http://bio2rdf.org/cas:" on line 38.
    at N3Lexer._syntaxError (/ldf_rest/node_modules/n3/lib/N3Lexer.js:360:12)
    at reportSyntaxError (/ldf_rest/node_modules/n3/lib/N3Lexer.js:327:54)
    at N3Lexer._tokenizeToEnd (/ldf_rest/node_modules/n3/lib/N3Lexer.js:313:18)
    at TrigFragmentIterator._parseData (/ldf_rest/node_modules/n3/lib/N3Lexer.js:395:16)
    at TrigFragmentIterator.TurtleFragmentIterator._transform (/ldf_rest/node_modules/ldf-client/lib/triple-pattern-fragments/TurtleFragmentIterator.js:47:8)
    at readAndTransform (/ldf_rest/node_modules/asynciterator/asynciterator.js:959:12)
    at TrigFragmentIterator.TransformIterator._read (/ldf_rest/node_modules/asynciterator/asynciterator.js:945:3)
    at TrigFragmentIterator.BufferedIterator._fillBuffer (/ldf_rest/node_modules/asynciterator/asynciterator.js:768:10)
    at Immediate.fillBufferAsyncCallback (/ldf_rest/node_modules/asynciterator/asynciterator.js:800:8)
    at runCallback (timers.js:639:20)

RubenVerborgh commented 7 years ago

It seems that the parser considers http://bio2rdf.org/cas:… an invalid URL. Let me investigate.

RubenVerborgh commented 7 years ago

Hmmm, I cannot reproduce this in the parser. Would you be able to send me the contents of the fragment page that produces this error? (Or at least the full contents of line 38?)

RubenVerborgh commented 7 years ago

The error message seems to indicate that a spacing character appears after http://bio2rdf.org/cas:, which would make the URI invalid.
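One way to hunt for such a URI is to scan a saved response body for IRIs that are interrupted by whitespace. The sketch below is a rough heuristic, not part of the client or the N3 parser; the function name `findBrokenIris` and the sample data are made up for illustration, and string literals that happen to contain `<http` could produce false positives.

```javascript
// Report lines where an IRI that starts with "<http" hits whitespace or the
// end of the line before its closing ">".
function findBrokenIris(text) {
  const problems = [];
  text.split(/\r?\n/).forEach((line, index) => {
    // "<http:" or "<https:", then characters that are neither whitespace
    // nor ">", followed by whitespace or the end of the line
    const pattern = /<https?:[^\s>]*(?=\s|$)/g;
    let match;
    while ((match = pattern.exec(line)) !== null)
      problems.push({ line: index + 1, fragment: match[0] });
  });
  return problems;
}

// Invented sample: the second line has a space inside the object IRI
const sample = [
  '<http://example.org/a> <http://example.org/p> <http://example.org/b>.',
  '<http://example.org/a> <http://example.org/p> <http://bio2rdf.org/cas: 1-2-3>.',
].join('\n');

console.log(findBrokenIris(sample));
```

For the invented sample, the only reported entry is line 2 with the fragment `<http://bio2rdf.org/cas:`, which has the same shape as the text in the parser's error message.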

migalkin commented 7 years ago

What do you mean by the fragment page? The client logs tons of subqueries like

[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] INFO HttpClient Requesting http://kegg-ldh:3000/kegg?subject=http%3A%2F%2Fbio2rdf.org%2Fcpd%3AC00852&predicate=http%3A%2F%2Fbio2rdf.org%2Fns%2Fbio2rdf%23xRef
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] DEBUG TriplePatternIterator 159654 58803 {"?keggDrug":"http://bio2rdf.org/cpd:C00850","?mass":"\"173.9987\""} ?keggDrug xRef ?cas. 1
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] INFO HttpClient Requesting http://kegg-ldh:3000/kegg?subject=http%3A%2F%2Fbio2rdf.org%2Fdr%3AD03845&predicate=http%3A%2F%2Fbio2rdf.org%2Fns%2Fbio2rdf%23xRef
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] INFO HttpClient Requesting http://kegg-ldh:3000/kegg?subject=http%3A%2F%2Fbio2rdf.org%2Fcpd%3AC00853&predicate=http%3A%2F%2Fbio2rdf.org%2Fns%2Fbio2rdf%23xRef
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] DEBUG TriplePatternIterator 159655 59569 {"?keggDrug":"http://bio2rdf.org/dr:D03842","?id":"http://bio2rdf.org/cas:143224-34-4"} dr title ?title. 1
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] DEBUG TriplePatternIterator 159656 59569 {"?keggDrug":"http://bio2rdf.org/dr:D03842","?id":"http://bio2rdf.org/ligandbox:D03842"} dr title ?title. 1
[Sat Dec 10 2016 18:26:20 GMT+0000 (UTC)] DEBUG TriplePatternIterator 159657 59569 {"?keggDrug":"http://bio2rdf.org/dr:D03842","?id":"http://bio2rdf.org/pubchem:17397928"} dr title ?title. 1

We spot the problem with kegg at the fedbench LSD7 query:

SELECT ?drug ?transform ?mass WHERE {
  ?drug drugbank:affectedOrganism  'Humans and other mammals'.
  ?drug drugbank:casRegistryNumber ?cas .
  ?keggDrug bio2rdf:xRef ?cas .
  ?keggDrug bio2rdf:mass ?mass
     FILTER ( ?mass > '5' )
     OPTIONAL { ?drug drugbank:biotransformation ?transform . } }

The endpoint can't answer the following decomposed query or any of the subsequent queries:

SELECT ?mass ?cas ?keggDrug WHERE {
        ?keggDrug <http://bio2rdf.org/ns/bio2rdf#xRef> ?cas . 
        ?keggDrug <http://bio2rdf.org/ns/bio2rdf#mass> ?mass
FILTER ((?mass > '5')) . FILTER ((?cas=<http://bio2rdf.org/cas:50-56-6>) || (?cas=<http://bio2rdf.org/cas:50-81-7>) || (?cas=<http://bio2rdf.org/cas:73-22-3>) || (?cas=<http://bio2rdf.org/cas:59-30-3>) || (?cas=<http://bio2rdf.org/cas:59-02-9>) || (?cas=<http://bio2rdf.org/cas:81093-37-0>) || (?cas=<http://bio2rdf.org/cas:54739-18-3>) || (?cas=<http://bio2rdf.org/cas:137862-53-4>) || (?cas=<http://bio2rdf.org/cas:87333-19-5>) || (?cas=<http://bio2rdf.org/cas:300-62-9>))

There are no URIs with a space to my knowledge. Also, no prefixes are used in this case.

RubenVerborgh commented 7 years ago

What do you mean by the fragment page?

The error you get (Unexpected "<http://bio2rdf.org/cas:" on line 38.) is because, at some point, the client receives an HTTP response from the server that has an invalid URI on line 38. So one of the resources such as http://kegg-ldh:3000/kegg?subject=http%3A%2F%2Fbio2rdf.org%2Fcpd%3AC00853&predicate=http%3A%2F%2Fbio2rdf.org%2Fns%2Fbio2rdf%23xRef has an error on line 38.

But the decomposed query you gave and the kegg.hdt file should trigger the same error on my side; I will check that ASAP and get back to you.
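A fragment page in this sense is simply the body of one of the logged `Requesting …` URLs. As a minimal sketch, the following fetches such a page and returns a given line. It assumes a Node.js version that provides a global `fetch` and a server that honors the `application/trig` media type; the helper names `lineAt` and `showFragmentLine` are hypothetical. Note that the line number in the error refers to the response as the client received it, so a request with different headers may not line up exactly.

```javascript
// Return the 1-based line `lineNumber` of `body`, or undefined if out of range
function lineAt(body, lineNumber) {
  return body.split(/\r?\n/)[lineNumber - 1];
}

// Download one fragment page and return the requested line.
// Assumption: global fetch() is available and the server can serve TriG.
async function showFragmentLine(fragmentUrl, lineNumber) {
  const response = await fetch(fragmentUrl, {
    headers: { accept: 'application/trig' },
  });
  const body = await response.text();
  return lineAt(body, lineNumber);
}

// Example usage with one of the URLs from the client log:
// showFragmentLine('http://kegg-ldh:3000/kegg?subject=...', 38).then(console.log);
```

Running this over the URLs requested just before the warning should narrow down which response contains the offending line.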

RubenVerborgh commented 7 years ago

I did the following:

I did not run into the error.

I wonder if you are perhaps a) using a different HDT file b) using a different parser version. To check the latter, you can do npm ls n3, which shows version 0.8.3 on my machine.

RubenVerborgh commented 7 years ago

BTW, a note regarding your experiment: the LDF client does not currently optimize FILTER, so the filter is applied only after the whole BGP has been evaluated, which might not be the best choice. If you would like us to implement such an optimization, please get in touch.
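To illustrate what filtering after BGP evaluation means in practice, here is a simplified sketch, not the client's actual code. All solution mappings for the basic graph pattern are produced first, and only then is the disjunction of equality tests applied to them. The function name `filterByAllowedValues` and the sample bindings are invented for illustration.

```javascript
// Solution mappings as a BGP evaluation might produce them (invented values).
// In the unoptimized case, every one of these is computed before filtering.
const bgpResults = [
  { '?keggDrug': 'http://example.org/dr:1', '?cas': 'http://example.org/cas:1' },
  { '?keggDrug': 'http://example.org/dr:2', '?cas': 'http://example.org/cas:2' },
  { '?keggDrug': 'http://example.org/dr:3', '?cas': 'http://example.org/cas:3' },
];

// A chain of (?v = a) || (?v = b) || ... is equivalent to a membership test
function filterByAllowedValues(bindings, variable, allowedValues) {
  const allowed = new Set(allowedValues);
  return bindings.filter(binding => allowed.has(binding[variable]));
}

const kept = filterByAllowedValues(bgpResults, '?cas', [
  'http://example.org/cas:1',
  'http://example.org/cas:3',
]);
console.log(kept.length); // 2
```

An optimized plan would instead use the allowed values to restrict which triple patterns are requested in the first place, so that the discarded mappings are never fetched.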

migalkin commented 7 years ago

I used the HDT file from the link; the parser version is:

npm info it worked if it ends with ok
npm info using npm@3.10.9
npm info using node@v7.1.0
| +-- n3@0.8.3
npm info ok 

The heap memory issue when evaluating FILTERs against large sets of triples was apparently solved by increasing the max_old_space parameter.
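Assuming the parameter meant here is Node.js's `--max-old-space-size` flag (an assumption, since the comment only names `max_old_space`), the effective limit can be checked from inside the process. The sketch below uses the built-in `v8` module; the helper name `heapLimitInMiB` is made up.

```javascript
const v8 = require('v8');

// Report the V8 heap size limit of the current process in MiB
function heapLimitInMiB() {
  const bytes = v8.getHeapStatistics().heap_size_limit;
  return Math.round(bytes / (1024 * 1024));
}

console.log('V8 heap limit: ' + heapLimitInMiB() + ' MiB');
```

Starting the application with, for example, `node --max-old-space-size=4096 app.js` (where `app.js` is a placeholder) and running this check should show a correspondingly higher limit.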

RubenVerborgh commented 7 years ago

I cannot reproduce the parsing issue then, I'm afraid. If you could somehow show me that line 38, that would help.