Callidon / sparql-engine

🚂 A framework for building SPARQL query engines in Javascript/Typescript
https://callidon.github.io/sparql-engine
MIT License

Q: Setting up federated queries to existing SPARQL endpoints #59

Closed larsgw closed 3 years ago

larsgw commented 3 years ago

Is there an easy way to set up federated queries to existing SPARQL endpoints? From what I have seen, I cannot simply implement find() based on SPARQL endpoints, so I came up with the following (which has a few problems):

const { Graph, Pipeline, BindingBase } = require('sparql-engine')
const fetch = require('node-fetch')

function formatTriple (triple) {
  return [triple.subject, triple.predicate, triple.object].join(' ')
}

function formatQuery (triples) {
  return `SELECT * WHERE {
  ${triples.map(formatTriple).join(` .
  `)}  
}`
}

class EndpointGraph extends Graph {
  constructor (iri) {
    super()
    this.iri = iri
  }

  evalBGP (triples) {
    return Pipeline.getInstance().fromAsync(input => {

      fetch(this.iri, {
        method: 'POST',
        body: formatQuery(triples),
        headers: {
          'Content-Type': 'application/sparql-query',
          Accept: 'application/json'
        }
      })
        .then(response => response.json())
        .then(response => {
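          for (const binding of response.results.bindings) {
            // A binding is a plain object, so iterate over its keys
            // (for...of on a plain object throws a TypeError).
            for (const variable of Object.keys(binding)) {
              binding[variable] = binding[variable].value
            }

            input.next(BindingBase.fromObject(binding))
          }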

          input.complete()
        })
        .catch(error => {
          input.error(error)
        })
    })
  }
}

module.exports = EndpointGraph

/*
dataset.setGraphFactory(iri => {
  return new EndpointGraph(iri)
})
*/

For one thing though, formatQuery() results in queries like this one:

SELECT * WHERE {
  ?rhea http://www.w3.org/2000/01/rdf-schema#subClassOf http://rdf.rhea-db.org/Reaction .
  ?rhea http://rdf.rhea-db.org/ec http://purl.uniprot.org/enzyme/1.17.4.1  
}

This isn't correct, as the URIs should be wrapped in angle brackets. It also seems inefficient, as it would mean an HTTP request per value of ?protein (http://purl.uniprot.org/enzyme/1.17.4.1 in this case). I could build a VALUES clause, but I think that would mean collecting the BGPs manually. I only started using this framework yesterday, though, so I may be missing something.

Callidon commented 3 years ago

Hi

First, thank you for using our framework; it is always a pleasure to see people hopping on it :)

To answer your questions: sparql-engine is at its core a framework for building SPARQL engines, not a ready-to-use SPARQL engine. It tries to deliver as much functionality as possible without heavy configuration, so that you only have to implement a single class to start executing queries on your custom backend.

Consequently, in your scenario with a federation of SPARQL endpoints, it is expected that SPARQL queries will not execute optimally, because the engine tries to be as generic as possible. In this situation, you will need to use the framework in a more advanced way.

Finally, for the issue of building your SPARQL queries, I suggest you take a look at the rdf-terms.js package, which contains many functions for working with RDF terms in string representation. In sparql-engine, we only work with strings for RDF terms, because of legacy decisions made back in the day. There was some discussion about migrating to the RDF.js standard, but that would involve a huge rework of almost the entire project. Since the whole team behind the framework is very busy right now, it is unlikely to happen in the near future, unfortunately 😞
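
As a rough illustration of the query-building issue discussed here, the sketch below serializes string-encoded terms before joining them into a SELECT query. It assumes the N3.js-style string convention (variables start with `?`, blank nodes with `_:`, literals with `"`, anything else is an IRI). `formatTerm` is a hypothetical helper written for this example, not a function from sparql-engine or rdf-terms.js, and it does not escape special characters inside literal values.

```javascript
// Hypothetical helper (not part of sparql-engine): serialize one
// string-encoded RDF term so it is valid inside a SPARQL query.
function formatTerm (term) {
  // Variables and blank nodes can be used as-is.
  if (term.startsWith('?') || term.startsWith('_:')) {
    return term
  }
  if (term.startsWith('"')) {
    // Assumed typed-literal form: "value"^^datatypeIRI.
    // SPARQL requires the datatype IRI in angle brackets.
    const idx = term.lastIndexOf('"^^')
    if (idx !== -1 && !term.endsWith('>')) {
      return `${term.slice(0, idx + 3)}<${term.slice(idx + 3)}>`
    }
    // Plain or language-tagged literal.
    return term
  }
  // Everything else is treated as an IRI.
  return `<${term}>`
}

function formatTriple (triple) {
  return [triple.subject, triple.predicate, triple.object]
    .map(formatTerm)
    .join(' ')
}

function formatQuery (triples) {
  return `SELECT * WHERE {\n  ${triples.map(formatTriple).join(' .\n  ')}\n}`
}
```

With this change, the two triple patterns from the question would be sent with their IRIs in angle brackets, while `?rhea` is left untouched.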

And just for fun, you can implement a find() function for SPARQL endpoints by using a SPARQL CONSTRUCT query instead of a SELECT query. I've put an example below. Of course, overriding the evalBGP function is far more efficient, but it's still fun to know that it's possible 😉

CONSTRUCT { ?rhea <http://rdf.rhea-db.org/ec> <http://purl.uniprot.org/enzyme/1.17.4.1> }
WHERE {  ?rhea <http://rdf.rhea-db.org/ec> <http://purl.uniprot.org/enzyme/1.17.4.1> }
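
A query of this shape could be generated from the single triple pattern that find() receives. The sketch below only builds the query string; sending it to the endpoint and parsing the returned triples is left out. `toSparql` and `constructQueryFor` are hypothetical names, and the term handling assumes string-encoded terms where only IRIs need wrapping (typed literals would need extra care).

```javascript
// Hypothetical helper: wrap IRIs in angle brackets and leave
// variables, blank nodes, and literals untouched (assumption:
// terms use the string encoding described in this thread).
const toSparql = term => /^(\?|_:|")/.test(term) ? term : `<${term}>`

// Build a CONSTRUCT query whose template and WHERE clause are the
// same triple pattern, so the endpoint returns the matching triples.
function constructQueryFor (triple) {
  const pattern = [triple.subject, triple.predicate, triple.object]
    .map(toSparql)
    .join(' ')
  return `CONSTRUCT { ${pattern} } WHERE { ${pattern} }`
}
```

Calling `constructQueryFor` with the pattern `?rhea`, `http://rdf.rhea-db.org/ec`, `http://purl.uniprot.org/enzyme/1.17.4.1` yields a query equivalent to the one above, on a single line.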

larsgw commented 3 years ago

Thank you very much for this detailed explanation! I just realised I never replied before now, sorry.