Callidon / sparql-engine

🚂 A framework for building SPARQL query engines in Javascript/Typescript
https://callidon.github.io/sparql-engine
MIT License

Add full text search feature #38

Closed Callidon closed 4 years ago

Callidon commented 4 years ago

This PR adds support for approximate string matching using full text search queries. It follows an approach similar to BlazeGraph: it defines several magic predicates that are given special meaning. When one of them is encountered in a SPARQL query, it is interpreted as a configuration parameter for a full text search query.

The simplest way to integrate a full text search into a SPARQL query is to use the magic predicate ses:search inside a SPARQL join group. In the following query, this predicate is used to search for the keywords neil and gaiman in the values bound to the ?o position of the triple pattern.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ses: <https://callidon.github.io/sparql-engine/search#>
SELECT * WHERE {
  ?s foaf:knows ?o .
  ?o ses:search "neil gaiman" .
}

In a way, full text search queries allow users to express more complex SPARQL filters that perform approximate string matching over RDF terms. Each result is annotated with a relevance score (how well it matches the keywords, higher is better) and a rank (its position when results are sorted by descending relevance score). These two values are not bound into the query results by default, but you can use magic predicates to get access to them (see below). Note that the meaning of relevance scores is specific to the implementation of the full text search.
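To make the relationship between relevance scores and ranks concrete, here is a minimal TypeScript sketch. It is not the engine's implementation: the scoring function (fraction of keywords found as whole words), the 1-based rank, and the names `relevance` and `rankByRelevance` are all assumptions made for illustration. The only point taken from the description above is that ranks follow the descending order of scores.

```typescript
// Toy illustration only: real relevance scores are implementation-specific.
interface Ranked {
  value: string;
  score: number; // higher is better
  rank: number;  // 1-based here by assumption; 1 is the best match
}

// Hypothetical scoring: fraction of keywords that appear as whole words
// in the value, compared case-insensitively.
function relevance(value: string, keywords: string[]): number {
  const words = value.toLowerCase().split(/\W+/).filter(w => w.length > 0);
  const hits = keywords.filter(k => words.indexOf(k.toLowerCase()) !== -1).length;
  return keywords.length === 0 ? 0 : hits / keywords.length;
}

// Score every value, drop non-matches, sort by descending score,
// then assign ranks from the sorted position.
function rankByRelevance(values: string[], query: string): Ranked[] {
  const keywords = query.split(/\s+/).filter(k => k.length > 0);
  return values
    .map(value => ({ value, score: relevance(value, keywords) }))
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r, i) => ({ value: r.value, score: r.score, rank: i + 1 }));
}
```

With the values "Neil Gaiman", "Neil Armstrong" and "Terry Pratchett" and the query "neil gaiman", this toy scorer keeps the first two values and ranks "Neil Gaiman" ahead of "Neil Armstrong".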

The full list of magic predicates that you can use in a full text search query is:

Below is a more complete example that uses most of these predicates to customize the full text search.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ses: <https://callidon.github.io/sparql-engine/search#>
SELECT ?s ?o ?score ?rank WHERE {
  ?s foaf:knows ?o .
  ?o ses:search "neil gaiman" .
  ?o ses:minRelevance "0.25" .
  ?o ses:maxRank "1000" .
  ?o ses:relevance ?score .
  ?o ses:rank ?rank .
  ?o ses:matchAllTerms "true" .
}
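If you generate such queries from application code, the configuration triples can be assembled programmatically. The sketch below is a hypothetical helper, not part of sparql-engine: `SearchOptions`, `escapeLiteral` and `buildSearchPatterns` are names invented for illustration, and it only emits the magic predicates shown in the example above, with parameters written as plain string literals as in that example.

```typescript
// Hypothetical helper: builds the triple patterns for a full text search.
// None of these names come from sparql-engine itself.
interface SearchOptions {
  keywords: string;
  minRelevance?: number;
  maxRank?: number;
  matchAllTerms?: boolean;
  scoreVariable?: string; // e.g. "?score"
  rankVariable?: string;  // e.g. "?rank"
}

// Escape backslashes and double quotes so the text is a valid SPARQL literal.
function escapeLiteral(text: string): string {
  return text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Returns one triple pattern per configured option, all on the same subject variable.
function buildSearchPatterns(variable: string, opts: SearchOptions): string[] {
  const lines = [`${variable} ses:search "${escapeLiteral(opts.keywords)}" .`];
  if (opts.minRelevance !== undefined) {
    lines.push(`${variable} ses:minRelevance "${opts.minRelevance}" .`);
  }
  if (opts.maxRank !== undefined) {
    lines.push(`${variable} ses:maxRank "${opts.maxRank}" .`);
  }
  if (opts.scoreVariable !== undefined) {
    lines.push(`${variable} ses:relevance ${opts.scoreVariable} .`);
  }
  if (opts.rankVariable !== undefined) {
    lines.push(`${variable} ses:rank ${opts.rankVariable} .`);
  }
  if (opts.matchAllTerms !== undefined) {
    lines.push(`${variable} ses:matchAllTerms "${opts.matchAllTerms}" .`);
  }
  return lines;
}
```

The returned lines can be joined with newlines and placed inside the WHERE clause next to the ordinary triple pattern, provided the query declares the `ses:` prefix as in the examples above.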