Open c-martinez opened 7 years ago
There is a trade-off between the number of results and the size of the data; the search query is currently limited to 500 results.
We (together with @biktorrr) have discussed this and concluded that we need an external document relevance ranking mechanism (e.g. TF-IDF). This is not directly possible in SPARQL, but it may be possible using ElasticSearch via the SPARQL SERVICE directive, as done in this example.
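To make the ranking idea concrete, here is a minimal sketch of TF-IDF scoring in plain Python. It is only an illustration of the kind of ranking an external engine would provide, not what ElasticSearch or Solr actually compute (they use more refined scoring). The function name `tfidf_rank`, the whitespace tokenizer, and the sample documents are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents by the summed TF-IDF weight of the query terms.

    `docs` maps a document id to its text. Tokenization is a plain
    lowercase whitespace split (an assumption for this sketch).
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        if not tokens:
            scores[doc_id] = 0.0
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term occurs in no document
            idf = math.log(n_docs / df[term])
            score += (tf[term] / len(tokens)) * idf
        scores[doc_id] = score

    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical documents, for illustration only.
docs = {
    "d1": "amsterdam canal amsterdam",
    "d2": "rotterdam harbour",
    "d3": "amsterdam museum events listing",
}
ranked = tfidf_rank("amsterdam", docs)
```

With a ranking like this, the 500-result cut-off would keep the most relevant documents rather than an arbitrary subset.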
Also relevant to this discussion: https://docs.google.com/document/d/1GSd9ym_50jCX_nSpQ4HWD_-WGb8EKqAPw-llIwwdoF8/edit#
Possibly also an interesting reference for this issue: http://dl.acm.org/citation.cfm?id=2591350
Also of interest: https://blog.blazegraph.com/?tag=solr (using Solr and Blazegraph, instead of ES and ClioPatria)
Proposed solution:
With this setup, a query like this can be run:
SELECT * WHERE
{
  SERVICE <http://blazegraph-server:9999/blazegraph/namespace/kb/sparql>
  {
    ?kwDoc <http://www.bigdata.com/rdf/fts#search> "x" .
    ?kwDoc <http://www.bigdata.com/rdf/fts#endpoint> "http://solr-server:8983/solr/db/select" .
    ?kwDoc <http://www.bigdata.com/rdf/fts#params> "fl=id,score,name" .
    ?kwDoc <http://www.bigdata.com/rdf/fts#scoreField> "score" .
    ?kwDoc <http://www.bigdata.com/rdf/fts#score> ?score .
    ?kwDoc <http://www.bigdata.com/rdf/fts#snippetField> "name" .
    ?kwDoc <http://www.bigdata.com/rdf/fts#snippet> ?name .
  }
}
This returns the document names, IDs, and scores.
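As a rough sketch of how a client could consume those results, the snippet below assumes the endpoint returns the standard SPARQL 1.1 JSON results format and sorts the rows by score. The helper name `rows_by_score` and the sample response are hypothetical; only the variable names `kwDoc`, `score`, and `name` come from the query above.

```python
import json


def rows_by_score(sparql_json, score_var="score"):
    """Flatten SPARQL JSON result bindings and sort them by score, descending."""
    data = json.loads(sparql_json)
    rows = []
    for binding in data["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        row = {var: cell["value"] for var, cell in binding.items()}
        row[score_var] = float(row[score_var])
        rows.append(row)
    return sorted(rows, key=lambda r: r[score_var], reverse=True)


# Hypothetical response, for illustration only.
sample = json.dumps({
    "head": {"vars": ["kwDoc", "score", "name"]},
    "results": {"bindings": [
        {"kwDoc": {"type": "literal", "value": "doc-2"},
         "score": {"type": "literal", "value": "0.7"},
         "name": {"type": "literal", "value": "Second document"}},
        {"kwDoc": {"type": "literal", "value": "doc-1"},
         "score": {"type": "literal", "value": "1.9"},
         "name": {"type": "literal", "value": "First document"}},
    ]},
})
rows = rows_by_score(sample)
```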
Liliana: Together with Carlos and Sabrina, we are evaluating the retrieval of entities. Because a search only gives back 500 results, it is important to add a way of filtering by entity type in the query box; otherwise all 500 results can be allocated to a single entity type, leaving the other entity types with 0 hits. Example: search for Amsterdam, and you get 500 events and 0 hits for the other types. A user may instead want to search for Amsterdam as a location (entity type) from the query box, and then the results are distributed more evenly across the other entity types.
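The effect described here can be sketched in a few lines of Python. The data is made up (600 events followed by 3 locations), and `filter_by_type` and `cap_per_type` are hypothetical helpers; in practice the type filter would more likely be applied inside the query itself, before the 500-result limit.

```python
from collections import defaultdict

LIMIT = 500  # current global result limit


def filter_by_type(hits, entity_type, limit=LIMIT):
    """Keep only hits of one entity type, then apply the limit."""
    return [h for h in hits if h["type"] == entity_type][:limit]


def cap_per_type(hits, per_type_limit):
    """Alternative: keep at most `per_type_limit` hits for each entity type."""
    counts = defaultdict(int)
    kept = []
    for hit in hits:
        if counts[hit["type"]] < per_type_limit:
            kept.append(hit)
            counts[hit["type"]] += 1
    return kept


# Hypothetical hits for "Amsterdam": events come first and crowd out locations.
hits = [{"id": f"event-{i}", "type": "event"} for i in range(600)]
hits += [{"id": f"location-{i}", "type": "location"} for i in range(3)]

global_cut = hits[:LIMIT]                      # only events survive
locations = filter_by_type(hits, "location")   # the 3 locations are found
balanced = cap_per_type(hits, per_type_limit=100)
```

Filtering by type matches the query-box suggestion; the per-type cap is a different option that keeps some results of every type without asking the user to choose one.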