jamiefeiss opened this issue 1 year ago
Testing with just the inner SELECT query now, the main issue seems to be that this query searches across all triples. A weighted regex along the lines of the "skosWeighted" search method (https://github.com/RDFLib/prez/blob/main/prez/reference_data/search_methods/search_skos_weighted.ttl) is also significantly faster in Fuseki: 0.035s vs 29.189s. See below:
```sparql
SELECT ?search_result_uri ?predicate ?match (SUM(?w) AS ?weight) ?hashID
WHERE {
  ?search_result_uri ?predicate ?match .
  ?search_result_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
  ?search_result_uri <http://www.w3.org/2004/02/skos/core#inScheme> <https://linked.data.gov.au/def/data-access-rights> .
  BIND(URI(CONCAT("urn:hash:", SHA256(CONCAT(STR(?search_result_uri), STR(?predicate), STR(?match))))) AS ?hashID)
  {
    ?search_result_uri ?predicate ?match .
    BIND (50 AS ?w)
    FILTER (REGEX(?match, "^open$", "i"))
  } UNION {
    ?search_result_uri ?predicate ?match .
    BIND (20 AS ?w)
    FILTER (REGEX(?match, "^open", "i"))
  } UNION {
    ?search_result_uri ?predicate ?match .
    BIND (10 AS ?w)
    FILTER (REGEX(?match, "open", "i"))
  }
}
GROUP BY ?search_result_uri ?predicate ?match ?hashID
ORDER BY DESC(?weight)
LIMIT 10
```
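A rough way to see what this query computes: the pure-Python model below (a sketch, not Prez code) scores each value once per UNION branch whose pattern matches, sums the branch weights per (uri, predicate, match) group as `SUM(?w)` does, and builds the hash ID the same way as the `BIND`. `re.IGNORECASE` stands in for the `"i"` flag, which behaves the same for a plain ASCII term like "open"; the sample triples are made up.

```python
import hashlib
import re

# Weights mirror the three UNION branches of the query above.
TIERS = [
    (r"^open$", 50),  # whole value equals the term
    (r"^open", 20),   # value starts with the term
    (r"open", 10),    # term appears anywhere in the value
]


def hash_id(uri: str, predicate: str, match: str) -> str:
    """Like BIND(URI(CONCAT("urn:hash:", SHA256(CONCAT(...)))) AS ?hashID)."""
    digest = hashlib.sha256((uri + predicate + match).encode("utf-8")).hexdigest()
    return "urn:hash:" + digest


def weighted_search(triples, limit=10):
    """Return (hash_id, uri, predicate, match, weight) rows, best first."""
    rows = []
    for uri, predicate, match in triples:
        # SUM(?w): add the weight of every branch whose regex matches.
        weight = sum(w for pattern, w in TIERS if re.search(pattern, match, re.IGNORECASE))
        if weight > 0:  # values matching no branch produce no rows
            rows.append((hash_id(uri, predicate, match), uri, predicate, match, weight))
    rows.sort(key=lambda row: row[4], reverse=True)  # ORDER BY DESC(?weight)
    return rows[:limit]  # LIMIT 10


SKOS = "http://www.w3.org/2004/02/skos/core#"

# Made-up sample triples; each distinct triple forms its own GROUP BY group.
sample = [
    ("https://example.org/open", SKOS + "prefLabel", "Open"),
    ("https://example.org/open-access", SKOS + "prefLabel", "Open access"),
    ("https://example.org/reopened", SKOS + "definition", "Access that was reopened"),
    ("https://example.org/closed", SKOS + "prefLabel", "Closed"),
]

for row in weighted_search(sample):
    print(row[4], row[3])
```

An exact label like "Open" satisfies all three branches, so it ends up at 50 + 20 + 10 = 80 and ranks above a prefix match (30) or a substring-only match (10).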
Since we'll probably only be searching across labels & descriptions, and returning objects that have endpoints in Prez, we could restrict the predicates that are matched and the base classes of the results to further optimise the query.
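A sketch of that restriction, assuming an allowlist of label/description predicates and an allowlist of classes that have Prez endpoints (both sets below are hypothetical, chosen only for illustration): candidates are narrowed before any regex is evaluated. In SPARQL this would roughly correspond to a `VALUES ?predicate { ... }` block plus a type pattern on `?search_result_uri`; whether the engine actually applies those before the `REGEX` filters depends on its optimiser.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical allowlists, for illustration only.
SEARCHABLE_PREDICATES = {
    SKOS + "prefLabel",
    SKOS + "altLabel",
    SKOS + "definition",
    RDFS_LABEL,
    DCTERMS + "title",
    DCTERMS + "description",
}
SUPPORTED_CLASSES = {SKOS + "Concept", SKOS + "ConceptScheme", SKOS + "Collection"}


def restrict_candidates(triples):
    """Keep triples whose predicate is searchable and whose subject has a supported type."""
    typed_subjects = {s for s, p, o in triples if p == RDF_TYPE and o in SUPPORTED_CLASSES}
    return [
        (s, p, o)
        for s, p, o in triples
        if p in SEARCHABLE_PREDICATES and s in typed_subjects
    ]


# Made-up sample data.
data = [
    ("https://example.org/a", RDF_TYPE, SKOS + "Concept"),
    ("https://example.org/a", SKOS + "prefLabel", "Open"),
    ("https://example.org/a", SKOS + "notation", "open-01"),
    ("https://example.org/b", RDF_TYPE, "https://example.org/SomethingElse"),
    ("https://example.org/b", RDFS_LABEL, "Open thing"),
]

for triple in restrict_candidates(data):
    print(triple)
```

Only the Concept's `skos:prefLabel` triple survives; the `skos:notation` value and the resource whose type is not in the allowlist are dropped before any scoring happens.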
Looks like it's the query structure. Let's see if we can add the CONSTRUCT back in around your performant REGEX above.
For context, here is the full-text search (FTS) query as well:
```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ex: <http://www.example.org/resources#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX sdo: <https://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?MatchURI (COALESCE(?prop_label, ?MatchProp) AS ?MatchProperty) ?MatchTerm ?SearchTerm
{
  VALUES ?SearchTerm { "*open*" }
  (?MatchURI ?Weight ?MatchTerm ?graph ?MatchProp) text:query (ex:NameProps ?SearchTerm) .
  OPTIONAL {
    ?MatchURI skos:prefLabel|rdfs:label|dcterms:title|sdo:name ?match_label .
  }
  OPTIONAL {
    ?MatchProp skos:prefLabel|rdfs:label|dcterms:title|sdo:name ?prop_label .
  }
}
```
How does this look?
```sparql
PREFIX prez: <https://prez.dev/>

CONSTRUCT {
  ?hashID a prez:SearchResult ;
    prez:searchResultWeight ?w ;
    prez:searchResultPredicate ?predicate ;
    prez:searchResultMatch ?match ;
    prez:searchResultURI ?search_result_uri .
  ?search_result_uri ?p ?o1 .
  ?o1 ?p2 ?o2 .
  ?o2 ?p3 ?o3 .
}
WHERE {
  {
    SELECT ?search_result_uri ?predicate ?match ?w ?hashID
    WHERE {
      ?search_result_uri ?predicate ?match .
      ?search_result_uri <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
      ?search_result_uri <http://www.w3.org/2004/02/skos/core#inScheme> <https://linked.data.gov.au/def/data-access-rights> .
      BIND(URI(CONCAT("urn:hash:", SHA256(CONCAT(STR(?search_result_uri), STR(?predicate), STR(?match))))) AS ?hashID)
      {
        ?search_result_uri ?predicate ?match .
        BIND (50 AS ?w)
        FILTER (REGEX(?match, "^open$", "i"))
      } UNION {
        ?search_result_uri ?predicate ?match .
        BIND (20 AS ?w)
        FILTER (REGEX(?match, "^open", "i"))
      } UNION {
        ?search_result_uri ?predicate ?match .
        BIND (10 AS ?w)
        FILTER (REGEX(?match, "open", "i"))
      }
    }
    GROUP BY ?search_result_uri ?predicate ?match ?hashID ?w
    LIMIT 10
  }
  ?search_result_uri ?p ?o1 .
  OPTIONAL {
    FILTER(ISBLANK(?o1))
    ?o1 ?p2 ?o2 .
    OPTIONAL {
      FILTER(ISBLANK(?o2))
      ?o2 ?p3 ?o3 .
    }
  }
}
```
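The part of the WHERE clause after the subquery describes each result by following blank-node objects up to three hops from the result URI (`?o1`, `?o2`, `?o3`). A plain-Python sketch of that traversal, assuming blank nodes are represented as strings starting with `_:` (a toy encoding for illustration, not rdflib's):

```python
from collections import defaultdict


def is_blank(term: str) -> bool:
    # Toy encoding (an assumption): blank nodes are strings starting with "_:".
    return term.startswith("_:")


def describe(triples, focus, max_hops=3):
    """Triples the CONSTRUCT emits for one result: all triples about `focus`
    (yielding ?o1), then about blank ?o1 nodes (yielding ?o2), then about
    blank ?o2 nodes (yielding ?o3). IRI and literal objects are emitted but
    never followed."""
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)

    found = set()
    frontier = [focus]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for s, p, o in by_subject[node]:
                found.add((s, p, o))
                if is_blank(o):  # FILTER(ISBLANK(...)) guard
                    next_frontier.append(o)
        frontier = next_frontier
    return found


# Made-up graph with a chain of blank nodes deeper than three hops.
graph = [
    ("https://example.org/c", "https://example.org/p", "_:b1"),
    ("_:b1", "https://example.org/p", "_:b2"),
    ("_:b2", "https://example.org/p", "_:b3"),
    ("_:b3", "https://example.org/p", "too deep"),
    ("https://example.org/c", "https://example.org/q", "https://example.org/other"),
    ("https://example.org/other", "https://example.org/p", "not followed"),
]

for triple in sorted(describe(graph, "https://example.org/c")):
    print(triple)
```

The triple hanging off `_:b3` is a fourth hop and is left out, and `https://example.org/other` is emitted as an object but its own triples are not, matching the `ISBLANK` guards.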
Looks good, nice and fast at about 0.035s.
Not aggregating just means you'll get duplicate results when a result satisfies multiple matches.
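To illustrate: because the subquery groups by `?w` without summing, a value that satisfies several branches comes back as several rows sharing one `?hashID`. The sketch below applies the three regex tiers to made-up values and keys results by value for simplicity (the query keys them by `?hashID`). In the CONSTRUCT output those rows collapse into a single `prez:SearchResult` node carrying several `prez:searchResultWeight` values, and each row still counts against `LIMIT 10`, so fewer distinct results can come back. Which rows survive the limit is up to the engine, since the subquery has no `ORDER BY`.

```python
import re
from collections import defaultdict

# Same three tiers as the UNION branches above.
TIERS = [(r"^open$", 50), (r"^open", 20), (r"open", 10)]


def unaggregated_rows(values):
    """One (value, weight) row per matching branch, as GROUP BY ... ?w gives without SUM."""
    return [
        (value, w)
        for value in values
        for pattern, w in TIERS
        if re.search(pattern, value, re.IGNORECASE)
    ]


def weights_per_result(rows):
    """Collect the distinct weights each result ends up with (one hash node each)."""
    weights = defaultdict(set)
    for value, w in rows:
        weights[value].add(w)
    return dict(weights)


rows = unaggregated_rows(["Open", "Open access", "Reopen"])
print(len(rows), "rows for", len(weights_per_result(rows)), "distinct results")
print(weights_per_result(rows))
```

Here "Open" alone takes three of the six rows, with weights 50, 20 and 10 instead of a single summed 80.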
What do you think of restricting the matched predicate to labels & descriptions? Description matching could be worth less too. Also what do you think of restricting the base class to classes Prez supports?
> What do you think of restricting the matched predicate to labels & descriptions?

This would be a closed profile with no properties defined. You'll then get labels/descriptions when the annotations are added. Profile changes are coming soon.

> Description matching could be worth less too.

Sounds good. Any issue with adding LCASE back in for the "exact" match?
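```sparql
{
  ?search_result_uri ?predicate ?match .
  BIND (100 AS ?w)
  # STR() drops any language tag, so "Open"@en is compared as the plain string "open"
  FILTER (LCASE(STR(?match)) = "open")
}
UNION
...
```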
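A sketch of the combined scoring, assuming the three REGEX tiers from earlier remain as the elided branches after the UNION, `SUM` still aggregates them, and description matches are scaled down. The set of description predicates and the 0.5 factor are hypothetical, picked only to illustrate "worth less"; Python strings carry no language tags, so `lower()` plays the role of `LCASE(STR(...))`.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS_DESCRIPTION = "http://purl.org/dc/terms/description"

# Hypothetical choices for illustration: which predicates count as
# descriptions, and how much less a description match is worth.
DESCRIPTION_PREDICATES = {DCTERMS_DESCRIPTION, SKOS + "definition"}
DESCRIPTION_FACTOR = 0.5


def score(term: str, predicate: str, value: str) -> float:
    """Sum the weights of every tier the value satisfies, like SUM(?w)."""
    needle = re.escape(term)  # treat the search term literally
    weight = 0
    if value.lower() == term.lower():               # LCASE exact tier
        weight += 100
    if re.fullmatch(needle, value, re.IGNORECASE):  # like REGEX "^term$", "i"
        weight += 50
    if re.match(needle, value, re.IGNORECASE):      # like REGEX "^term", "i"
        weight += 20
    if re.search(needle, value, re.IGNORECASE):     # like REGEX "term", "i"
        weight += 10
    if predicate in DESCRIPTION_PREDICATES:
        weight *= DESCRIPTION_FACTOR                # descriptions worth less
    return weight


print(score("open", SKOS + "prefLabel", "Open"))
print(score("open", DCTERMS_DESCRIPTION, "Open"))
print(score("open", SKOS + "prefLabel", "Open access"))
```

Under these assumptions an exact label scores 180, the same text in a description scores 90, and a prefix match scores 30. Note that the LCASE tier and the `"^open$"` tier overlap for exact matches, so they stack.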
> Also what do you think of restricting the base class to classes Prez supports?

Ideally, I think Prez could display whatever information is available about whatever object is found, perhaps on a generic page if there isn't a suitable endpoint.
David to:
Resolved in #149
Testing the "default" regex search method takes over 30s against the IDN triplestore for the following query:
http://localhost:8000/search?term=open&method=default&limit=10&focus-to-filter[rdf:type]=http%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23Concept&focus-to-filter[skos:inScheme]=https%3A%2F%2Flinked.data.gov.au%2Fdef%2Fdata-access-rights
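For reference, the `focus-to-filter[...]` parameters in that URL correspond to the `rdf:type` and `skos:inScheme` triple patterns in the queries above. A small sketch that decodes them with the standard library; the prefix map is an assumption, and this is not Prez's own parsing code:

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed prefix map for the CURIEs used in the parameter names.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

URL = (
    "http://localhost:8000/search?term=open&method=default&limit=10"
    "&focus-to-filter[rdf:type]=http%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23Concept"
    "&focus-to-filter[skos:inScheme]=https%3A%2F%2Flinked.data.gov.au%2Fdef%2Fdata-access-rights"
)


def focus_filters(url: str):
    """Map each focus-to-filter[prefix:local]=value parameter to a (predicate, object) IRI pair."""
    pairs = []
    for key, value in parse_qsl(urlsplit(url).query):  # values come back percent-decoded
        if key.startswith("focus-to-filter[") and key.endswith("]"):
            prefix, local = key[len("focus-to-filter["):-1].split(":", 1)
            pairs.append((PREFIXES[prefix] + local, value))
    return pairs


for predicate, obj in focus_filters(URL):
    print(f"?search_result_uri <{predicate}> <{obj}> .")
```

This prints the same two triple patterns that restrict `?search_result_uri` in the SELECT and CONSTRUCT queries; `term`, `method` and `limit` are the remaining parameters.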