Closed by berger-n 4 years ago
(whereas "Prajñāpāramitā Sūtra"@sa-x-iast works)
The query actually sent to Fuseki is as follows:
CONSTRUCT
{
?TT rdf:type ?type .
?TT ?pp ?lit .
?TT skos:prefLabel ?l .
?TT tmp:matchScore ?sc .
?TT tmp:prefLabelMatch ?lit1 .
?TT tmp:altLabelMatch ?lit2 .
?TT :workType ?wT .
?TT :workPartType ?wPT .
?TT :workHasPart ?wHP .
?TT :workExpressionOf ?wE .
?TT tmp:prefLabelExpressionOf ?litE .
?TT ?wE ?litE .
?TT :workHasExpression ?wH .
?TT tmp:prefLabelHasExpression ?litH .
?TT ?wH ?litH .
?TT ?eventType ?lit .
?TT tmp:creatorLabel ?litC .
?TT tmp:creatorRole ?roleLabel .
?TT owl:sameAs ?same .
?TT ?same ?sameL .
?TT owl:sameAs ?sameBDRC .
?TT ?sameBDRC ?sameBDRClabel .
?TT rdf:type ?sameT .
?TT adm:canonicalHtml ?sameCanonHTML .
?TT adm:canonicalHtml ?canonHTML .
?TT skos:prefLabel ?sameL .
?TT skos:prefLabel ?sameBDRClabel .
?TT adm:status ?status .
}
WHERE
{ { ( ?s ?sc ?lit )
text:query ( rdfs:label "\"Prajñāpāramitā\""@sa-x-iast 1000 "highlight:" )
{ ?TT ?anyprop ?anypropobj .
?anypropobj :eventWho ?s ;
rdf:type ?eventType
}
UNION
{ ?TT ?p ?s
FILTER ( ! EXISTS { ?TT :eventWho ?anywho } )
?s rdf:type ?pp
}
}
UNION
{ ( ?s ?sc ?litC )
text:query ( skos:prefLabel "\"Prajñāpāramitā\""@sa-x-iast 1000 "highlight:" ) .
?TT :creator ?agent .
?agent :agent ?s ;
:role ?creaRole .
?creaRole skos:prefLabel ?roleLabel
}
UNION
{ ( ?TT ?sc ?lit2 )
text:query ( skos:altLabel "\"Prajñāpāramitā\""@sa-x-iast 1000 "highlight:" )
}
UNION
{ ( ?TT ?sc ?lit1 )
text:query ( skos:prefLabel "\"Prajñāpāramitā\""@sa-x-iast 1000 "highlight:" )
}
?TT rdf:type ?type
OPTIONAL
{ ?TT skos:prefLabel ?l }
OPTIONAL
{ ?TT adm:canonicalHtml ?canonHTML }
OPTIONAL
{ ?TT (:workPartOf)* ?res .
?adm adm:adminAbout ?res ;
adm:status ?status
}
OPTIONAL
{ ?TT (owl:sameAs)+ ?same .
?TT rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity
}
OPTIONAL
{ ?TT (owl:sameAs)+ ?same .
?TT rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity .
?same adm:canonicalHtml ?sameCanonHTML
}
OPTIONAL
{ ?TT (owl:sameAs)+ ?same .
?TT rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity
{ ?same skos:prefLabel ?sameL }
UNION
{ ?same foaf:name ?sameL }
}
OPTIONAL
{ ?TT (owl:sameAs)+ ?same .
?TT rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity
{ ?same skos:prefLabel ?sameL }
UNION
{ ?same foaf:name ?sameL }
?same adm:canonicalHtml ?sameCanonHTML
}
OPTIONAL
{ ?sameBDRC (owl:sameAs)+ ?TT .
?sameBDRC (owl:sameAs)+ ?same .
?sameBDRC rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity
}
OPTIONAL
{ ?sameBDRC (owl:sameAs)+ ?TT .
?sameBDRC (owl:sameAs)+ ?same .
?sameBDRC rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity .
?same adm:canonicalHtml ?sameCanonHTML
}
OPTIONAL
{ ?sameBDRC (owl:sameAs)+ ?TT .
?sameBDRC (owl:sameAs)+ ?same .
?sameBDRC rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity
{ ?sameBDRC skos:prefLabel ?sameBDRClabel }
UNION
{ ?sameBDRC foaf:name ?sameBDRClabel }
{ ?same skos:prefLabel ?sameL }
UNION
{ ?same foaf:name ?sameL }
}
OPTIONAL
{ ?sameBDRC (owl:sameAs)+ ?TT .
?sameBDRC (owl:sameAs)+ ?same .
?sameBDRC rdf:type ?sameT .
?sameT (rdfs:subClassOf)+ :Entity
{ ?sameBDRC skos:prefLabel ?sameBDRClabel }
UNION
{ ?sameBDRC foaf:name ?sameBDRClabel }
{ ?same skos:prefLabel ?sameL }
UNION
{ ?same foaf:name ?sameL }
?same adm:canonicalHtml ?sameCanonHTML
}
OPTIONAL
{ ?TT :workPartType ?wPT }
OPTIONAL
{ ?TT :workHasPart ?wHP }
OPTIONAL
{ ?TT :workExpressionOf ?wE .
?wE skos:prefLabel ?litE
}
OPTIONAL
{ ?TT :workHasExpression ?wH .
?wH skos:prefLabel ?litH
}
}
ORDER BY DESC(?sc)
The problem arises when Fuseki executes this query. The logs first show several warnings:
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 27-26 in stotra
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 27-26 in stotra
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 23-22 in saM
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 23-22 in saM
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 63-62 in
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 41-40 in e
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 41-40 in e
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 43-42 in e
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 43-42 in e
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 93-92 in ri
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 93-92 in ri
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 93-92 in ri
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 93-92 in ri
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 27-26 in stotra
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 27-26 in stotra
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 23-22 in saM
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 23-22 in saM
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 52-51 in nirdeSa
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 41-40 in sto
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 41-40 in sto
[2019-11-20 01:19:28] SkrtWordTokenizer WARN finalOffset incorrect: 24-20 in prajYA
and then the following exception concerning the "highlight" feature of the search:
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token UsUasUtra exceeds length of provided text sized 57
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:231)
at org.apache.jena.query.text.TextIndexLucene.highlightResults(TextIndexLucene.java:679)
at org.apache.jena.query.text.TextIndexLucene.query$(TextIndexLucene.java:807)
at org.apache.jena.query.text.TextIndexLucene.query(TextIndexLucene.java:513)
... 203 more
@xristy What do you think?
@MarcAgate The problem appears to be in the lucene-sa analyzer.
It would be best to create a simple one-line text:query
that exhibits the problem, rather than having to wade through the massive query.
I doubt that I can take the problem further, since I'm not versed in the analyzer internals.
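A reduced query along those lines might look like the sketch below. It keeps only one of the text:query patterns from the full query (the skos:prefLabel lookup with the "highlight:" option), on the assumption that this pattern alone is enough to trigger the highlighter exception; that has not been verified here. The search string, limit, and language tag are copied from the full query, and the PREFIX declarations are the standard Jena text and SKOS namespaces.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Single text:query pattern taken from the full query:
# ?s is the matched resource, ?sc the Lucene score, ?lit the highlighted literal.
SELECT ?s ?sc ?lit
WHERE {
  ( ?s ?sc ?lit )
      text:query ( skos:prefLabel "\"Prajñāpāramitā\""@sa-x-iast 1000 "highlight:" ) .
}
ORDER BY DESC(?sc)
```

If this still fails, running it again without the "highlight:" argument would help show whether the error is confined to highlighting or also affects plain matching.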
The main problem is the analyzer config. It has been changed in buda-base for a while (same with Chinese), but applying it would require a whole redeployment...
Thanks! It works now:
http://purl.bdrc.io/lib/rootSearchGraph?LG_NAME=sa-x-iast&I_LIM=500&L_NAME=%22Praj%C3%B1%C4%81p%C4%81ramit%C4%81%22&format=json