buda-base / lds-pdi

http://purl.bdrc.io BDRC Linked Data Server
Apache License 2.0

fuseki error in text search #137

Closed eroux closed 5 years ago

eroux commented 5 years ago

When running

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bdr: <http://purl.bdrc.io/resource/>
PREFIX : <http://purl.bdrc.io/ontology/core/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX adm: <http://purl.bdrc.io/ontology/admin/>
PREFIX bdo: <http://purl.bdrc.io/ontology/core/>
prefix bdg:   <http://purl.bdrc.io/graph/>

PREFIX tmp: <http://purl.bdrc.io/ontology/tmp/>
PREFIX text: <http://jena.apache.org/text#>

construct{

  ?chunk a :EtextChunk .
  ?chunk :eTextHasChunk ?lit .
  ?chunk bdo:seqNum ?seqNum . 
  ?chunk bdo:sliceStartChar ?startChar .
  ?chunk bdo:sliceEndChar ?endChar .
  ?chunk tmp:forEtext ?etext .
  ?chunk tmp:forWork ?s .
  ?chunk tmp:workLabel ?workLabel .
  ?chunk bdo:creatorMainAuthor ?author.
  ?chunk tmp:authorName ?author_name .
  ?chunk tmp:etextAbout ?about .
  ?chunk tmp:etextGenre ?genre .
  ?chunk bdo:eTextTitle ?etextTitle .
  ?chunk bdo:eTextVolumeIndex ?volIndex .
  ?chunk bdo:eTextIsVolume ?isVolume .
}
where
{
  (?chunk ?score ?lit) text:query ( :chunkContents "de bzhin gshegs"@bo-x-ewts 50 "highlight:") .
  ?etext :eTextHasChunk ?chunk .
  ?chunk bdo:seqNum ?seqNum .
  ?chunk bdo:sliceStartChar ?startChar .
  ?chunk bdo:sliceEndChar ?endChar .
  ?s :workHasItemEtext/bdo:itemHasVolume/bdo:volumeHasEtext/bdo:eTextResource  ?etext.
  ?etext bdo:eTextTitle ?etextTitle .
  optional{?etext bdo:eTextIsVolume ?isVolume .}
  optional{?etext bdo:eTextVolumeIndex ?volIndex .}
  optional{?s bdo:workIsAbout ?about .
    ?about a :Topic}.
  optional {?s bdo:workGenre ?workgenre .
  ?workgenre a :Topic} 
  optional{?s skos:prefLabel ?workLabel .}
  Optional{ ?s bdo:creatorMainAuthor ?author .
  ?author skos:prefLabel ?author_name }.

}
order by desc(?score)

in the rfc011rw dataset, I'm getting an error message

Message: check(?lit, null): null node value

with no other helpful context. @xristy, can you take a look?

xristy commented 5 years ago

There's an error in the text:query. The query string needs to be wrapped in escaped double quotes so that it is treated as a single phrase rather than as a three-way OR of the individual terms:

(?chunk ?score ?lit) text:query ( :chunkContents "\"de bzhin gshegs\""@bo-x-ewts 50 "highlight:") .

Working out why it blows up otherwise will take more effort, but you should be able to move forward with the above change to the query.
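
If the search string is assembled by client code rather than typed by hand, the two layers of quoting (Lucene phrase quotes inside a SPARQL string literal) are easy to get wrong. Here is a minimal sketch of how that could be done. The helper names lucene_phrase and sparql_literal are hypothetical and are not part of lds-pdi, Jena, or Lucene. The sketch only handles backslashes and double quotes, not newlines or other characters that would need escaping.

```python
def lucene_phrase(text: str) -> str:
    """Wrap text in double quotes so Lucene treats it as one phrase.

    Hypothetical helper: escapes only backslashes and embedded double quotes.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sparql_literal(value: str, lang: str = "") -> str:
    """Render value as a double-quoted SPARQL string literal.

    Hypothetical helper: escapes backslashes and double quotes, then
    appends an optional language tag such as bo-x-ewts.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal


# Build the search-string argument used in the corrected text:query above.
query_arg = sparql_literal(lucene_phrase("de bzhin gshegs"), "bo-x-ewts")
print(query_arg)
```

The printed value can be dropped into the text:query argument list in place of the unquoted literal.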

xristy commented 5 years ago

Here's some trace data for this problem. A successful query with a quoted search string:

LDSPDI_ISSUE-137-data-part-01.txt

and here's the failure data:

LDSPDI_ISSUE-137-data-part-03.txt

The traces show that the QueryParser generates gibberish from the unquoted search string, and the result ends up violating the Lucene query syntax.

This bears more investigation, but it is enough for now, I think.