Closed filak closed 9 months ago
Could you provide a complete reproducible example please? We're missing the sample input data that you've used to build the text index that exhibits this behaviour
Yes, sure.
I have been trying to set up two property lists: a default query using mt:defQuery, which does not search the note field (or several other fields), and a second query using mt:includeNotes, which includes the note field and possibly other fields.
I have simplified my use case.
Test data - drinks.nt
<http://id.example.test/1> <http://www.w3.org/2000/01/rdf-schema#label> "beer"@en .
<http://id.example.test/1> <http://id.example.test/vocab/#altLabel> "pint"@en .
<http://id.example.test/2> <http://id.example.test/vocab/#alt_label> "ale"@en .
<http://id.example.test/1> <http://id.example.test/mx/#alt_label> "pivečko"@cs .
<http://id.example.test/1> <http://id.example.test/vocab/#note> "Booze is a pleasure"@en .
<http://id.example.test/1> <http://id.example.test/vocab/#note> "Chlast je slast"@cs .
<http://id.example.test/2> <http://www.w3.org/2000/01/rdf-schema#label> "wine"@en .
<http://id.example.test/2> <http://id.example.test/vocab/#altLabel> "champagne"@en .
<http://id.example.test/2> <http://id.example.test/vocab/#alt_label> "burgundy"@en .
<http://id.example.test/2> <http://id.example.test/mx/#alt_label> "víno"@cs .
<http://id.example.test/2> <http://id.example.test/vocab/#note> "Red or white"@en .
<http://id.example.test/2> <http://id.example.test/vocab/#note> "Červené či bílé"@cs .
The config - drinks.ttl
@prefix : <http://localhost/jena_example/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix mt: <http://id.example.test/vocab/#> .
@prefix mx: <http://id.example.test/mx/#> .
## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
# Elasticsearch index
text:TextIndexES rdfs:subClassOf text:TextIndex .
## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.
:text_dataset
a text:TextDataset ;
text:dataset <#dataset> ;
text:index <#indexLucene> ;
.
# A TDB dataset used for RDF storage
<#dataset>
a tdb2:DatasetTDB2 ;
tdb2:location "d:/Data/jena/databases/drinks" ;
.
# Text index description
<#indexLucene>
a text:TextIndexLucene ;
text:directory "d:/Data/jena/indexes/drinks" ;
text:entityMap <#entMap> ;
text:storeValues true ;
text:analyzer [
a text:ConfigurableAnalyzer ;
text:tokenizer text:StandardTokenizer ;
text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
] ;
text:queryParser text:AnalyzingQueryParser ;
text:multilingualSupport true ;
text:propLists (
[ text:propListProp mt:defQuery ;
text:props (
rdfs:label
mt:altLabel
mt:alt_label
) ;
]
[ text:propListProp mt:includeNotes ;
text:props (
rdfs:label
mt:altLabel
mt:alt_label
mt:note
) ;
]
[ text:propListProp mt:testQuery ;
text:props (
rdfs:label
mx:alt_label
) ;
]
) ;
.
<#entMap>
a text:EntityMap ;
text:defaultField "ftext" ;
text:entityField "uri" ;
text:uidField "uid" ;
text:langField "lang" ;
text:graphField "graph" ;
text:map (
[ text:field "ftext" ; text:predicate rdfs:label ]
[ text:field "ftext" ; text:predicate mt:altLabel ]
[ text:field "ftext" ; text:predicate mt:alt_label ]
[ text:field "ftext" ; text:predicate mt:note ]
) .
<#service_text_tdb>
a fuseki:Service ;
rdfs:label "Drinks TEST" ;
fuseki:name "drinks" ;
fuseki:serviceQuery "query" ;
fuseki:serviceQuery "sparql" ;
fuseki:serviceUpdate "update" ;
fuseki:serviceUpload "upload" ;
fuseki:serviceReadGraphStore "get" ;
fuseki:serviceReadWriteGraphStore "data" ;
fuseki:dataset :text_dataset ;
.
Load to Jena
tdb2_tdbloader --loc %FUSEKI_BASE%/databases/drinks _imports/drinks.nt
Index
java -cp %FUSEKI_HOME%/fuseki-server.jar jena.textindexer --desc=configuration/drinks.ttl
The queries at http://localhost:3030/#/dataset/drinks/query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mt: <http://id.example.test/vocab/#>
# Query #1
select * where {
?s text:query ("beer white")
}
=> 2 hits - OK
# Query #2
select * where {
?s text:query (mt:includeNotes "white")
}
=> 1 hit - OK
# Query #3
select * where {
?s text:query (mt:defQuery "white")
}
=> 1 hit, but shouldn't it be 0, since the string "white" is only present in the note field?
Observation 1:
NOT TRUE (see https://github.com/apache/jena/issues/2094#issuecomment-1831912164) - When a field/predicate name in the text:props definition contains an underscore (_), i.e. alt_label
[ text:propListProp mt:testQuery ;
text:props (
rdfs:label
mt:altLabel
mx:alt_label
) ;
]
Any updates on this, @rvesse? I have tried to locate in the code what might be happening with the underscored fields, but so far I have failed.
The underscore is not a reserved char in Lucene.
Sorry @filak I have no idea personally, not an area of the code base I'm familiar with.
I'd hoped by you providing more details some of our Jena Text/Lucene experts like @OyvindLGjesdal might be able to take a look and comment on what's going on?
Hi @filak and thanks for the precise examples, and thanks for the ping.
I have some problems with replicating the issues described.
One thing I notice in the test data is that the mx namespace isn't mentioned. What is the prefix mx: in mx:alt_label, is it just a typo in the example?
I copied one of the existing tests using propLists to recreate the errors, and get the 3 expected results back when using the test-data, and no items back when I tried to replicate the other example.
I first got the warning message
23:03:36 WARN TextQueryPF :: Predicate not indexed: http://id.example.test/vocab/#alt_label
23:03:36 WARN TextQueryPF :: objectToStruct: props are not indexed [http://www.w3.org/2004/02/skos/core#prefLabel, http://www.w3.org/2004/02/skos/core#altLabel, http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#alt_label]
while running the test. I had to add the predicate to the text map and rerun the test, which then ran without the warning, to get the expected result back.
" text:map (",
" [ text:field \"label\" ; text:predicate rdfs:label ; text:noIndex true ]",
" [ text:field \"altLabel\" ; text:predicate skos:altLabel ]",
+ " [ text:field \"alt_Label\" ; text:predicate mt:alt_label ]",
" [ text:field \"prefLabel\" ; text:predicate skos:prefLabel ]",
" [ text:field \"comment\" ; text:predicate rdfs:comment ]",
" [ text:field \"workAuthorshipStatement\" ; text:predicate spec:workAuthorshipStatement ]",
" [ text:field \"workEditionStatement\" ; text:predicate spec:workEditionStatement ]",
" [ text:field \"workColophon\" ; text:predicate spec:workColophon ]",
" ) ."
Was the "props are not indexed" warning above silent when you ran it?
I'm not sure what happens with the second step, but one thing I thought of from the example above is that there may have been leftover documents in the Lucene folder, if it wasn't deleted during debugging.
I think that deleting Lucene documents isn't part of running the java command for reindexing. My information might be outdated or wrong on this, but we still delete the Lucene folder before running indexing on an offline database during CI jobs.
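A minimal sketch of that clean-up step, assuming stale files in the index directory are the problem. The helper name `remove_index_dir` and the throwaway directory are made up for illustration; in the setup from this thread the real directory would be the one given in text:directory, and the textindexer command would be run afterwards.

```python
import shutil
import tempfile
from pathlib import Path

def remove_index_dir(index_dir):
    """Delete the Lucene index directory (hypothetical helper) if it exists."""
    path = Path(index_dir)
    if path.exists():
        shutil.rmtree(path)
    return path

# Demonstration on a throwaway directory standing in for the real index location.
demo_root = Path(tempfile.mkdtemp())
demo_index = demo_root / "indexes" / "drinks"
demo_index.mkdir(parents=True)
(demo_index / "stale_file").write_text("leftover")  # stand-in for old index files

remove_index_dir(demo_index)
print(demo_index.exists())
```

Whether the indexer itself clears old documents was left open above, so this only shows the defensive variant of wiping the folder first.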
See the two tests which pass at https://github.com/apache/jena/compare/main...OyvindLGjesdal:jena:debug-text-prop-not-working-in-some-cases
I didn't replicate your configuration in the test, so it could also be other stuff that breaks, but hope this helps.
Could it be that it resolves to the same Lucene text:field "fulltext" from the text map when it resolves the propListProp? The tests which I copied from used different field names, while your example adds them all to the same fulltext field. This sounds more likely, and is maybe a bug? I'll try tonight and see if replicating the example on the text:map results in the same behavior.
Thank you for looking into this @OyvindLGjesdal
I have updated the testing data and the config.
Hmmm, so going back to my Observation 1...
The underscore issue is just a red herring - I apologize for the mistake.
I forgot to include a field in the text:map() - mx:alt_label
[ text:propListProp mt:testQuery ;
text:props (
rdfs:label
mt:altLabel
mx:alt_label
) ;
<#entMap>
a text:EntityMap ;
text:defaultField "ftext" ;
text:entityField "uri" ;
text:uidField "uid" ;
text:langField "lang" ;
text:graphField "graph" ;
text:map (
[ text:field "ftext" ; text:predicate rdfs:label ]
[ text:field "ftext" ; text:predicate mt:altLabel ]
[ text:field "ftext" ; text:predicate mt:note ]
) .
So any query
select * where {
?s text:query (mt:testQuery "*")
}
always returns 0 hits.
Is this the correct behaviour? One field in the props (mx:alt_label) is missing from the mapping, but another (mt:altLabel) exists, so maybe the query should return some hits in this case?
Anyway, a prop field missing from the mapping seems to break things, so it should be avoided.
The other problem - the query
# Query #3
select * where {
?s text:query (mt:defQuery "white")
}
returns 1 hit.
I think this should return 0 hits - because the term white is contained in the mt:note field and this field is not included in text:props
[ text:propListProp mt:defQuery ;
text:props (
rdfs:label
mt:altLabel
) ;
]
I think this one is caused by the issue identified in https://github.com/apache/jena/issues/2094#issuecomment-1831510414: you map several properties to the same field in the underlying Lucene index. Since the index doesn't record which property a piece of text originated from, a query on any of the properties that share a Lucene field can return documents that matched on the text of any of the other properties mapped to that field.
Not sure whether this is a bug or not. It appears to be a side effect of the design choices in how the data is indexed into Lucene. This should maybe be flagged as a configuration and/or query-time warning.
To make the query behave as you expect, either your configuration has to change to separate the properties into different fields, or the jena-text code has to change how it currently indexes and queries data (which would be a breaking change AFAICT).
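To illustrate the shared-field effect described above, here is a toy model in Python. It is not jena-text or Lucene code: the functions `build_index` and `search`, the `ex:` identifiers, and the whitespace tokenization are all assumptions made for the sketch. It only shows why a property-restricted search can still hit note text when both predicates are mapped to the same field.

```python
from collections import defaultdict

def build_index(triples, field_for_predicate):
    """Index (subject, predicate, text) triples under the mapped field name."""
    index = defaultdict(list)  # field name -> list of (subject, token set)
    for subject, predicate, text in triples:
        field = field_for_predicate[predicate]
        index[field].append((subject, set(text.lower().split())))
    return index

def search(index, field_for_predicate, props, term):
    """Return subjects matching `term` in any field that the props map to."""
    fields = {field_for_predicate[p] for p in props}
    return sorted({s for f in fields for s, tokens in index[f] if term in tokens})

triples = [
    ("ex:1", "rdfs:label", "beer"),
    ("ex:2", "rdfs:label", "wine"),
    ("ex:2", "mt:note", "Red or white"),
]

# Both predicates share one field versus each predicate having its own field.
shared = {"rdfs:label": "ftext", "mt:note": "ftext"}
separate = {"rdfs:label": "ftext", "mt:note": "note"}

leaky = search(build_index(triples, shared), shared, ["rdfs:label"], "white")
strict = search(build_index(triples, separate), separate, ["rdfs:label"], "white")
print(leaky)   # ['ex:2']
print(strict)  # []
```

With the shared mapping the label-only search still finds ex:2 through its note text; with separate fields it finds nothing, which is the behaviour the original question expected.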
Maybe the docs need to be more specific about how to do the mapping...
I had started initially with the catch-all ftext field - ie
<#entMap>
...
text:map (
[ text:field "ftext" ; text:predicate ...
[ text:field "ftext" ; text:predicate ...
...
and queries like this
?s text:query ("whatever")
Then I realized I needed more control over the searching and I started trying to use propLists.
So should I map all the fields separately in the text:map and mix them in the propLists as needed?
What might make sense to me:
text:defaultField "labels" ;
...
text:map (
[ text:field "labels" ; text:predicate rdfs:label ]
[ text:field "labels" ; text:predicate mt:altLabel ]
[ text:field "labels" ; text:predicate mt:alt_label ]
[ text:field "labels" ; text:predicate mx:alt_label ]
[ text:field "notes" ; text:predicate mx:note ]
[ text:field "notes" ; text:predicate mt:note2 ]
[ text:field "notes" ; text:predicate mt:note1 ]
text:propLists (
[ text:propListProp mt:defQuery ;
text:props (
labels
) ;
]
[ text:propListProp mt:includeNotes ;
text:props (
labels
notes
) ;
]
But I have no clue if that is feasible at all or what prefix I should use in this case.
I have modified the config:
text:propLists (
[ text:propListProp mt:defQuery ;
text:props (
rdfs:label
mt:altLabel
mt:alt_label
mx:alt_label
) ;
]
[ text:propListProp mt:includeNotes ;
text:props (
rdfs:label
mt:altLabel
mt:alt_label
mx:alt_label
mt:note
) ;
]
) ;
.
<#entMap>
a text:EntityMap ;
text:defaultField "ftext" ;
text:entityField "uri" ;
text:uidField "uid" ;
text:langField "lang" ;
text:graphField "graph" ;
text:map (
[ text:field "ftext" ; text:predicate rdfs:label ]
[ text:field "ftext" ; text:predicate mt:altLabel ]
[ text:field "ftext" ; text:predicate mt:alt_label ]
[ text:field "ftext" ; text:predicate mx:alt_label ]
[ text:field "note" ; text:predicate mt:note ]
) .
Now the queries work as expected !
?s text:query (mt:defQuery "white") => 0 hits
And this also works:
?s text:query ("beer white") => 1 hit - <http://id.example.test/1>
?s text:query ("white") => 0 hits
?s text:query (mt:includeNotes "white beer") => 2 hits (ID 1 + 2)
However these are weird:
?s text:query (mt:includeNotes "red booze") => 1 hit - <http://id.example.test/2> ??
?s text:query (mt:includeNotes "booze red") => 1 hit - <http://id.example.test/1> ??
Yeah, those still look off. Something odd seems to be happening, since the order of terms in your query affects the results returned. Glancing at your sample data, that query really should match both AFAICT.
Could you try increasing your log level to TRACE? Looking at the jena-text code, it should give a lot of detail about the Lucene query being built at that level.
OK, I started Fuseki jar with --debug option and here is the log after running the query:
15:29:37 INFO Fuseki :: [5] Query =
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX text: <http://jena.apache.org/text#>
PREFIX mt: <http://id.example.test/vocab/#>
select * where {
?s text:query (mt:includeNotes "red booze")
}
15:29:37 TRACE TextQueryPF :: exec: ?s text:query (<http://id.example.test/vocab/#includeNotes> "red booze")
15:29:37 TRACE TextQueryPF :: objectToStruct: x.isURI(), prop: http://id.example.test/vocab/#includeNotes at idx: 0
15:29:37 TRACE TextQueryPF :: objectToStruct: PROPERTY at 0 IS http://id.example.test/vocab/#includeNotes WITH pList: [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note]
15:29:37 TRACE TextQueryPF :: prepareQuery with subject: ?s; params: ( properties: [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note]; query: red booze; limit: -1; lang: null; highlight: null )
15:29:37 DEBUG TextQueryPF :: Text query: red booze <urn:x-arq:DefaultGraphNode> (-1)
15:29:37 TRACE TextQueryPF :: Caching Text query: red booze with key: >>?s -1 [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note] red booze null urn:x-arq:DefaultGraphNode<< in cache: org.apache.jena.atlas.lib.cache.CacheCaffeine@2a457ab1
15:29:37 TRACE TextIndexLucene :: query$ PROCESSING LIST of properties: [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note]; Lucene queryString: ; textFields: [ftext, ftext, ftext, ftext, note]
15:29:37 TRACE TextIndexLucene :: query$ PROCESSED LIST of properties: [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note] with resulting qString: ftext:red booze ftext:red booze ftext:red booze ftext:red booze note:red booze
15:29:37 WARN TextIndexLucene :: Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser
15:29:37 DEBUG TextIndexLucene :: query$ with LIST: [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note]; INPUT qString: (ftext:red booze ftext:red booze ftext:red booze ftext:red booze note:red booze ) AND graph:urn\:x\-arq\:DefaultGraphNode; with queryParserType: AnalyzingQueryParser; parseQuery with PerFieldAnalyzerWrapper({lang=org.apache.lucene.analysis.core.KeywordAnalyzer@5e0c4f21, uri=org.apache.lucene.analysis.core.KeywordAnalyzer@2c18a3ea, graph=org.apache.lucene.analysis.core.KeywordAnalyzer@166c2c17}, default=MultilingualAnalyzer(default=org.apache.jena.query.text.analyzer.ConfigurableAnalyzer@1df5c7e3)) YIELDS: +(ftext:red ftext:booze ftext:red ftext:booze ftext:red ftext:booze ftext:red ftext:booze note:red ftext:booze) +graph:urn:x-arq:DefaultGraphNode; parsed query: +(ftext:red ftext:booze ftext:red ftext:booze ftext:red ftext:booze ftext:red ftext:booze note:red ftext:booze) +graph:urn:x-arq:DefaultGraphNode; limit: 10000
15:29:37 TRACE TextIndexLucene :: simpleResults[8]: fields: [ftext, ftext, ftext, ftext, note] doc: Document<stored,indexed,tokenized,indexOptions=DOCS<uri:http://id.example.test/2> stored,indexed,tokenized,indexOptions=DOCS<graph:urn:x-arq:DefaultGraphNode> stored,indexed,tokenized<note:Red or white> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:en> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:48bc17f6921b4efff3f082a027a3e2c11037e9262ab743ed174587619543f767>>
15:29:37 TRACE TextQueryPF :: resultsToQueryIterator CALLED with results: [TextHit{node=http://id.example.test/2 literal="Red or white"@en score=0.58286893 graph=urn:x-arq:DefaultGraphNode prop=http://id.example.test/vocab/#note}]
ftext:red booze ftext:red booze ftext:red booze ftext:red booze note:red booze
So that looks like a bug to me.
The generated Lucene query is not properly quoting the search string when applying it to each field. Per https://lucene.apache.org/core/9_8_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields this means the query is only searching for red in the ftext field and booze in the default field, which does however happen to be ftext, judging by the parsed query:
parsed query: +(ftext:red ftext:booze ftext:red ftext:booze ftext:red ftext:booze ftext:red ftext:booze note:red ftext:booze) +graph:urn:x-arq:DefaultGraphNode; limit: 10000
This means that only the first word in your query gets queried in the note field, which is why the order of the terms in the query affects the results.
@OyvindLGjesdal does that look like a valid analysis to you?
It also looks like we generate duplicate query clauses when multiple properties map to the same Lucene field which might be unnecessary?
Although I'm not sure the fix is to just quote the search string, because it could itself already be a complex query, e.g. "red wine" OR "white beer", which wouldn't work if we blindly surround it with "
Maybe Field Grouping is the solution i.e.
ftext:(Red booze) note:(Red booze)
??
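A rough Python sketch of the difference, under stated assumptions: `naive_query`, `grouped_query` and `toy_parse` are hypothetical names, and `toy_parse` is only a stand-in for the field-binding rule described above (a field prefix applies to the term directly after it, everything else goes to the default field). It is not the Lucene parser and not the jena-text implementation.

```python
def naive_query(fields, qs):
    """Prefix the raw query string with each field, as in the logged qString."""
    return " ".join(f"{field}:{qs}" for field in fields)

def grouped_query(fields, qs):
    """Wrap the query string in parentheses per field, skipping repeated fields."""
    unique_fields = list(dict.fromkeys(fields))  # keeps first-seen order
    return " ".join(f"{field}:({qs})" for field in unique_fields)

def toy_parse(query, default_field):
    """Bind each whitespace-separated term to its own prefix or the default field.

    Handles flat terms only: no parentheses, quotes, or operators.
    """
    clauses = []
    for term in query.split():
        _, sep, _ = term.partition(":")
        clauses.append(term if sep else f"{default_field}:{term}")
    return clauses

fields = ["ftext", "ftext", "ftext", "ftext", "note"]
naive = naive_query(fields, "red booze")
parsed = toy_parse(naive, default_field="ftext")
grouped = grouped_query(fields, "red booze")

print(naive)
print(parsed)
print(grouped)
```

The last two entries of `parsed` are note:red and ftext:booze, mirroring the parsed query in the log, while `grouped` yields ftext:(red booze) note:(red booze) with the duplicate ftext clauses collapsed.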
@rvesse This looks like a valid analysis to me, though this part of the code is also unknown to me and my view is based on reading the links you posted. It does line up perfectly with the bug, and the solution looks good. I guess it would also handle inner logic in inner parentheses.
Thanks for the verbose output and experimenting @filak
I can try to create a test for this during the weekend.
This is probably only a bug in the propList, and not with the use of normal properties? I guess it would have been reported, noticed, and caught by tests if it also affected rdfs:label red booze.
On a second note, we should probably update the examples in the docs to remove those that set a custom queryParser that is no longer present in Apache Lucene, i.e. text:queryParser text:AnalyzingQueryParser ;
15:29:37 WARN TextIndexLucene :: Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser
When searching the web, the AnalyzingQueryParser is only present in older versions of the javadocs.
Not sure, like yourself I'm not too familiar with these parts of the codebase, probably also worth concocting a test case to validate
I think I can confirm your analysis @rvesse
If I change the default field to text:defaultField "comment", the expression in the verbose output falls back to comment (the default field) for booze, and only the first word, red, is paired with its text field.
+(ftext:red comment:booze ftext:red comment:booze ftext:red comment:booze ftext:red comment:booze note:red comment:booze)
@filak a workaround could be to put () around the text query red booze. I seem to get the expected result from the query using that.
"SELECT ?s",
"WHERE {",
" ?s text:query ( mt:includeNotes \"(red booze)\" ) . ",
"}"
This is the output
22:14:28 DEBUG TextIndexLucene :: query$ with LIST: [http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#altLabel, http://id.example.test/vocab/#alt_label, http://id.example.test/mx/#alt_label, http://id.example.test/vocab/#note]; INPUT qString: (ftext:(red booze) ftext:(red booze) ftext:(red booze) ftext:(red booze) note:(red booze) ) AND graph:urn\:x\-arq\:DefaultGraphNode; with queryParserType: AnalyzingQueryParser; parseQuery with PerFieldAnalyzerWrapper({lang=org.apache.lucene.analysis.core.KeywordAnalyzer@59532566, uri=org.apache.lucene.analysis.core.KeywordAnalyzer@dca2615, graph=org.apache.lucene.analysis.core.KeywordAnalyzer@421a4ee1}, default=MultilingualAnalyzer(default=org.apache.jena.query.text.analyzer.ConfigurableAnalyzer@4f63e3c7)) YIELDS: +((ftext:red ftext:booze) (ftext:red ftext:booze) (ftext:red ftext:booze) (ftext:red ftext:booze) (note:red note:booze)) +graph:urn:x-arq:DefaultGraphNode; parsed query: +((ftext:red ftext:booze) (ftext:red ftext:booze) (ftext:red ftext:booze) (ftext:red ftext:booze) (note:red note:booze)) +graph:urn:x-arq:DefaultGraphNode; limit: 10000
22:14:28 TRACE TextIndexLucene :: simpleResults[10]: fields: [ftext, ftext, ftext, ftext, note] doc: Document<stored,indexed,tokenized,indexOptions=DOCS<uri:http://id.example.test/2> stored,indexed,tokenized,indexOptions=DOCS<graph:urn:x-arq:DefaultGraphNode> stored,indexed,tokenized<note:Red or white> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:en> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:48bc17f6921b4efff3f082a027a3e2c11037e9262ab743ed174587619543f767>>
22:14:28 TRACE TextIndexLucene :: simpleResults[3]: fields: [ftext, ftext, ftext, ftext, note] doc: Document<stored,indexed,tokenized,indexOptions=DOCS<uri:http://id.example.test/1> stored,indexed,tokenized,indexOptions=DOCS<graph:urn:x-arq:DefaultGraphNode> stored,indexed,tokenized<note:Booze is a pleasure> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<lang:en> stored,indexed,tokenized,omitNorms,indexOptions=DOCS<uid:8df507c91a27f4bb554f97c7b5c6b980c48012ab2e23132a879189bbce05fc18>>
22:14:28 TRACE TextQueryPF :: resultsToQueryIterator CALLED with results: [TextHit{node=http://id.example.test/2 literal="Red or white"@en score=0.58286893 graph=urn:x-arq:DefaultGraphNode prop=http://id.example.test/vocab/#note}, TextHit{node=http://id.example.test/1 literal="Booze is a pleasure"@en score=0.51788014 graph=urn:x-arq:DefaultGraphNode prop=http://id.example.test/vocab/#note}]
However, this looks like a bug. The intent seems clear, also from the examples in the docs, that the propList is applied to the entire quoted expression and not just the first word.
Same bug happens when using a single property:
"SELECT ?s",
"WHERE {",
" ?s text:query ( mt:note \"booze red\" ) . ",
"}"
INPUT qString: (note:booze red ) AND graph:urn\:x\-arq\:DefaultGraphNode; with queryParserType: AnalyzingQueryParser; parseQuery with PerFieldAnalyzerWrapper({lang=org.apache.lucene.analysis.core.KeywordAnalyzer@59532566, uri=org.apache.lucene.analysis.core.KeywordAnalyzer@dca2615, graph=org.apache.lucene.analysis.core.KeywordAnalyzer@421a4ee1}, default=MultilingualAnalyzer(default=org.apache.jena.query.text.analyzer.ConfigurableAnalyzer@4f63e3c7)) YIELDS: +(note:booze comment:red) +graph:urn:x-arq:DefaultGraphNode; parsed query: +(note:booze comment:red) +graph:urn:x-arq:DefaultGraphNode; limit: 10000
and just one hit:
[TextHit{node=http://id.example.test/1 literal="Booze is a pleasure"@en score=0.51788014 graph=urn:x-arq:DefaultGraphNode prop=http://id.example.test/vocab/#note}]
The bug also remains if the assembler config is minimized to use only the default text configuration.
Any updates on this @OyvindLGjesdal ?
I think the pull request is complete from my side and is in the process of being reviewed.
Version
4.9.0
Question
This query works - searching all the fields:
However, this query, which should search only in the rdfs:label and mt:altLabel fields, returns 0 hits:
This query also returns 0 hits:
mytest.ttl excerpt: