blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Full text search ignores the literal "be" #131

Closed. TiZott closed this issue 5 years ago.

TiZott commented 5 years ago

On a fresh installation, default namespace:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT {
  <http://example/egbook> dc:title  "This is an example title" .
  <http://example/test> dc:title  "be" .
} WHERE {}

Then rebuild the full-text index and query:

prefix bds: <http://www.bigdata.com/rdf/search#>
select ?s ?p ?o
where {
?o bds:search "be" .
?s ?p ?o .
}

No result.

prefix bds: <http://www.bigdata.com/rdf/search#>
select ?s ?p ?o
where {
?o bds:search "be This be is an be example title be" .
?o bds:matchAllTerms "true" .
?s ?p ?o .
}

Result: <http://example/egbook> | dc:title | This is an example title

prefix bds: <http://www.bigdata.com/rdf/search#>
select ?s ?p ?o
where {
?o bds:search "be This be is an b example title be" .
?o bds:matchAllTerms "true" .
?s ?p ?o .
}

No result. The extra term is ignored only when it is "be", not when it is "b" or "bee".

thompsonbry commented 5 years ago

There is a default stopword list. Common words with little utility for search are automatically removed based on whether or not they appear in the stopword list. This is a common practice for search engines.
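To make the three results in the report concrete, here is a minimal Python sketch of stopword filtering. It is not Blazegraph's actual code: the `STOPWORDS` set is an assumption modeled on Lucene's classic English list, and `analyze` and `matches_all_terms` are hypothetical helpers that only approximate tokenization and `bds:matchAllTerms`.

```python
# Assumed stopword set, modeled on Lucene's classic English list.
# The list actually used by a given Blazegraph install may differ.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
})


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords (hypothetical helper)."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def matches_all_terms(query, literal, stopwords=STOPWORDS):
    """Rough stand-in for bds:matchAllTerms: every surviving query term
    must also survive analysis of the indexed literal."""
    query_terms = analyze(query, stopwords)
    if not query_terms:
        # Every query term was a stopword, so there is nothing to search for.
        return False
    indexed_terms = set(analyze(literal, stopwords))
    return all(term in indexed_terms for term in query_terms)


title = "This is an example title"

# "be" alone is removed entirely, so the literal "be" is never found.
print(matches_all_terms("be", "be"))
# The extra "be" terms are dropped, leaving "example" and "title".
print(matches_all_terms("be This be is an be example title be", title))
# "b" is not a stopword, and it does not occur in the title.
print(matches_all_terms("be This be is an b example title be", title))
```

Under these assumptions, passing an empty set as `stopwords` keeps "be" as a searchable term, which is the effect the reporter is after.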

TiZott commented 5 years ago

How can I override or disable the stopword list?

balhoff commented 5 years ago

@TiZott this is what I did: https://github.com/phenoscape/phenoscape-owl-tools/blob/10348c588e459b918e0b68dd64fbc5251d7393c4/blazegraph.properties#L42-L46

# Disable stopwords in text search; bad for term completion
com.bigdata.search.FullTextIndex.analyzerFactoryClass=com.bigdata.search.ConfigurableAnalyzerFactory
com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.analyzerClass=org.apache.lucene.analysis.standard.StandardAnalyzer
com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.stopwords=none
com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.like=eng

TiZott commented 5 years ago

@balhoff Thank you very much! Exactly what I was looking for!