TiZott closed this issue.
There is a default stopword list. Common words with little utility for search are automatically removed if they appear in the stopword list. This is common practice for search engines.
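Here is a minimal Python sketch of why "be" disappears from a query while "b" and "bee" do not. It makes several assumptions:

* The stopword set is modeled on Lucene's classic English list. That list contains "be", "this", "is" and "an", but not "b" or "bee". Blazegraph's actual default list may differ.
* `bds:matchAllTerms` is modeled as "every non-stopword query term must be indexed for the literal".
* `analyze` and `match_all_terms` are hypothetical names, not Blazegraph APIs.

```python
import re

# Assumption: stopwords modeled on Lucene's classic English list;
# Blazegraph's real default list may not be identical.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]


def match_all_terms(query, literal, stopwords=STOPWORDS):
    """Hypothetical model of bds:matchAllTerms: every query term that
    survives analysis must also be an indexed term of the literal."""
    query_terms = set(analyze(query, stopwords))
    indexed_terms = set(analyze(literal, stopwords))
    # A query made only of stopwords has no terms left, so it matches nothing.
    return bool(query_terms) and query_terms <= indexed_terms


title = "This is an example title"
print(analyze("be"))                                                   # []
print(match_all_terms("be", "be"))                                     # False
print(match_all_terms("be This be is an be example title be", title))  # True
print(match_all_terms("be This be is an b example title be", title))   # False
```

Under these assumptions, the padded query reduces to the terms `example` and `title`, so it still matches "This is an example title". By contrast, `b` (or `bee`) survives analysis and has no indexed counterpart, so the all-terms match fails.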
On Mon, May 6, 2019 at 09:37 TiZott <notifications@github.com> wrote:
On a fresh installation, default namespace:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT {
  <http://example/egbook> dc:title "This is an example title" .
  <http://example/test> dc:title "be" .
} WHERE {}
Then rebuild full text index and query:
prefix bds: <http://www.bigdata.com/rdf/search#>
select ?s ?p ?o where {
  ?o bds:search "be" .
  ?s ?p ?o .
}
No result.
prefix bds: <http://www.bigdata.com/rdf/search#>
select ?s ?p ?o where {
  ?o bds:search "be This be is an be example title be" .
  ?o bds:matchAllTerms "true" .
  ?s ?p ?o .
}
Result: <http://example/egbook> | dc:title | This is an example title
prefix bds: <http://www.bigdata.com/rdf/search#>
select ?s ?p ?o where {
  ?o bds:search "be This be is an b example title be" .
  ?o bds:matchAllTerms "true" .
  ?s ?p ?o .
}
No result. Only "be" is ignored; using "b" or "bee" in its place breaks the match.
(Original issue: https://github.com/blazegraph/database/issues/131)
How can I override or disable the stopword list?
@TiZott this is what I did: https://github.com/phenoscape/phenoscape-owl-tools/blob/10348c588e459b918e0b68dd64fbc5251d7393c4/blazegraph.properties#L42-L46
# Disable stopwords in text search; bad for term completion
com.bigdata.search.FullTextIndex.analyzerFactoryClass=com.bigdata.search.ConfigurableAnalyzerFactory
com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.analyzerClass=org.apache.lucene.analysis.standard.StandardAnalyzer
com.bigdata.search.ConfigurableAnalyzerFactory.analyzer.eng.stopwords=none
com.bigdata.search.ConfigurableAnalyzerFactory.analyzer._.like=eng
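Setting `stopwords=none` is intended to stop common words from being dropped. Here is a minimal Python sketch of the effect, reusing the literals from the issue. It makes two assumptions:

* The default stopword subset shown is illustrative, modeled on Lucene's English list.
* `analyze` and `search` are hypothetical helpers, not Blazegraph code.

```python
import re

# Assumption: an illustrative subset of a default English stopword list
# that contains "be" (modeled on Lucene's classic list).
DEFAULT_STOPWORDS = frozenset({"a", "an", "be", "is", "the", "this", "to"})
NO_STOPWORDS = frozenset()  # models the effect of "stopwords=none"


def analyze(text, stopwords):
    """Lowercase, tokenize on alphanumerics, and drop the given stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stopwords]


def search(query, literals, stopwords):
    """Hypothetical all-terms search over an in-memory list of literals."""
    terms = set(analyze(query, stopwords))
    if not terms:
        return []  # every query word was a stopword: nothing to search for
    return [lit for lit in literals if terms <= set(analyze(lit, stopwords))]


literals = ["This is an example title", "be"]
print(search("be", literals, DEFAULT_STOPWORDS))  # []
print(search("be", literals, NO_STOPWORDS))       # ['be']
```

The trade-off is that common words are now indexed too, presumably at the cost of a larger index. This sketch only models the analyzer. In Blazegraph, text indexed under the old settings would presumably need re-indexing before the change takes effect.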
@balhoff Thank you very much! Exactly what I was looking for!