Closed (brendand closed 3 years ago)
I have similar problems using the full text search for some words.
Since the public API doesn't support disabling this feature, how can I remove the stemmer from the source?
I removed the stemmer from the FTS SQL statement in CBL_SQLiteViewStorage, changing
NSString* sql = $sprintf(@"\
CREATE VIRTUAL TABLE IF NOT EXISTS fulltext \
USING fts4(content, tokenize=unicodesn %@);\
CREATE INDEX IF NOT EXISTS 'maps_#_by_fulltext' ON 'maps_#'(fulltext_id); \
CREATE TRIGGER IF NOT EXISTS 'del_maps_#_fulltext' \
DELETE ON 'maps_#' WHEN old.fulltext_id not null BEGIN \
DELETE FROM fulltext WHERE rowid=old.fulltext_id| END", stemmer);
to
NSString* sql = @"\
CREATE VIRTUAL TABLE IF NOT EXISTS fulltext \
USING fts4(content);\
CREATE INDEX IF NOT EXISTS 'maps_#_by_fulltext' ON 'maps_#'(fulltext_id); \
CREATE TRIGGER IF NOT EXISTS 'del_maps_#_fulltext' \
DELETE ON 'maps_#' WHEN old.fulltext_id not null BEGIN \
DELETE FROM fulltext WHERE rowid=old.fulltext_id| END";
and removed the tokenizer initialization from CBL_SQLiteStorage, i.e. the call
register_unicodesn_tokenizer(dbHandle);
When I run my unit tests for the special words again, I get Couchbase Lite warnings like
SQLite error 1 pruning generations < 33 of doc 12374 {at -[CBL_SQLiteStorage pruneDocument:numericID:generationsBelow:]:2249}
SQLite error 1 pruning generations < 69 of doc 2 {at -[CBL_SQLiteStorage pruneDocument:numericID:generationsBelow:]:2249}
SQLite error 1 pruning generations < 70 of doc 2 {at -[CBL_SQLiteStorage pruneDocument:numericID:generationsBelow:]:2249}
CouchbaseLite: Failed to rebuild views (TestView_FTS-de): 590 {at -[CBL_SQLiteViewStorage updateIndexes:]:541}
Failed to update view index: 590 {at -[CBLDatabase(Views) queryViewNamed:options:ifChangedSince:status:]:490}
and the queries return nil.
@pual You just need to remove the lines
if (stemmerName)
stemmer = $sprintf(@"\"stemmer=%@\"", stemmerName);
Leave the tokenizer. The tokenizer is responsible for breaking text into words, which is necessary for full-text search.
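To illustrate the point about keeping the tokenizer while dropping the stemmer, here is a minimal sketch using Python's built-in sqlite3 module rather than Couchbase Lite itself. It assumes the SQLite build has FTS4 enabled, and it uses SQLite's stock `simple` tokenizer as a stand-in for `unicodesn`, which is not available outside Couchbase Lite. The table contents and the `exact_matches` helper are hypothetical.

```python
import sqlite3

def exact_matches(term):
    """Return the rows whose content contains `term` as a whole word."""
    db = sqlite3.connect(":memory:")
    # Assumption: 'simple' stands in for 'unicodesn'. A tokenizer is still
    # named, but no stemmer argument follows it.
    db.execute(
        "CREATE VIRTUAL TABLE fulltext USING fts4(content, tokenize=simple)"
    )
    db.executemany(
        "INSERT INTO fulltext(content) VALUES (?)",
        [("an apple a day",), ("submit the application",)],
    )
    rows = db.execute(
        "SELECT content FROM fulltext WHERE content MATCH ? ORDER BY rowid",
        (term,),
    ).fetchall()
    db.close()
    return [row[0] for row in rows]

print(exact_matches("apple"))
```

With the text still split into words but not stemmed, a query only matches rows that contain the query word as a complete token.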
@brendand Append a * to a word to do a prefix search.
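As a rough sketch of how the trailing * behaves in SQLite FTS4 queries, the following uses Python's built-in sqlite3 module, not Couchbase Lite. It assumes FTS4 is compiled into the SQLite build and uses the stock `simple` tokenizer with no stemmer; the `docs` table and the `search` helper are made up for the example.

```python
import sqlite3

def search(query):
    """Run an FTS4 MATCH query against three sample documents."""
    db = sqlite3.connect(":memory:")
    # Assumption: stock 'simple' tokenizer, no stemming applied.
    db.execute("CREATE VIRTUAL TABLE docs USING fts4(content, tokenize=simple)")
    db.executemany(
        "INSERT INTO docs(content) VALUES (?)",
        [("apple",), ("application",), ("banana",)],
    )
    rows = db.execute(
        "SELECT content FROM docs WHERE content MATCH ? ORDER BY rowid",
        (query,),
    ).fetchall()
    db.close()
    return [row[0] for row in rows]

print(search("app*"))   # prefix query: any token starting with "app"
print(search("apple"))  # plain query: the whole token "apple" only
```

In this setup the prefix form and the exact form can be chosen per query, so both kinds of search can coexist on the same index.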
@snej Yes, I'm already doing that for prefix searches. Wish it worked for ForestDB too :)
The stemmers though really mess things up. Are they there because most people would always require stemmed searches?
Closing 1.x issue!
If I have multiple documents, some with "apple" and others with "application", and I search for "apple", I also get back the documents that have "application" in them.
I understand that it's probably the stemmer truncating the "le" bit from the end of "apple" before it does its search, which is why "application" is also being returned.
This is problematic for me because I've just added a Find and Replace function to my app. The customer can search for a set of documents that match their search term and then have my app replace a value in the found set of documents with another value. If the found set of documents contains things they had not intended, then the results could be very bad indeed.
So I really need the ability to turn off all stemmers so that customers can more easily do exact-match searches. I still need to be able to do prefix searches, though: if I search for "app", I should get both the "apple" and "application" documents back.