couchbase / couchbase-lite-ios

Lightweight, embedded, syncable NoSQL database engine for iOS and MacOS apps.
Apache License 2.0

Please add the ability to turn off the stemmers for FTS searches #1296

Closed brendand closed 3 years ago

brendand commented 8 years ago

If I have multiple documents, some containing apple and others containing application, and I search for apple, I also get back the documents that contain application.

I understand that it's probably the stemmer which is truncating the le bit from the end of apple before it does its search, so that's why application is also being returned.

This is problematic for me because I've just added a Find and Replace function to my app. The customer can search for a set of documents that match their search term and then have my app replace a value in the found set of documents with another value. If the found set of documents contains things they had not intended, then the results could be very bad indeed.

So I really need the ability to turn off all stemmers so that customers can do exact-match searches more easily. I still need prefix searches as well, though: if I search for app, I should get both the apple and application documents back.


pual commented 7 years ago

I have similar problems with full-text search for some words. Since disabling this feature isn't supported through the public API, how can I remove the stemmer from the source? I removed the stemmer from the SQL FTS statement in CBL_SQLiteViewStorage, changing

    NSString* sql = $sprintf(@"\
        CREATE VIRTUAL TABLE IF NOT EXISTS fulltext \
            USING fts4(content, tokenize=unicodesn %@);\
        CREATE INDEX IF NOT EXISTS  'maps_#_by_fulltext' ON 'maps_#'(fulltext_id); \
        CREATE TRIGGER IF NOT EXISTS 'del_maps_#_fulltext' \
            DELETE ON 'maps_#' WHEN old.fulltext_id not null BEGIN \
                DELETE FROM fulltext WHERE rowid=old.fulltext_id| END", stemmer);

to

    NSString* sql = @"\
        CREATE VIRTUAL TABLE IF NOT EXISTS fulltext \
            USING fts4(content);\
        CREATE INDEX IF NOT EXISTS  'maps_#_by_fulltext' ON 'maps_#'(fulltext_id); \
        CREATE TRIGGER IF NOT EXISTS 'del_maps_#_fulltext' \
            DELETE ON 'maps_#' WHEN old.fulltext_id not null BEGIN \
                DELETE FROM fulltext WHERE rowid=old.fulltext_id| END";

and removed the tokenizer initialization call from CBL_SQLiteStorage:

    register_unicodesn_tokenizer(dbHandle);

When I run my unit tests again for the special words, I get Couchbase warnings like

    SQLite error 1 pruning generations < 33 of doc 12374 {at -[CBL_SQLiteStorage pruneDocument:numericID:generationsBelow:]:2249}
    SQLite error 1 pruning generations < 69 of doc 2 {at -[CBL_SQLiteStorage pruneDocument:numericID:generationsBelow:]:2249}
    SQLite error 1 pruning generations < 70 of doc 2 {at -[CBL_SQLiteStorage pruneDocument:numericID:generationsBelow:]:2249}
    CouchbaseLite: Failed to rebuild views (TestView_FTS-de): 590 {at -[CBL_SQLiteViewStorage updateIndexes:]:541}
    Failed to update view index: 590 {at -[CBLDatabase(Views) queryViewNamed:options:ifChangedSince:status:]:490}

and the queries are returning nil.

snej commented 7 years ago

@pual You just need to remove the lines

    if (stemmerName)
        stemmer = $sprintf(@"\"stemmer=%@\"", stemmerName);

Leave the tokenizer. The tokenizer is responsible for breaking text into words, which is necessary for full-text search.
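
To illustrate the split between tokenizer and stemmer outside Couchbase Lite, here is a minimal Python sketch against plain SQLite FTS4. It is not Couchbase Lite code. It assumes the SQLite bundled with Python was built with FTS4, and it uses SQLite's built-in `simple` and `porter` tokenizers as stand-ins for `unicodesn` without and with a `stemmer=` argument. The `matches` helper, the table name, and the sample words are made up for illustration.

```python
import sqlite3

def matches(tokenizer, term, rows):
    """Index rows in a throwaway FTS4 table and return the rows matching term."""
    db = sqlite3.connect(":memory:")
    # The tokenizer name is interpolated because DDL cannot take bound parameters.
    db.execute(f"CREATE VIRTUAL TABLE docs USING fts4(content, tokenize={tokenizer})")
    db.executemany("INSERT INTO docs(content) VALUES (?)", [(r,) for r in rows])
    found = db.execute(
        "SELECT content FROM docs WHERE docs MATCH ? ORDER BY rowid", (term,)
    ).fetchall()
    db.close()
    return [row[0] for row in found]

rows = ["connect", "connections"]
plain = matches("simple", "connect", rows)    # tokenizer only: words compared as written
stemmed = matches("porter", "connect", rows)  # tokenizer plus Porter stemming
print("without stemmer:", plain)
print("with stemmer:   ", stemmed)
```

In both cases the text is still split into words, so search keeps working. Only the stemming variant also returns the row whose word reduces to the same stem as the query.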

snej commented 7 years ago

@brendand Append a * to a word to do a prefix search.
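
As a rough sketch of how a trailing * behaves, the Python snippet below runs an exact and a prefix query against a plain SQLite FTS4 table with no stemmer. It assumes the SQLite bundled with Python has FTS4 enabled, and it uses the built-in `simple` tokenizer rather than Couchbase Lite's `unicodesn`. The `search` helper and the sample rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# No stemmer here: the "simple" tokenizer only splits and lower-cases words.
db.execute("CREATE VIRTUAL TABLE docs USING fts4(content, tokenize=simple)")
db.executemany(
    "INSERT INTO docs(content) VALUES (?)",
    [("apple",), ("application",), ("banana",)],
)

def search(term):
    """Return the content of every row matching the FTS query, in insert order."""
    cursor = db.execute(
        "SELECT content FROM docs WHERE docs MATCH ? ORDER BY rowid", (term,)
    )
    return [row[0] for row in cursor]

exact = search("apple")   # whole-word match only
prefix = search("app*")   # trailing * matches any word starting with "app"
print("exact: ", exact)
print("prefix:", prefix)
```

With stemming out of the picture, the exact query returns only the apple row, while the prefix query returns both apple and application.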

brendand commented 7 years ago

@snej Yes, I already am doing that for prefixed searches. Wish it worked for ForestDB too :)

The stemmers really mess things up, though. Are they there because most people would always want stemmed searches?

jayahariv commented 3 years ago

Closing 1.x issue!