couchbase / couchbase-lite-ios

Lightweight, embedded, syncable NoSQL database engine for iOS and MacOS apps.
Apache License 2.0

Full-text search feature #131

Closed. snej closed this issue 10 years ago

snej commented 10 years ago

It would be useful to support full-text search in queries; it's been asked for many times.

SQLite has an FTS extension, and it's built-in on iOS (at least in iOS 7 where I just tested for it), so this should be fairly easy to implement.

The indexing API could be to pass a special dictionary key to the emit function, similarly to the way we're doing geo indexing:

emit(@{@"type": @"Text", @"text": doc[@"bigtext"]}, nil);

tleyden commented 10 years ago

Android analog bug: https://github.com/couchbase/couchbase-lite-android/issues/105

PaulCapestany commented 10 years ago

@snej dunno if you've come across Simon Wolf's blog post RE: full text search (and his associated Bitbucket repos) yet, but I'd bookmarked this a couple months ago to eventually dig into since it looked like the best/latest info on implementing FTS w/ sqlite that I could find at the time → http://swwritings.com/post/2013-04-30-searching-for-speedy-searching

snej commented 10 years ago

Thanks for the link, Paul. Among other things it confirms that FTS is available:

Fortunately recent builds of iOS and OS X include bundled SQLite libraries with everything you need enabled. I have an iPhone 3GS running iOS 5.1 and I can confirm that this version of iOS contains SQLite with FTS enabled.

I've pretty much gotten it implemented in CBL. Emitting the text to index is done just as I showed above, and you query by setting a new property fullTextQuery on CBLQuery. The query string has the syntax of the FTS query language as described on that SQLite web page.
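For readers unfamiliar with the FTS query language mentioned above, here is a minimal sketch using SQLite directly from Python rather than through CBL. It assumes the Python build's SQLite has FTS4 compiled in; the table name `docs`, the column `body`, and the helper `search` are made up for illustration and say nothing about how CBL stores its index.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
conn.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [
        ("Sunny apartment with a hot tub on the roof",),
        ("Quiet studio, hot water included, no tub",),
        ("Garage space for rent",),
    ],
)

def search(query):
    """Return the rowids of documents matching an FTS query string."""
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE body MATCH ? ORDER BY rowid", (query,)
    )
    return [rowid for (rowid,) in rows]

all_terms = search("hot tub")   # both words, anywhere in the text
phrase = search('"hot tub"')    # the exact phrase only
prefix = search("apart*")       # any word starting with "apart"
```

The same kind of query string is what would be assigned to fullTextQuery; the surrounding SQL is only there to make the sketch runnable on its own.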

I'll push it tomorrow … right now I need to go to bed!

PaulCapestany commented 10 years ago

Awesome :) :+1:

snej commented 10 years ago

I've checked experimental FTS support into a branch called fulltext, and added some documentation to the wiki.

You can give it a try if you like, but be aware it's likely to undergo some change. (The biggest remaining issue is that it works poorly with non-ASCII or non-English text.) Also note that it does alter the SQLite database schema, so it's possible that future changes to it might require deleting and recreating a database. Good thing databases are so easy to back up and restore, right? :)

snej commented 10 years ago

FYI, it now works better with non-English text, although it's still not able to handle Asian (CJK, Thai) writing due to the lack of spaces between words. But for European languages and Russian the new tokenizer should work well.
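To see why the lack of spaces matters, here is a small sketch of a naive word-boundary tokenizer. It is not the tokenizer used in CBL or SQLite; `naive_tokens` is a hypothetical helper that only illustrates the general problem.

```python
import re

def naive_tokens(text):
    """Split text into lowercase tokens at non-word characters."""
    return [token.lower() for token in re.findall(r"\w+", text)]

english = naive_tokens("Hot tub included")
russian = naive_tokens("Горячая ванна включена")
# Japanese text has no spaces, so the whole sentence comes back as one token.
japanese = naive_tokens("温水浴槽付きのアパート")
```

The English and Russian strings each split into three words, while the Japanese string stays in one piece, so a search for an individual word inside it would not match.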

jchris commented 10 years ago

I'm still thinking it makes sense to bundle the full-text or geo data in the emit() value, because it can enable queries like: find me the top scoring players within this bounding box, or find me apartments in my price range that mention "hot tub". Maybe these queries lean too heavily on the underlying SQL engine, but that is about the only con I can think of.

— Chris Anderson @jchris http://www.couchbase.com

snej commented 10 years ago

That's definitely a useful feature. The thing is, the text/shape is conceptually a key, not a value, so it doesn't make sense to put it in the value parameter. I'm thinking of allowing a compound key, where you could combine a regular key with a special one... something like CBLCompoundKey(a,b).
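As a rough picture of the combined query being discussed (apartments in a price range that mention "hot tub"), here is how it could look at the raw SQLite level. This is only a sketch under assumptions: it does not show CBLCompoundKey or any CBL API, the tables `listings` and `listing_text` are hypothetical, and it assumes an SQLite build with FTS4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary table for the regular key (price), FTS table for the text.
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, price INTEGER)")
conn.execute("CREATE VIRTUAL TABLE listing_text USING fts4(body)")

sample = [
    (1, 1200, "Two bedrooms, shared hot tub in the courtyard"),
    (2, 3500, "Penthouse with private hot tub"),
    (3, 1100, "Small studio near the station"),
]
for listing_id, price, body in sample:
    conn.execute(
        "INSERT INTO listings(id, price) VALUES (?, ?)", (listing_id, price)
    )
    # Reuse the listing id as the FTS rowid so the two tables can be joined.
    conn.execute(
        "INSERT INTO listing_text(rowid, body) VALUES (?, ?)", (listing_id, body)
    )

rows = conn.execute(
    """
    SELECT listings.id
    FROM listing_text
    JOIN listings ON listings.id = listing_text.rowid
    WHERE listing_text.body MATCH ?
      AND listings.price BETWEEN ? AND ?
    ORDER BY listings.id
    """,
    ('"hot tub"', 1000, 2000),
)
matches = [listing_id for (listing_id,) in rows]
```

Here the text match and the price range are both applied by the SQL engine in one query, which is the dependency on the underlying engine that the earlier comment worries about.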

snej commented 10 years ago

Closing this as it's available on the fulltext branch, although it's not merged into master yet.