couchbase / couchbase-lite-ios

Lightweight, embedded, syncable NoSQL database engine for iOS and MacOS apps.
Apache License 2.0

Unknown Tokenizer: unicodesn #1281

Closed: brendand closed this issue 8 years ago.

brendand commented 8 years ago

I'm getting an "unknown tokenizer" error whenever I try to query my FTS index. This used to work, and I'm not sure what changed. I did update to the latest SQLCipher branch from couchbase-lite-sqlcipher, but reverting to an older build gives the same problem. I built my own libsqlcipher.a library and had the same trouble, so I switched to the pre-built libsqlcipher.a that comes with the repo; the problem persists. I re-compiled CouchbaseLite using the CI iOS scheme so I could get the CouchbaseLite and CouchbaseLiteListener frameworks in a single build.

Perhaps there's an issue in how the archives are built on the latest dev branch.

WARNING: Error initializing fts4 schema: SQLite[1, "unknown tokenizer: unicodesn"] {at -[CBL_SQLiteViewStorage createFullTextSchema]:175}

Error running its query: Error Domain=CBLHTTP Code=501 "unimplemented" UserInfo={NSLocalizedFailureReason=unimplemented, NSLocalizedDescription=unimplemented}

There's another detail in my setup that's important to know. I used to import SQLCipher using CocoaPods, but have since replaced it with the CBL-SQLCipher version. My Podfile, however, still contains the following hook, which runs after the pods are updated:

post_install do |installer_representation|
  installer_representation.pods_project.targets.each do |target|
    if target.name == 'FMDB'
      target.build_configurations.each do |config|
        config.build_settings['OTHER_CFLAGS'] ||= ['$(inherited)']
        config.build_settings['OTHER_CFLAGS'] << '-DSQLITE_HAS_CODEC -DSQLITE_ENABLE_RTREE=1 -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS3_PARENTHESIS -DSQLITE_ENABLE_FTS4_UNICODE61'
      end
    end
  end
end

I used to attach those settings to the SQLCipher pod, but now I attach them to the FMDB pod so they still get used when my project is built.

And now for the strange bit. If I remove the above code from my Podfile, FTS searching works, but encryption doesn't. I'm led to believe the app then ends up using the built-in SQLite engine instead of SQLCipher's, because the -DSQLITE_HAS_CODEC flag is missing.
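Rather than inferring the engine from build flags, you can ask a connection at run time whether it is talking to SQLCipher. The sketch below relies on SQLCipher's `PRAGMA cipher_version`, which returns one row on SQLCipher. Stock SQLite silently ignores pragmas it does not recognize and returns no rows. It uses Python's standard `sqlite3` module purely for illustration. That module links plain SQLite, so it should report no codec. `has_codec` is a hypothetical helper, not part of Couchbase Lite.

```python
import sqlite3


def has_codec(conn: sqlite3.Connection) -> bool:
    """Hypothetical helper: True if the linked engine answers PRAGMA cipher_version.

    SQLCipher returns one row holding its version string; stock SQLite silently
    ignores pragmas it does not recognize and returns no rows.
    """
    row = conn.execute("PRAGMA cipher_version").fetchone()
    return row is not None and bool(row[0])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Python's bundled sqlite3 normally links plain SQLite, so expect False here.
    print("SQLCipher codec present:", has_codec(conn))
    conn.close()
```

The same query, issued from inside the app, would show directly which library the CBL connection ended up with.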

I'm sure it's some misconfiguration I've managed to introduce into my Xcode project, but right now it seems I can have encryption or FTS searching, not both.


brendand commented 8 years ago

On a related note, everything works fine in OS X. Encryption/Decryption works. FTS works.

brendand commented 8 years ago

Just to make sure SQLCipher was compiled properly, I printed out the compile options using `PRAGMA compile_options;`.

This is the result:

cipher provider: commoncrypto
ENABLE_COLUMN_METADATA
ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5
ENABLE_JSON1
ENABLE_LOAD_EXTENSION
ENABLE_MEMORY_MANAGEMENT
ENABLE_RTREE
ENABLE_STAT4
ENABLE_UNLOCK_NOTIFY
HAS_CODEC
SOUNDEX
SYSTEM_MALLOC
TEMP_STORE=2
THREADSAFE=1

So maybe that's the problem: I don't see the ENABLE_FTS4_UNICODE61 option. Then again, that option is for the unicode61 tokenizer, not the Snowball-based unicodesn tokenizer, so maybe it isn't the problem after all.
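A check like the one above can be scripted so it compares a list of expected flags against what the linked library actually reports. In this minimal sketch, written against Python's `sqlite3` module for illustration, `compile_options` and `missing_options` are hypothetical helper names and the flag list is only an example. Custom tokenizers such as unicodesn are registered at run time rather than compiled in, so no compile option will name them.

```python
import sqlite3
from typing import List


def compile_options(conn: sqlite3.Connection) -> List[str]:
    """Return the options reported by PRAGMA compile_options.

    SQLite lists them without the "SQLITE_" prefix, e.g. "ENABLE_FTS4"
    for -DSQLITE_ENABLE_FTS4.
    """
    return [row[0] for row in conn.execute("PRAGMA compile_options")]


def missing_options(conn: sqlite3.Connection, wanted: List[str]) -> List[str]:
    """Return the entries of `wanted` that the linked library does not report.

    Valued options such as "THREADSAFE=1" are compared by name only, so
    asking for "THREADSAFE" matches "THREADSAFE=1".
    """
    present = {opt.split("=", 1)[0] for opt in compile_options(conn)}
    return [w for w in wanted if w.split("=", 1)[0] not in present]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    wanted = ["HAS_CODEC", "ENABLE_FTS3", "ENABLE_FTS4", "ENABLE_FTS4_UNICODE61"]
    print("missing:", missing_options(conn, wanted))
    conn.close()
```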

A bit stumped at the moment, but I'll recompile it myself again and see what I get.

brendand commented 8 years ago

Compiling it myself didn't make a difference and showed the same results. ENABLE_FTS4_UNICODE61 is definitely among the compile options in the Xcode project, so that's not the issue.

brendand commented 8 years ago

So, to take my own builds out of the equation, I tried your latest verified build, couchbase-lite-ios-community_1.3.0-17, but I still have the issue.

Here's the output when I tried to do a search:

2016-06-09 01:56:27.935 Tap Forms[9250:4865935] DB Error: 1 "unknown tokenizer: unicodesn"
2016-06-09 01:56:27.936 Tap Forms[9250:4865935] DB Query: SELECT docs.docid, 'maps_3'.sequence, 'maps_3'.fulltext_id, 'maps_3'.value, offsets(fulltext) FROM 'maps_3', fulltext, revs, docs WHERE fulltext.content MATCH ? AND 'maps_3'.fulltext_id = fulltext.rowid AND revs.sequence = 'maps_3'.sequence AND docs.doc_id = revs.doc_id ORDER BY - ftsrank(matchinfo(fulltext))  LIMIT ? OFFSET ?
2016-06-09 01:56:27.936 Tap Forms[9250:4865935] Error running its query: Error Domain=CBLHTTP Code=400 "bad_request" UserInfo={NSLocalizedFailureReason=bad_request, NSLocalizedDescription=bad_request}
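The logged query joins CBL's map table against an FTS4 virtual table named `fulltext`, filters it with `MATCH`, and calls `offsets()` to locate the hits. The sketch below reproduces only the FTS4 part of that shape with Python's `sqlite3` module. It uses the built-in `simple` tokenizer in place of `unicodesn` and omits CBL's own `ftsrank()` function along with the `maps_3`/`revs`/`docs` joins. The helper names are hypothetical, and the example assumes an SQLite build with FTS4 enabled.

```python
import sqlite3
from typing import List, Tuple


def fts4_available(conn: sqlite3.Connection) -> bool:
    """Return True if the linked SQLite can create FTS4 virtual tables."""
    try:
        conn.execute("CREATE VIRTUAL TABLE fts_probe USING fts4(x)")
    except sqlite3.OperationalError:
        return False
    conn.execute("DROP TABLE fts_probe")
    return True


def search(conn: sqlite3.Connection, term: str,
           limit: int = 10, offset: int = 0) -> List[Tuple[int, str]]:
    """MATCH query shaped like the logged one, without CBL's joins or ftsrank().

    offsets() returns groups of four integers per hit: column number,
    term number, byte offset and byte length within that column.
    """
    sql = ("SELECT fulltext.rowid, offsets(fulltext) FROM fulltext "
           "WHERE fulltext.content MATCH ? LIMIT ? OFFSET ?")
    return conn.execute(sql, (term, limit, offset)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    if fts4_available(conn):
        # The built-in "simple" tokenizer stands in for unicodesn here.
        conn.execute("CREATE VIRTUAL TABLE fulltext USING fts4(content, tokenize=simple)")
        conn.execute("INSERT INTO fulltext(content) VALUES ('hello world')")
        print(search(conn, "world"))  # [(1, '0 0 6 5')]
    conn.close()
```

Table creation names the tokenizer, so a missing tokenizer surfaces when the schema is set up (as in the earlier createFullTextSchema warning). The query then fails against a table that was never created.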

snej commented 8 years ago

The unicodesn tokenizer is a separate callback function that gets registered with SQLite at startup. So your error isn't related to how SQLite is built, but to what's happening at initialization time.
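You can reproduce this failure mode on any SQLite whose FTS4 support is intact. Ask it to create an FTS4 table with `tokenize=unicodesn` on a connection where nothing has registered that tokenizer. This Python sketch does exactly that. `try_create_fts4` is a hypothetical helper, and the one-column table only loosely mirrors CBL's schema. If FTS4 itself is compiled out, SQLite reports `no such module: fts4` instead.

```python
import sqlite3
from typing import Optional


def try_create_fts4(conn: sqlite3.Connection, tokenizer: str) -> Optional[str]:
    """Hypothetical helper: create an FTS4 table with `tokenizer`.

    Returns None on success, otherwise SQLite's error message. The tokenizer
    name is interpolated directly, so only pass trusted identifiers.
    """
    try:
        conn.execute(
            f"CREATE VIRTUAL TABLE fts_check USING fts4(content, tokenize={tokenizer})"
        )
    except sqlite3.OperationalError as exc:
        return str(exc)
    conn.execute("DROP TABLE fts_check")
    return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Nothing has registered unicodesn on this connection, so with FTS4
    # compiled in this should print "unknown tokenizer: unicodesn".
    print(try_create_fts4(conn, "unicodesn"))
    # A built-in tokenizer succeeds (None) on the same FTS4-enabled build.
    print(try_create_fts4(conn, "simple"))
    conn.close()
```

The same build produces both outcomes, which is consistent with snej's point: the tokenizer registry is populated at run time, per connection, not by compile flags.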

An earlier case where this error appeared was #983, but that was fixed a while ago and doesn't seem like the issue here.

Try setting a symbolic breakpoint at register_unicodesn_tokenizer() and see if it gets called.

brendand commented 8 years ago

So it is getting called, but it's returning here:

[Screenshot (2016-06-09, 10:59:38 AM): the line in register_unicodesn_tokenizer() where it returns]

That suggests the registration failed: rc is 1 at this line.

snej commented 8 years ago

1 is SQLITE_ERROR. Not very informative...
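A bare result code of 1 (SQLITE_ERROR) covers many unrelated failures. In C, the text that usually tells them apart comes from `sqlite3_errmsg(db)`. The Python sketch below makes the point with two unrelated statements that share the same primary code. It reads the `sqlite_errorcode` attribute that Python 3.11+ attaches to `sqlite3.Error` and falls back to `None` on older versions. `run_and_describe` is a hypothetical helper.

```python
import sqlite3
from typing import Optional, Tuple


def run_and_describe(conn: sqlite3.Connection, sql: str) -> Tuple[Optional[int], str]:
    """Execute `sql`; on failure return (primary result code, error message).

    The message is the text SQLite's C API would return from sqlite3_errmsg().
    sqlite_errorcode only exists on Python 3.11+, hence the getattr fallback.
    """
    try:
        conn.execute(sql)
    except sqlite3.Error as exc:
        code = getattr(exc, "sqlite_errorcode", None)
        # Extended result codes keep the primary code in their low 8 bits.
        primary = code & 0xFF if code is not None else None
        return primary, str(exc)
    return 0, "ok"  # 0 is SQLITE_OK


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for sql in ("SELECT * FROM no_such_table",
                "CREATE VIRTUAL TABLE t USING fts4(content, tokenize=unicodesn)"):
        print(run_and_describe(conn, sql))
    conn.close()
```

On Python 3.11+, both failures should report code 1, and only the messages differ. Logging `sqlite3_errmsg()` next to `rc` in the registration code would likely have narrowed this down faster.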

brendand commented 8 years ago

No, certainly not. Hmmm... I've tried all kinds of things: rebuilding SQLite, rebuilding Couchbase Lite. On the Mac, rc comes back as 100, so execution correctly reaches the final return sqlite3_finalize(pStmt); statement. I don't know what's up with my iOS project. I know this used to work.
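Both values here are standard SQLite primary result codes. 1 is SQLITE_ERROR. 100 is SQLITE_ROW, which `sqlite3_step()` returns when a statement has produced a row. So on the Mac the registration statement appears to have stepped successfully before `sqlite3_finalize()`, while on iOS something returned SQLITE_ERROR first. The lookup below is only a debugging aid. The table is a subset of the codes in `sqlite3.h`, and `describe_rc` is a hypothetical helper.

```python
# A subset of SQLite's primary result codes (values from sqlite3.h).
PRIMARY_RESULT_CODES = {
    0: "SQLITE_OK",
    1: "SQLITE_ERROR",
    5: "SQLITE_BUSY",
    7: "SQLITE_NOMEM",
    21: "SQLITE_MISUSE",
    26: "SQLITE_NOTADB",
    100: "SQLITE_ROW",
    101: "SQLITE_DONE",
}


def describe_rc(rc: int) -> str:
    """Name an SQLite result code; extended codes are reduced to their low 8 bits."""
    primary = rc & 0xFF
    name = PRIMARY_RESULT_CODES.get(primary, f"code {primary}")
    return name if primary == rc else f"{name} (extended code {rc})"


if __name__ == "__main__":
    for rc in (1, 100):  # the values seen on iOS and on the Mac
        print(rc, describe_rc(rc))
```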

snej commented 8 years ago

"I tried out your latest verified build: couchbase-lite-ios-community_1.3.0-17 but I still have the issue."

But are you using the built-in sqlite3.dylib, or a SQLite/SQLCipher you built yourself?
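Which library is linked matters partly because SQLite's handling of custom FTS tokenizers changed between releases. Starting with SQLite 3.11.0, the two-argument form of `fts3_tokenizer()` is disabled by default unless the library is built with SQLITE_ENABLE_FTS3_TOKENIZER. That two-argument form is the registration mechanism described in SQLite's FTS3 documentation, and later releases also allow enabling it at run time through `sqlite3_db_config()`. Whether register_unicodesn_tokenizer() depends on that function is an assumption here, not something this thread confirms. The sketch reports the run-time library version and whether the compile option is present, using Python's `sqlite3` module for illustration. `engine_report` is a hypothetical helper.

```python
import sqlite3
from typing import Dict, Union


def engine_report(conn: sqlite3.Connection) -> Dict[str, Union[str, bool]]:
    """Report facts about the SQLite library a connection uses at run time.

    two_arg_tokenizer_off_by_default reflects the SQLite 3.11.0 change; the
    ENABLE_FTS3_TOKENIZER entry shows whether this build re-enables it at
    compile time (a run-time sqlite3_db_config() switch is not detected here).
    """
    version = conn.execute("SELECT sqlite_version()").fetchone()[0]
    options = {row[0] for row in conn.execute("PRAGMA compile_options")}
    major, minor = (int(part) for part in version.split(".")[:2])
    return {
        "sqlite_version": version,
        "two_arg_tokenizer_off_by_default": (major, minor) >= (3, 11),
        "ENABLE_FTS3_TOKENIZER": "ENABLE_FTS3_TOKENIZER" in options,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for key, value in engine_report(conn).items():
        print(f"{key}: {value}")
    conn.close()
```

Running the equivalent queries from inside both the iOS and macOS apps would show whether they link different SQLite versions.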

brendand commented 8 years ago

I'm using SQLCipher 3.4.0. I've tried the prebuilt version you provide in the CBL 1.3 nightly builds, the built version from the couchbase-lite-sqlcipher repo, and my own build made with that repo's build-ios.sh script.

brendand commented 8 years ago

Ok, well crisis averted :)

I must have had some crap left over in my projects somewhere. I deleted the CBL frameworks from my app's project folder (I keep a copy of them in my app's Third Party folder), re-added them, and then deleted the project's derived data. I know I cleaned a bunch of times trying to get this to work, so I think it was deleting the derived data that finally fixed the problem.

Lo and behold, both encryption and searching now work. No more unicodesn errors.

Phew! This was driving me crazy for the past couple of days.

Sorry for posting this as an issue. I'm closing it now.

snej commented 8 years ago

Yay! I love it when problems fix themselves.