TiddlySpace / tiddlyspace

A discoursive social model for tiddlers
http://tiddlyspace.com

whoosher index fails under concurrency #144

Closed cdent closed 14 years ago

cdent commented 14 years ago

There are bugs in Whoosh and the implementation of whoosher which cause index reads and writes to sometimes fail. This usually happens when multiple processes or threads are attempting to access the index around the same time.

Though there are safeguards in place in both Whoosh and tiddlyweb's use of it (things like lock files and lock file backoffs), these are insufficient because of bugs in Whoosh itself. The primary bug is represented by this ticket:

http://bitbucket.org/mchaput/whoosh/issue/16/bug-in-filestore

It shows up in the tiddlyspace.com tiddlyweb.log as

IOError: [Errno 2] No such file or directory: '/home/tiddlyweb/tiddlywebs/tiddlyspace.com/indexdir/_MAIN_3659.tiz'

With further debug info as:

DEBUG determined bag filter refused

DEBUG whoosher: unable to get writer (locked) for [...]

The first message appears when trying to determine which bag a tiddler comes from, and the second when indexing new content. The first is not a big deal, as the code falls back to a slower method. The second is a problem because the tiddler involved is not indexed, yet nothing signals that the index is incomplete.
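To illustrate why the silent failure matters, here is a minimal sketch of a retry-with-backoff wrapper that raises once the lock cannot be obtained, instead of quietly skipping the tiddler. It is not whoosher's actual code: `IndexLocked`, `with_writer` and the `get_writer` callable are hypothetical names standing in for whatever the real store uses.

```python
import time


class IndexLocked(Exception):
    """Hypothetical error: the index writer lock could not be obtained."""


def with_writer(get_writer, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call get_writer() until it succeeds, backing off exponentially.

    get_writer is any callable that returns a writer or raises IndexLocked.
    After the final attempt the exception is re-raised, so the caller can
    log the failure or queue the tiddler for re-indexing.
    """
    for attempt in range(attempts):
        try:
            return get_writer()
        except IndexLocked:
            if attempt == attempts - 1:
                raise  # surface the failure instead of dropping the tiddler
            sleep(base_delay * (2 ** attempt))
```

With this shape, a locked index shows up as an exception the caller has to handle, rather than as a debug line and a missing index entry.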

There are three avenues to fixing this:

Getting the Whoosh bug fixed seems unlikely: the same bug has been recurring in the Whoosh codebase for a couple of years, and despite various fixes it keeps rearing its head. The latest bug report (above) hasn't yet had any attention from the author. cdent reckons he could figure it out, but it would take one to two weeks of uninterrupted time to model an isolated case and then fix the bug. Hence the effort to switch to mysql.

The mysql tool, though, has its own set of problems, including how to parse search queries, and there are known issues with mysql library mismatches between threaded and non-threaded environments. One thing to avoid is using PHP anywhere on the Apache server in question.

cdent commented 14 years ago

tiddlyspace is now set up to use mysql

The implementation parses whoosher style search queries into SQL.

Indexing appears to be working correctly.

For qualified searches (ones where keys are mentioned) results come fast. For more general searches, the results are not fast, but we'll cross that bridge later.
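As a rough illustration of translating whoosher-style queries into SQL, the sketch below turns `key:value` terms into equality tests and everything else into a `LIKE` match. The actual tiddlyspace implementation and schema are not shown in this thread, so the column names in `FIELDS`, the `text` column and the function name `query_to_sql` are all assumptions. It also suggests one plausible reason unqualified searches might be slower (a `LIKE '%...%'` match generally cannot use an ordinary index), though that is a guess rather than a diagnosis.

```python
import shlex

# Hypothetical set of indexed columns; the real schema is not known here.
FIELDS = {"title", "bag", "tag", "modifier"}


def query_to_sql(query):
    """Translate a 'key:value other words' query into (where_clause, params).

    Uses %s placeholders, the DB-API 'format' paramstyle used by MySQLdb.
    Values are always passed as parameters, never interpolated into SQL.
    """
    clauses, params = [], []
    for term in shlex.split(query):  # honours quotes: title:"Hello World"
        key, sep, value = term.partition(":")
        if sep and key in FIELDS:
            # Qualified term: key comes from the whitelist above.
            clauses.append(f"{key} = %s")
            params.append(value)
        else:
            # Unqualified term: fall back to a substring match on the text.
            clauses.append("text LIKE %s")
            params.append(f"%{term}%")
    return " AND ".join(clauses) or "1 = 1", params
```

For example, `query_to_sql('bag:cdent_public hello')` yields the clause `bag = %s AND text LIKE %s` with parameters `['cdent_public', '%hello%']`, which could then be passed to a DB-API cursor's `execute` alongside a `SELECT ... WHERE` prefix.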