speed up nvPY real-time searching on 2000+ notes

cpbotha / nvpy

Simplenote syncing note-taking application, inspired by Notational Velocity and ResophNotes, but uglier and cross-platformerer.

Other

851 stars 112 forks source link

speed up nvPY real-time searching on 2000+ notes #25

Open cpbotha opened 12 years ago

cpbotha commented 12 years ago

A user with more than 2000 notes reports that real-time search updates in nvpy 0.9.1 are slow, sometimes causing significant lags in nvpy interaction.

I've made one small optimization w.r.t. the tag searching, but this will only effect successful tag hits.

One possibility is offering a plain text search mode, if that can significantly speed things up.

cpbotha commented 12 years ago

Based on this stackoverflow question: http://stackoverflow.com/a/4901653/532513 I made a benchmark comparing the different search possibilities. These are the results:

string.find 0.430104970932
re.search 1.78215408325
re.compile.search 0.605902910233
re.compile.search I 0.844277143478
text in string 0.207995891571

nvPY 0.9.1 uses re.compile.search I (compiled case-insensitive regular expression searching) by default. re.compile.search (compiled case-sensitive regular expression searching) is already significantly faster, with "text in string" being by far the fastest.

Conclusion: Adding case_sensitive = 1 to your .nvpy.cfg will get you almost 25% more search speed. After 0.9.2, the default will probably change to "text in string". However, this will be configurable in the .nvpy.cfg, and there will probably be checkboxes in the UI so that you can change the mode on the fly (for example if you need to do a regexp for something specific).

DivineDominion commented 12 years ago

+1 sounds good

cpbotha commented 12 years ago

I've run more benchmarks on a text file of about 1000 words:

string.find 1.46725606918
re.search 3.21919608116
re.compile.search 2.47109103203
re.compile.search (case insensitive) 20.5269780159
text in string 1.30602288246
text in string (case insensitive) 10.9637160301

The two case-insensitive searches are an order of magnitude slower than their case sensitive counterparts. For text in string (case insensitive), I'm doing: "if pattern.lower() in search_text.lower(): return True".

DivineDominion commented 12 years ago

When I was looking into Fuzzy Search algorithms, which are overkill for this app user's needs, a recurring pattern was: create a normalized form (including all lowercase letters, for example) and store this normalized form in an index; perform searches on this single index file only. Having just one index file to search instead of iterating through a folder is said to be faster, too.

I think this would add some complexity for the file/folder wathing part, epsecially when one is using the plain text file fork of nvPY: you'd have to keep track of already indexed files and their indexing timestamp to check for changes on launch, and then you'd have to subscribe to file/folder changes to update the index during runtime, in case the user modifies or renames files with other programs.

cpbotha commented 12 years ago

My implementation is more or less done. Default is super fast gstyle case sensitive searching, but you can configure for case insensitive (much slower) and also for regular expression (with and without case sensitivity) search. I think I'm also going to put this in the UI so you can switch at runtime if you need one of the other modes just incidentally.

DivineDominion commented 12 years ago

It's faster and works more reliable than Resoph Notes.

It's still far away from being instantaneous (like NV for Mac), but I think with gstyle search it became way more usable! :)