barzerman / barzer

barzer engine code
MIT License

optimize inverse feature index lookup #676

Open barzerman opened 10 years ago

barzerman commented 10 years ago

Currently BENI, ZURCH and the spell checker all rely on the inverse index lookup procedure, which is suboptimal.

The procedure traverses every document linked to every feature of the query and updates the score of each of those documents. It then sorts the full set of scored documents.
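For reference, the procedure described above can be sketched roughly as follows. This is an illustrative Python sketch, not the barzer code: the name `lookup_full_sort`, the dict-of-lists index layout and the "one point per matched feature" scoring are all assumptions made for the example.

```python
from collections import defaultdict

def lookup_full_sort(index, query_features):
    """Score every document linked to any query feature, then sort them all.

    `index` is a hypothetical inverse index: feature -> list of document ids.
    The score is assumed to be the number of query features a document matches.
    """
    scores = defaultdict(int)
    for feature in query_features:
        # Every linked document is visited, however unlikely it is to rank well.
        for doc_id in index.get(feature, ()):
            scores[doc_id] += 1
    # Full sort over all touched documents: best score first, ties by doc id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The cost is proportional to the total length of all visited posting lists, plus a sort over every document touched, even when only the few best matches are needed.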

Two optimizations can be performed: 1) a feature count cut off (or some other rough score cut off), meaning that once a certain score has been achieved, documents that can no longer compete are disqualified automatically. 2) a priority queue can be used to keep only the best candidates instead of sorting everything, which speeds up the process.
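One possible reading of the two optimizations, as a hedged Python sketch. The function name `lookup_top_k`, the parameters `k` and `min_score`, the dict-of-lists index layout and the count-based score are assumptions for illustration, not the barzer implementation. The cut off used here is: a document first seen at a given query feature can match at most that feature and the ones after it, so once fewer than `min_score` features remain, new documents are not admitted.

```python
import heapq

def lookup_top_k(index, query_features, k, min_score):
    """Return up to k (doc_id, score) pairs with score >= min_score.

    Assumes distinct query features, integer doc ids, and posting lists
    that contain each doc id at most once (all hypothetical).
    """
    scores = {}
    remaining = len(query_features)  # features not yet processed, current one included
    for feature in query_features:
        # Cut off: a new document can collect at most `remaining` points.
        admit_new = remaining >= min_score
        for doc_id in index.get(feature, ()):
            if doc_id in scores:
                scores[doc_id] += 1
            elif admit_new:
                scores[doc_id] = 1
        remaining -= 1
    qualified = ((doc_id, s) for doc_id, s in scores.items() if s >= min_score)
    # heapq.nlargest keeps a bounded heap of k items instead of sorting everything.
    return heapq.nlargest(k, qualified, key=lambda ds: (ds[1], -ds[0]))
```

With `min_score=1` and a large `k` this degenerates to the unoptimized behaviour; the savings come from queries where most linked documents match only a few features.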

Both optimizations are likely to make BENI search several times faster on average, which is important for very large sets. For example, we've observed lookup times of 200-300 milliseconds on the ISBN database from Amazon.