Currently BENI, ZURCH, and the spell checker all rely on the inverted-index lookup procedure, which is suboptimal. The procedure traverses every document linked to every feature, updates the score for each document, and then sorts the full result set.
Two optimizations can be performed:
1) Feature-count cutoff (or some other rough score cutoff): once the cutoff score is established, documents that fall below it are disqualified automatically instead of being carried through to the sort.
2) A priority queue can be used to keep only the top results, speeding up the ranking step.
Both optimizations are likely to produce a manifold average speedup for BENI search, which is important for very large sets. For example, we have observed lookup times of 200-300 milliseconds on the ISBN database from Amazon.