MichaelAquilina closed 10 years ago
This should now be possible with the availability of the TfidfValues table. Might be good to precompute the totals much like how Lengths are calculated for pages.
This has been implemented with the current WikiTest3 database.
Certain pages, such as "The Witcher 2: Assassins of Kings" and "Reflections Projections", contain a large range of (singular) rare words, which makes them extremely likely to appear as a search result if even one of those terms forms part of the query vector. Ideally a page should be normalised by its total TF-IDF (i.e. its complete norm) rather than by just the norm of the filtered terms that match the query vector.
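To illustrate the difference, here is a minimal sketch of the two normalisations. It assumes each page's TF-IDF weights are available as an in-memory dict; the page names, weights, and the function names `full_norms`, `filtered_norm`, and `score` are all made up for illustration and are not the project's actual schema or API.

```python
import math

# Hypothetical TF-IDF weights per page (illustrative values only).
doc_vectors = {
    # Many rare terms, only one of which matches the query.
    "rare_heavy": {"witcher": 5.0, "geralt": 5.0, "temeria": 5.0, "kaedwen": 5.0},
    # Fewer terms, with most of its weight on the query term.
    "focused": {"witcher": 4.0, "game": 3.0},
}

def full_norms(vectors):
    """Precompute the complete Euclidean norm of every page once."""
    return {
        page: math.sqrt(sum(w * w for w in terms.values()))
        for page, terms in vectors.items()
    }

def filtered_norm(query, terms):
    """Norm over only the page terms that also appear in the query."""
    return math.sqrt(sum(terms[t] ** 2 for t in query if t in terms))

def score(query, terms, norm):
    """Dot product with the query, divided by the chosen page norm.

    The query norm is omitted because it is the same for every page
    and therefore does not affect the ranking.
    """
    dot = sum(weight * terms.get(term, 0.0) for term, weight in query.items())
    return dot / norm if norm else 0.0

query = {"witcher": 1.0}
norms = full_norms(doc_vectors)

for page, terms in doc_vectors.items():
    by_filtered = score(query, terms, filtered_norm(query, terms))
    by_full = score(query, terms, norms[page])
    print(f"{page}: filtered={by_filtered:.2f} full={by_full:.2f}")
```

With the filtered norm, any page matching a single query term scores the same regardless of how much of the page that term represents, so the two pages tie. With the complete norm, the page whose weight is spread over many unrelated rare terms is penalised. Because the complete norm depends only on the page, it could be computed once and stored alongside the page rather than recalculated per query.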