dorianbrown / rank_bm25

A Collection of BM25 Algorithms in Python
Apache License 2.0
983 stars 83 forks

Updated _initialize(self,corpus) function #8

Closed Sarthakjain1206 closed 4 years ago

Sarthakjain1206 commented 4 years ago

Hey, I have improved the `_initialize` function. The previous implementation was slow because of an unnecessary search to check whether `word` is present in `nd`. In time-complexity terms, that check was taking O(n) time on every outer iteration, whereas it can be done in O(1) time.
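As a rough sketch of the idea (this is not the actual diff from the PR): assuming `nd` maps each word to the number of documents that contain it, keeping `nd` as a dict makes every presence check an average O(1) hash lookup rather than a linear scan. The function name `count_document_frequencies` and the sample corpus are hypothetical, used only for illustration.

```python
def count_document_frequencies(corpus):
    """Return {word: number of documents containing word} for a tokenized corpus.

    Hypothetical helper for illustration; not part of rank_bm25's API.
    """
    nd = {}
    for document in corpus:
        # set() drops repeats so each word counts once per document
        for word in set(document):
            # dict.get is an average O(1) hash lookup, unlike scanning a list
            nd[word] = nd.get(word, 0) + 1
    return nd


# Made-up sample corpus: each document is a list of tokens
corpus = [["hello", "world"], ["hello", "there"], ["bm25", "ranks", "hello"]]
nd = count_document_frequencies(corpus)
```

If `nd` were a list of words instead, each `word in nd` check would have to scan the list, which is where an O(n) cost per iteration would come from.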

Test it out on a huge corpus; in that case you will see a big time difference. And this time difference matters a lot for searching.