dorianbrown / rank_bm25

A Collection of BM25 Algorithms in Python
Apache License 2.0

Fixed division by zero error #7

Closed: mariusjohan closed this 2 years ago

mariusjohan commented 4 years ago

I ran the module and, for some reason, the length of idf was 0, which raised a division-by-zero error when computing the average idf. This fix sets the average idf to 0 when the length of idf is 0.
We may need to do the same in the other BM25 variants.
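A minimal sketch of the guard described above, written as a standalone function rather than the actual diff from this PR; the name average_idf and the dict-of-floats input are assumptions for illustration only.

```python
def average_idf(idf: dict[str, float]) -> float:
    """Mean of the per-term idf values, or 0.0 when there are no terms.

    Checking len(idf) first avoids the ZeroDivisionError that
    sum(idf.values()) / len(idf) raises for an empty idf table.
    """
    if len(idf) == 0:
        return 0.0
    return sum(idf.values()) / len(idf)


print(average_idf({}))                        # 0.0
print(average_idf({"cat": 1.0, "dog": 3.0}))  # 2.0
```

The check costs one length lookup per index build, so it adds no meaningful overhead.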

dorianbrown commented 4 years ago

As I see it, this only happens when it receives an empty corpus as input, since len(self.idf) should equal the number of unique words in the corpus.
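To illustrate the reasoning above, here is a minimal sketch (not rank_bm25's own code) of an Okapi-style idf table, assuming the form log((N - n + 0.5) / (n + 0.5)) where N is the number of documents and n is how many contain the term; the function name okapi_idf is hypothetical. The table gets one entry per unique word, so in this sketch it is empty both for an empty corpus and for a corpus whose documents contain no tokens at all.

```python
import math
from collections import Counter


def okapi_idf(corpus: list[list[str]]) -> dict[str, float]:
    """Okapi-style idf, log((N - n + 0.5) / (n + 0.5)), for each unique term.

    Because the dict is keyed by term, its length equals the number of
    unique words across the whole corpus.
    """
    n_docs = len(corpus)
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term at most once per document
    return {
        term: math.log(n_docs - n + 0.5) - math.log(n + 0.5)
        for term, n in doc_freq.items()
    }


print(len(okapi_idf([["hello", "world"], ["hello", "there"]])))  # 3
print(len(okapi_idf([])))        # 0: empty corpus
print(len(okapi_idf([[], []])))  # 0: documents with no tokens
```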

Could you elaborate on what kind of corpus caused this error?

mariusjohan commented 4 years ago

I used the BM25 algorithm on some web-scraped data, which was then passed into an AI model, so I no longer have the corpus.
But to make the software more robust, I think we should merge the commit: it has no real impact on running time, and it makes the library easier to use.