dorianbrown / rank_bm25

A Collection of BM25 Algorithms in Python
Apache License 2.0

initialization with empty corpus results in ZeroDivisionError #36

Open mattf opened 8 months ago

rank-bm25==0.2.2

In [11]: rank_bm25.BM25(corpus=[])
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[11], line 1
----> 1 rank_bm25.BM25(corpus=[])

File .../lib/python3.11/site-packages/rank_bm25.py:27, in BM25.__init__(self, corpus, tokenizer)
     24 if tokenizer:
     25     corpus = self._tokenize_corpus(corpus)
---> 27 nd = self._initialize(corpus)
     28 self._calc_idf(nd)

File .../lib/python3.11/site-packages/rank_bm25.py:52, in BM25._initialize(self, corpus)
     48             nd[word] = 1
     50     self.corpus_size += 1
---> 52 self.avgdl = num_doc / self.corpus_size
     53 return nd

ZeroDivisionError: division by zero
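
The traceback points at the average document length computation, which divides by `self.corpus_size` and fails when the corpus has no documents. A minimal sketch of a guarded version of that step follows. It is not the library's code or an accepted fix: `initialize_stats` is a hypothetical standalone function, and returning `0.0` for an empty corpus is an assumption (raising a descriptive `ValueError` would be an equally reasonable choice).

```python
from collections import Counter
from typing import Dict, List, Tuple


def initialize_stats(corpus: List[List[str]]) -> Tuple[Dict[str, int], int, float]:
    """Hypothetical stand-in for the statistics step shown in the traceback.

    Takes an already tokenized corpus and returns a tuple of
    (document frequency per word, corpus size, average document length).
    """
    doc_freq: Counter = Counter()
    total_tokens = 0
    for document in corpus:
        total_tokens += len(document)
        # Count each word at most once per document.
        doc_freq.update(set(document))

    corpus_size = len(corpus)
    # Guard (assumption): an empty corpus has no meaningful average length,
    # so fall back to 0.0 instead of dividing by zero.
    avgdl = total_tokens / corpus_size if corpus_size else 0.0
    return dict(doc_freq), corpus_size, avgdl


if __name__ == "__main__":
    print(initialize_stats([]))
    print(initialize_stats([["hello", "world"], ["hello"]]))
```

Until something like this is handled upstream, callers can also check `if corpus:` before constructing the index and skip ranking when there is nothing to rank.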