lambdaofgod closed this issue 3 years ago
First off, thanks for your interest! I made this for a project I was doing last year, but I don't use it much anymore.
Regarding the argument: it's used to look up the original documents so they can be returned as the result. The BM25() object doesn't store the original documents, only the tokenized corpus. It would be possible to store both, but that would use more memory.
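To illustrate the idea, here is a minimal sketch. It is not the library's actual code: `ToyRanker`, its method names, and the term-count score (a stand-in for real BM25 scoring) are all hypothetical. It only shows why the caller has to pass the original documents back in when the object keeps nothing but tokens.

```python
from collections import Counter


class ToyRanker:
    """Hypothetical stand-in for a BM25-style ranker (not the real implementation)."""

    def __init__(self, tokenized_corpus):
        # Only the tokenized documents are kept; the original strings are not stored.
        self.tokenized_corpus = tokenized_corpus
        self.corpus_size = len(tokenized_corpus)

    def get_scores(self, query_tokens):
        # Placeholder score: how often the query terms occur in each document.
        scores = []
        for doc_tokens in self.tokenized_corpus:
            counts = Counter(doc_tokens)
            scores.append(sum(counts[t] for t in query_tokens))
        return scores

    def get_top_n(self, query_tokens, documents, n=1):
        # `documents` must line up one-to-one with the indexed corpus,
        # because it is the only source of the original text.
        if len(documents) != self.corpus_size:
            raise ValueError("documents must match the indexed corpus in length")
        scores = self.get_scores(query_tokens)
        ranked = sorted(range(self.corpus_size), key=lambda i: scores[i], reverse=True)
        return [documents[i] for i in ranked[:n]]


corpus = [
    "Hello there, good sir!",
    "It is quite windy in London",
    "How is the weather today?",
]
ranker = ToyRanker([doc.lower().split() for doc in corpus])
top = ranker.get_top_n("windy london".split(), corpus, n=1)
print(top)
```

Storing the original strings on the object would make the extra argument unnecessary, at the cost of holding two copies of the corpus in memory.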
Out of curiosity, does the current implementation cause problems for you? I'm open to changing it.
What is this argument for? It seems it can only introduce a bug, since the code checks that it contains as many documents as the corpus.