Witiko closed 2 years ago
Good catch. After looking over the math in the paper, I'd say you're completely right. I'll merge this in and see if I can set up a release soon with these fixes in place.
Thanks a bunch for taking the time to submit a PR!
In the paper, the $tf_{td}$ factor in the BM25L scoring formula is contained within $c_{td}$:
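Written out, as a reconstruction from the usual statement of BM25L rather than a copy of the paper's typesetting (here $N$ is the number of documents, $df_t$ the document frequency of term $t$, $L_d$ the length of document $d$, $L_{avg}$ the average document length, and $k_1$, $b$, $\delta$ the free parameters):

```math
\mathrm{score}(q, d) = \sum_{t \in q} \log\!\left(\frac{N + 1}{df_t + 0.5}\right) \cdot \frac{(k_1 + 1)\,(c_{td} + \delta)}{k_1 + c_{td} + \delta},
\qquad
c_{td} = \frac{tf_{td}}{1 - b + b \cdot \frac{L_d}{L_{avg}}}
```

Note that $tf_{td}$ appears only inside $c_{td}$ and nowhere else in the fraction.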
However, in the rank-bm25 library, the numerator is additionally multiplied by $tf_{td}$, which gives $tf_{td} \cdot (k_1 + 1) \cdot (c_{td} + \delta)$ instead of $(k_1 + 1) \cdot (c_{td} + \delta)$. This PR removes the extra factor.
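To make the difference concrete, here is a minimal standalone sketch of the per-term score in both variants. It is not the library's code: the function names are hypothetical, and the default values for `k1`, `b`, and `delta` are chosen only for illustration.

```python
def c_td(tf, doc_len, avg_doc_len, b=0.75):
    """Length-normalized term frequency c_td = tf / (1 - b + b * L_d / L_avg)."""
    return tf / (1 - b + b * doc_len / avg_doc_len)


def bm25l_term_paper(tf, doc_len, avg_doc_len, idf, k1=1.5, b=0.75, delta=0.5):
    """Per-term score with tf appearing only inside c_td (the paper's form)."""
    c = c_td(tf, doc_len, avg_doc_len, b)
    return idf * (k1 + 1) * (c + delta) / (k1 + c + delta)


def bm25l_term_extra_tf(tf, doc_len, avg_doc_len, idf, k1=1.5, b=0.75, delta=0.5):
    """Per-term score with the extra tf factor in the numerator (the form this PR removes)."""
    c = c_td(tf, doc_len, avg_doc_len, b)
    return idf * tf * (k1 + 1) * (c + delta) / (k1 + c + delta)


if __name__ == "__main__":
    # Hypothetical inputs: a term occurring 3 times in a document of average length.
    args = dict(tf=3, doc_len=10, avg_doc_len=10, idf=1.0)
    print(bm25l_term_paper(**args))     # 1.75
    print(bm25l_term_extra_tf(**args))  # 5.25
```

For a term that occurs once the two variants agree, but for repeated terms the extra factor scales the score linearly with the raw term frequency, which defeats the saturation that the $k_1$ denominator is meant to provide.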