dorianbrown / rank_bm25

A Collection of BM25 Algorithms in Python
Apache License 2.0
1.02k stars 86 forks

Add BM25F function #11

Open bohnpessatti opened 4 years ago

bohnpessatti commented 4 years ago

Congratulations on the initiative; your project has been quite useful in my work.

I would like to suggest adding a function for the BM25F method, which takes the relevance of different document fields into account before applying the BM25 saturation function.

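This avoids a dangerous over-estimation of term importance that occurs when BM25 scores from different fields are combined linearly [1]: each field saturates separately, so a term that appears in several fields is rewarded several times. It could therefore make your project more robust for ranking structured text.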
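To make the idea concrete, here is a minimal sketch of one common BM25F formulation: the per-field term frequencies are length-normalised, weighted and summed into a single pseudo-frequency, and only then passed through the saturation `tf / (k1 + tf)`. Everything here is an assumption for illustration and not part of the current `rank_bm25` API: the function name `bm25f_scores`, the dict-of-token-lists document layout, the single `b` shared by all fields (the papers allow one per field), and the non-negative IDF variant.

```python
import math
from collections import Counter


def bm25f_scores(query, docs, weights, k1=1.2, b=0.75):
    """Hypothetical BM25F scorer (illustrative sketch, not a rank_bm25 function).

    query:   list of query tokens
    docs:    list of dicts mapping field name -> list of tokens
    weights: dict mapping field name -> field weight
    """
    n_docs = len(docs)
    fields = list(weights)

    # Average length of each field across the collection.
    avg_len = {f: sum(len(d.get(f, [])) for d in docs) / n_docs for f in fields}

    # Document frequency: a document counts once if the term is in any field.
    df = Counter()
    for d in docs:
        terms = set()
        for f in fields:
            terms.update(d.get(f, []))
        df.update(terms)

    scores = []
    for d in docs:
        counts = {f: Counter(d.get(f, [])) for f in fields}
        score = 0.0
        for term in query:
            if term not in df:
                continue
            # Combine the fields BEFORE saturation: weighted sum of
            # length-normalised term frequencies.
            tf = 0.0
            for f in fields:
                count = counts[f][term]
                if count == 0:
                    continue
                norm = (1.0 - b) + b * len(d[f]) / avg_len[f]
                tf += weights[f] * count / norm
            # Assumed IDF variant: smoothed so it never goes negative.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # A single saturation step over the combined pseudo-frequency.
            score += idf * tf / (k1 + tf)
        scores.append(score)
    return scores


# Toy collection with two fields; the title is weighted twice as much as the body.
docs = [
    {"title": ["bm25", "ranking"], "body": ["a", "note", "on", "ranking", "functions"]},
    {"title": ["cooking", "tips"], "body": ["bm25", "is", "mentioned", "once", "here"]},
    {"title": ["gardening"], "body": ["plants", "need", "water", "and", "light"]},
]
weights = {"title": 2.0, "body": 1.0}
scores = bm25f_scores(["bm25"], docs, weights)
```

In this toy collection the title match should outrank the body match, and the document without the term scores zero. Because saturation is applied once per term rather than once per field, a term's contribution stays bounded by its IDF no matter how many fields it appears in.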

References:
[1] https://trec.nist.gov/pubs/trec13/papers/microsoft-cambridge.web.hard.pdf
[2] https://www.researchgate.net/publication/221613382_Simple_BM25_extension_to_multiple_weighted_fields

Thank you in advance.

dorianbrown commented 4 years ago

Thanks for the kind words, and I'm glad this package has been of use to you!

It sounds like it would be a useful addition to the package. I don't have time to make these changes at the moment, as I no longer work on this kind of application in my job. If you'd like to create a pull request with the changes, I'd be happy to review and merge it; otherwise I'll try to add it when I have some time.