Closed raoashish10 closed 2 years ago
That was quick, thanks!
I don't have time to review it at the moment, but I should be able to look at it either this week or next week.
Sure, no problem! Let me know if I have made any mistakes.
Sorry for the slight delay, but I finally managed to take a look at it and compare it to the description in the paper.
So the strategy of this method is to calculate the parameter `k1` instead of choosing one. This is done by finding the `k1` that minimizes equation (16). This minimization problem needs to be solved numerically (in the paper they use Newton-Raphson) for each term `t` in the query `q`, so we end up with a `k1(t)` to use in equation (5).
So replacing the `argmin` with `min` really changes the algorithm: `argmin` returns the `k1` at which equation (16) is smallest, whereas `min` returns the smallest value of the expression itself. Does that help make things a little clearer?
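To make the `argmin` versus `min` distinction concrete, here is a minimal sketch. Note that `toy_objective`, `newton_argmin`, and the per-term constants are all hypothetical: the objective is a simple stand-in and not equation (16) from the paper, and this is not the code in `rank_bm25`. It only illustrates the shape of the computation: minimize numerically once per query term and keep the minimizing `k1`.

```python
import numpy as np


def toy_objective(k1, c):
    """Hypothetical stand-in for equation (16); NOT the paper's formula.

    Its smallest value is 1.0, reached at k1 == c.
    """
    return (k1 - c) ** 2 + 1.0


def newton_argmin(c, k1=1.0, steps=20, h=1e-4):
    """Newton-Raphson on the derivative of toy_objective.

    Derivatives are approximated with central finite differences.
    Returns the minimizing k1 (the argmin), not the minimum value.
    """
    for _ in range(steps):
        f_plus = toy_objective(k1 + h, c)
        f_mid = toy_objective(k1, c)
        f_minus = toy_objective(k1 - h, c)
        grad = (f_plus - f_minus) / (2 * h)
        curv = (f_plus - 2 * f_mid + f_minus) / h ** 2
        if abs(curv) < 1e-12:  # avoid dividing by a vanishing curvature
            break
        step = grad / curv
        k1 -= step
        if abs(step) < 1e-8:  # converged
            break
    return k1


# min vs argmin on a grid of candidate k1 values
grid = np.linspace(0.1, 3.0, 30)
values = toy_objective(grid, c=1.7)
min_value = values.min()           # smallest objective value
argmin_k1 = grid[values.argmin()]  # the k1 that produces it

# One k1 per query term; the constants per term are made up for illustration.
term_constants = {"cat": 1.2, "dog": 2.0}
k1_per_term = {t: newton_argmin(c) for t, c in term_constants.items()}
```

Here `min_value` is an objective value and says nothing about which `k1` to plug into the scoring formula, while `argmin_k1` and the entries of `k1_per_term` are the parameter values themselves, which is what the per-term `k1(t)` requires.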
I have made some changes to the BM25T class according to the paper mentioned in the README, although I am not completely sure these changes are the right ones because of certain ambiguities in the paper. I saw that you hadn't implemented certain things, for example:
One of the things that bugged me was the argmin function. I couldn't completely understand from the paper what its output should signify and where it should be used, so I replaced the argmin function with just a min function, assuming that argmin was the index into the list of k' values.
Let me know if I have gone wrong anywhere.