Bug: possible mistake in MMR calculation

Hello,

If I am reading the code correctly, there is a mistake in the implementation of the maximal marginal relevance (MMR) calculation.

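Referring to the original publication, https://doi.org/10.1145/290941.291025, the calculation as I read it there is:

$$
\mathrm{MMR} = \arg\max_{D_i \in R \setminus S} \left[ \lambda \left( \mathrm{Sim}_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \right) \right]
$$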

and the code as currently implemented:

mmr = (
    1 - diversity
) * candidate_similarities - diversity * target_similarities.reshape(-1, 1)
mmr_idx = candidates_idx[np.argmax(mmr)]

assuming:

diversity is equal to 1 - λ
Sim_1(D_i, Q) corresponds to candidate_similarities
max Sim_2(D_i, D_j) corresponds to target_similarities

and I am assuming the last point because of the code:

target_similarities = np.max(
    word_similarity[candidates_idx][:, keywords_idx], axis=1
)
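
To show why I read this as the max Sim_2 term, here is a minimal sketch with a made-up similarity matrix. The values and index lists are hypothetical and only illustrate the indexing: for each remaining candidate, the expression keeps its highest similarity to any already selected keyword.

```python
import numpy as np

# Hypothetical symmetric word-to-word similarity matrix (4 words).
word_similarity = np.array([
    [1.0, 0.2, 0.7, 0.4],
    [0.2, 1.0, 0.5, 0.9],
    [0.7, 0.5, 1.0, 0.1],
    [0.4, 0.9, 0.1, 1.0],
])
keywords_idx = [0, 3]    # assumed: words already selected as keywords
candidates_idx = [1, 2]  # assumed: words still available for selection

# Rows = candidates, columns = selected keywords; take the row-wise maximum.
target_similarities = np.max(
    word_similarity[candidates_idx][:, keywords_idx], axis=1
)
print(target_similarities)  # one value per remaining candidate
```

Candidate 1 is closest to keyword 3 and candidate 2 is closest to keyword 0, so each entry is the similarity to the nearest selected keyword.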

the code should be:

mmr = (1 - diversity) * (
    candidate_similarities - diversity * target_similarities.reshape(-1, 1)
)
mmr_idx = candidates_idx[np.argmax(mmr)]

So it appears to me that the factor (1 - diversity), i.e. λ, is not distributed over both similarity terms as in the original equation. Parentheses are needed around the difference between the similarity terms.
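
To check that the difference is not only cosmetic, here is a small sketch comparing the two expressions. The similarity values are made up, the two function names are my own and not part of the library, and the array shapes are assumed to match the snippets quoted in this issue.

```python
import numpy as np

def mmr_current(candidate_similarities, target_similarities, diversity):
    # Expression as quoted from the current code.
    return (
        1 - diversity
    ) * candidate_similarities - diversity * target_similarities.reshape(-1, 1)

def mmr_proposed(candidate_similarities, target_similarities, diversity):
    # Expression with (1 - diversity) distributed over both terms.
    return (1 - diversity) * (
        candidate_similarities - diversity * target_similarities.reshape(-1, 1)
    )

# Hypothetical values for two candidates (not taken from any real run).
candidate_similarities = np.array([[0.9], [0.6]])  # similarity to the document
target_similarities = np.array([0.8, 0.3])         # max similarity to selected keywords
diversity = 0.5

current_pick = int(np.argmax(
    mmr_current(candidate_similarities, target_similarities, diversity)
))
proposed_pick = int(np.argmax(
    mmr_proposed(candidate_similarities, target_similarities, diversity)
))
print(current_pick, proposed_pick)
```

With these numbers the two expressions pick different candidates: the current one penalises redundancy with weight diversity, while the proposed one uses the smaller weight (1 - diversity) * diversity.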

I would note that I have seen the same omission in other works, where the parentheses that distribute λ over both terms are missing, for example http://www.cs.bilkent.edu.tr/~canf/CS533/hwSpring14/eightMinPresentations/handoutMMR.pdf

MaartenGr / KeyBERT

Bug: possible mistake in MMR calculation #192