Open schackartk opened 12 months ago
Thanks for sharing this! Coincidentally, I indeed used the following as the main source for calculating the diversity:
I would note that I have seen a similar lack of parentheses, which distribute the diversity term (λ), in other works, for example http://www.cs.bilkent.edu.tr/~canf/CS533/hwSpring14/eightMinPresentations/handoutMMR.pdf
Having said that, it might be worthwhile to test the effect of changing the parentheses. I am quite curious to see how that would affect the representation. Moreover, quite a number of other libraries implement MMR, such as LangChain and vector database applications. I could check which formulation they prefer.
Hello,
If I am reading the code correctly, there is a mistake in the implementation of the maximal marginal relevance (MMR) calculation.
Referring to the original publication https://doi.org/10.1145/290941.291025, the calculation is:
and the code as currently implemented:
assuming:
diversity is equal to 1-λ
candidate similarities
target_similarities
and I am assuming the last point because of the code:
the code should be:
So it appears to me that diversity is not distributed to both similarity terms as in the original equation; there needs to be parens around the difference between the similarity terms.
I would note that I have seen a similar lack of parentheses, which distribute the diversity term (λ), in other works, for example http://www.cs.bilkent.edu.tr/~canf/CS533/hwSpring14/eightMinPresentations/handoutMMR.pdf
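To make the parenthesization question concrete, here is a minimal sketch of greedy MMR selection with both weightings. It is not the code from the library under discussion: the function name `mmr_select`, its parameters, and the toy similarity values are all hypothetical, and the "distributed" branch is only one reading of "parens around the difference between the similarity terms".

```python
def mmr_select(relevance, pairwise, k, lam, distribute_weight):
    """Greedy MMR selection over candidate indices (hypothetical helper).

    relevance[i]   : similarity of candidate i to the query/document
    pairwise[i][j] : similarity between candidates i and j
    lam            : trade-off weight (the lambda of the MMR formulation)
    """
    # Start from the single most relevant candidate.
    selected = [max(range(len(relevance)), key=lambda i: relevance[i])]
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for i in range(len(relevance)):
            if i in selected:
                continue
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(pairwise[i][j] for j in selected)
            if distribute_weight:
                # Assumed reading: one weight applied to the whole difference.
                score = lam * (relevance[i] - redundancy)
            else:
                # Commonly written form: each term carries its own weight.
                score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 is distinct.
relevance = [0.9, 0.85, 0.5]
pairwise = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

separate = mmr_select(relevance, pairwise, k=2, lam=0.9, distribute_weight=False)
distributed = mmr_select(relevance, pairwise, k=2, lam=0.9, distribute_weight=True)
print(separate, distributed)
```

On this toy data the two weightings pick a different second candidate, so the placement of the parentheses can change the selection. Under this particular reading, a positive weight on the whole difference never changes the ranking, so the trade-off parameter would stop having any effect.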