Use MMR as the output score and sorting key

hberinsky commented 1 month ago

When using Maximal Marginal Relevance (MMR) to diversify the results, it turns out that the order of the keywords or keyphrases is based on the cosine similarity scores instead of the MMR scores. Although the same keywords are returned, I would expect to see more diversity in the top results.

from keybert import KeyBERT

doc = """
         Supervised learning is the machine learning task of learning a function that
         maps an input to an output based on example input-output pairs. It infers a
         function from labeled training data consisting of a set of training examples.
         In supervised learning, each example is a pair consisting of an input object
         (typically a vector) and a desired output value (also called the supervisory signal).
         A supervised learning algorithm analyzes the training data and produces an inferred function,
         which can be used for mapping new examples. An optimal scenario will allow for the
         algorithm to correctly determine the class labels for unseen instances. This requires
         the learning algorithm to generalize from the training data to unseen situations in a
         'reasonable' way (see inductive bias).
      """
kw_model = KeyBERT()
keywords = kw_model.extract_keywords(doc)

Current output

>>> kw_model.extract_keywords(doc, keyphrase_ngram_range=(3, 3), stop_words='english', use_mmr=True, diversity=0.2)
[('supervised learning algorithm', 0.6992),
 ('supervised learning example', 0.6807),
 ('supervised learning machine', 0.6706),
 ('function labeled training', 0.663),
 ('supervisory signal supervised', 0.5802)]

Expected output (here the scores are the MMR values)

>>> kw_model.extract_keywords(doc, keyphrase_ngram_range=(3, 3), stop_words='english', use_mmr=True, diversity=0.2)
[('supervised learning algorithm', 0.5594),
 ('function labeled training', 0.4173),
 ('supervised learning example', 0.3902),
 ('supervisory signal supervised', 0.3673),
 ('supervised learning machine', 0.362)]
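
For context, the difference between the two listings comes down to which number is reported after the MMR selection loop. Below is a minimal sketch of an MMR loop that returns the MMR value of each pick instead of its cosine similarity. It is not KeyBERT's actual code: the function name `mmr_with_scores` and its arguments are assumptions made for illustration, and scoring the first pick as `(1 - diversity) * similarity` is an assumption chosen because it is consistent with the first expected score above (0.8 × 0.6992 ≈ 0.5594).

```python
import numpy as np


def mmr_with_scores(doc_sim, cand_sim, candidates, top_n=5, diversity=0.2):
    """Greedy MMR selection that returns (candidate, mmr_score) pairs.

    doc_sim:  shape (n,), cosine similarity of each candidate to the document
    cand_sim: shape (n, n), cosine similarity between candidates
    (Hypothetical helper for illustration, not part of KeyBERT.)
    """
    doc_sim = np.asarray(doc_sim, dtype=float)
    cand_sim = np.asarray(cand_sim, dtype=float)

    # First pick: the candidate most similar to the document. Nothing is
    # selected yet, so the redundancy term is taken to be zero (assumption).
    first = int(np.argmax(doc_sim))
    selected = [first]
    scores = [(1 - diversity) * float(doc_sim[first])]
    remaining = [i for i in range(len(candidates)) if i != first]

    while remaining and len(selected) < top_n:
        # Redundancy: highest similarity to any already selected candidate.
        redundancy = cand_sim[np.ix_(remaining, selected)].max(axis=1)
        mmr = (1 - diversity) * doc_sim[remaining] - diversity * redundancy
        best = int(np.argmax(mmr))
        selected.append(remaining[best])
        scores.append(float(mmr[best]))
        del remaining[best]

    # Pairs come out in selection order, scored by MMR.
    return [(candidates[i], round(s, 4)) for i, s in zip(selected, scores)]
```

With a small `diversity` the ordering stays close to the pure similarity ranking; the reported numbers differ because each one now includes the redundancy penalty.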

MaartenGr commented 1 month ago

Thanks for the PR. Could you first go into detail about why you think the current implementation is not correct? The examples you show demonstrate what you would expect, but not why you would expect it. Moreover, the diversity value you use is rather low; the example you reference uses a diversity of 0.7 to showcase more diversity.

hberinsky commented 1 month ago

@MaartenGr Thanks for your feedback.

Sure, I can explain in more detail. I would expect the MRR score to be returned so that I can evaluate the effect of a threshold on the MRR when optimizing a target metric for an information retrieval task. This is an alternative to establishing a threshold based on the number of results (top_n).

I don't think the current implementation is incorrect; rather, I think it depends on the specific use case. The MRR metric is especially useful when using MMR, where the algorithm decides which keyphrases to include in the result set based on its value (given that we extract a limited number of keyphrases). I agree that with a diversity of 0.7 I would get more diversity; I just wanted to show an example of the relative difference between ranking by relevance alone and ranking by relevance plus diversity.
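
The threshold idea described above can be sketched as follows. This is only an illustration: `filter_by_score` is a hypothetical helper, not a KeyBERT function, and the scores are copied from the "Expected output" listing in this PR.

```python
def filter_by_score(keywords, threshold):
    """Keep (keyphrase, score) pairs whose score is at least `threshold`.

    Hypothetical post-processing step: a score cut-off replaces a fixed top_n.
    """
    return [(kw, score) for kw, score in keywords if score >= threshold]


# Scores taken from the "Expected output" listing in this PR.
expected = [
    ("supervised learning algorithm", 0.5594),
    ("function labeled training", 0.4173),
    ("supervised learning example", 0.3902),
    ("supervisory signal supervised", 0.3673),
    ("supervised learning machine", 0.362),
]

# A cut-off of 0.39 keeps the first three keyphrases.
print(filter_by_score(expected, 0.39))
```

A cut-off like this is only meaningful if the returned score is the same quantity the selection was based on, which is the motivation for returning the MMR value.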

MaartenGr commented 1 month ago

> Sure, I can explain in more detail. I would expect the MRR score to be returned so that I can evaluate the effect of a threshold on the MRR when optimizing a target metric for an information retrieval task. This is an alternative to establishing a threshold based on the number of results (top_n).

Are you referring here to the Mean Reciprocal Rank? If so, why would you think this improves upon the current implementation? You mention that it is an alternative, but I do not see how it would improve upon what is already implemented.

> I don't think the current implementation is incorrect; rather, I think it depends on the specific use case. The MRR metric is especially useful when using MMR, where the algorithm decides which keyphrases to include in the result set based on its value (given that we extract a limited number of keyphrases). I agree that with a diversity of 0.7 I would get more diversity; I just wanted to show an example of the relative difference between ranking by relevance alone and ranking by relevance plus diversity.

Considering there are use cases for both, with and without MRR, I am not sure we should change what is already implemented. Instead, I might opt for adding something like this as an option for the user to choose.
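
One way such a user-selectable option might look is sketched below. The `format_keywords` function and its `score` argument are hypothetical and do not exist in KeyBERT; the similarity and MMR values are the ones from the two listings in the PR description.

```python
def format_keywords(candidates, score="similarity"):
    """Report and sort keyphrases by the chosen score.

    candidates: (keyphrase, similarity, mmr) triples.
    score: "similarity" or "mmr" (hypothetical option, for illustration only).
    """
    column = {"similarity": 1, "mmr": 2}
    if score not in column:
        raise ValueError(f"score must be one of {sorted(column)}")
    idx = column[score]
    pairs = [(c[0], c[idx]) for c in candidates]
    # Highest score first, whichever score was requested.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# (keyphrase, cosine similarity, MMR) combined from the two listings in this PR.
candidates = [
    ("supervised learning algorithm", 0.6992, 0.5594),
    ("supervised learning example", 0.6807, 0.3902),
    ("supervised learning machine", 0.6706, 0.362),
    ("function labeled training", 0.663, 0.4173),
    ("supervisory signal supervised", 0.5802, 0.3673),
]

print(format_keywords(candidates, score="similarity"))  # current ordering
print(format_keywords(candidates, score="mmr"))         # ordering proposed in the PR
```

Defaulting to `"similarity"` would leave the existing behavior unchanged for users who do not opt in.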

hberinsky commented 1 month ago

> > Sure, I can explain in more detail. I would expect the MRR score to be returned so that I can evaluate the effect of a threshold on the MRR when optimizing a target metric for an information retrieval task. This is an alternative to establishing a threshold based on the number of results (top_n).

> Are you referring here to the Mean Reciprocal Rank? If so, why would you think this improves upon the current implementation? You mention that it is an alternative, but I do not see how it would improve upon what is already implemented.

Sorry, that was a typo: I meant MMR (Maximal Marginal Relevance), not MRR.

> > I don't think the current implementation is incorrect; rather, I think it depends on the specific use case. The MRR metric is especially useful when using MMR, where the algorithm decides which keyphrases to include in the result set based on its value (given that we extract a limited number of keyphrases). I agree that with a diversity of 0.7 I would get more diversity; I just wanted to show an example of the relative difference between ranking by relevance alone and ranking by relevance plus diversity.

> Considering there are use cases for both, with and without MRR, I am not sure we should change what is already implemented. Instead, I might opt for adding something like this as an option for the user to choose.

MaartenGr / KeyBERT

Use MMR as the output score and sorting key #227