evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0

Add a new `ROUGE` metric to Evidently #1318

Open elenasamuylova opened 1 week ago

elenasamuylova commented 1 week ago

About Hacktoberfest contributions: https://github.com/evidentlyai/evidently/wiki/Hacktoberfest-2024

Description

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric evaluates the quality of a generated text by comparing it to a reference text (typically a summary). It measures how much of the reference text is covered by the generated summary through n-gram overlap. Several common ROUGE variants exist:

- ROUGE-N: overlap of n-grams (e.g., ROUGE-1 for unigrams, ROUGE-2 for bigrams)
- ROUGE-L: based on the longest common subsequence
- ROUGE-S: based on skip-bigram co-occurrence
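For reference, a minimal sketch of ROUGE-N recall (the "how much of the reference is covered" direction described above), using plain whitespace tokenization for simplicity; a real implementation would likely add tokenization options and precision/F1 variants:

```python
from collections import Counter


def rouge_n_recall(reference: str, generated: str, n: int = 1) -> float:
    """ROUGE-N recall: fraction of reference n-grams found in the generated text."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_ngrams = ngrams(reference)
    gen_ngrams = ngrams(generated)
    overlap = sum((ref_ngrams & gen_ngrams).values())  # clipped n-gram matches
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0
```

For example, `rouge_n_recall("the cat sat on the mat", "the cat sat", n=1)` gives 0.5, since three of the six reference unigram occurrences are covered.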

We can implement a ROUGE metric that takes a parameter `n` and computes both the per-row descriptor values (n-gram overlap) and a summary ROUGE metric for the dataset.

Note that this implementation would require creating a new Metric (instead of defaulting to ColumnSummaryMetric to aggregate descriptor values) to compute and visualize the summary ROUGE score. You can check other dataset-level metrics (e.g., from classification or ranking) for inspiration.
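To illustrate the two-level shape the issue describes (per-row descriptor values plus a dataset-level summary), here is a toy pandas sketch. The column names and the mean aggregation are assumptions for illustration only; the actual Metric would plug into Evidently's metric/descriptor interfaces and could aggregate and visualize differently:

```python
from collections import Counter

import pandas as pd


def rouge_n_recall(reference: str, generated: str, n: int = 1) -> float:
    # Fraction of reference n-grams present in the generated text.
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, gen = ngrams(reference), ngrams(generated)
    total = sum(ref.values())
    return sum((ref & gen).values()) / total if total else 0.0


# Toy dataset with hypothetical column names.
data = pd.DataFrame({
    "reference": ["the cat sat on the mat", "dogs bark loudly"],
    "generated": ["the cat sat", "dogs bark"],
})

# Per-row descriptor values: one ROUGE-1 score per row.
data["rouge_1"] = [
    rouge_n_recall(ref, gen, n=1)
    for ref, gen in zip(data["reference"], data["generated"])
]

# Dataset-level summary: a simple mean here; the new Metric could choose
# a different aggregation and add its own visualization.
summary_rouge = data["rouge_1"].mean()
```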

piyushcse29 commented 15 hours ago

Hey @elenasamuylova, I am working on it.

piyushcse29 commented 15 hours ago

[Screenshot attached: 2024-10-02 at 02:45:59]

Do we need to show the text as well for comparison, or is a score per row enough along with the summary score?