evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0

Add a new `ROUGE` metric to Evidently #1318

Open elenasamuylova opened 1 week ago

elenasamuylova commented 1 week ago

About Hacktoberfest contributions: https://github.com/evidentlyai/evidently/wiki/Hacktoberfest-2024

Description

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric evaluates the quality of a generated text by comparing it to a reference text (typically a summary). It measures how much of the reference text is covered by the generated summary through n-gram overlap. Several common ROUGE variants exist:

- ROUGE-N: overlap of n-grams (e.g., ROUGE-1 for unigrams, ROUGE-2 for bigrams)
- ROUGE-L: based on the longest common subsequence
- ROUGE-S: based on skip-bigram co-occurrence
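For reference, a minimal sketch of ROUGE-N recall (the "how much of the reference is covered" direction described above), using plain whitespace tokenization for simplicity; a real implementation would likely add tokenization options and precision/F1 variants:

```python
from collections import Counter


def rouge_n_recall(reference: str, generated: str, n: int = 1) -> float:
    """ROUGE-N recall: fraction of reference n-grams found in the generated text."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_ngrams = ngrams(reference)
    gen_ngrams = ngrams(generated)
    overlap = sum((ref_ngrams & gen_ngrams).values())  # clipped n-gram matches
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0
```

For example, `rouge_n_recall("the cat sat on the mat", "the cat sat", n=1)` gives 0.5, since three of the six reference unigram occurrences are covered.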

We can implement a ROUGE metric that takes a parameter `n` and computes both the per-row descriptor values (n-gram overlap) and a summary ROUGE metric for the dataset.

Note that this implementation would require creating a new Metric (instead of defaulting to ColumnSummaryMetric to aggregate descriptor values) to compute and visualize the summary ROUGE score. You can check other dataset-level metrics (e.g., from classification or ranking) for inspiration.
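To illustrate the two-level shape the issue describes (per-row descriptor values plus a dataset-level summary), here is a toy pandas sketch. The column names and the mean aggregation are assumptions for illustration only; the actual Metric would plug into Evidently's metric/descriptor interfaces and could aggregate and visualize differently:

```python
from collections import Counter

import pandas as pd


def rouge_n_recall(reference: str, generated: str, n: int = 1) -> float:
    # Fraction of reference n-grams present in the generated text.
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref, gen = ngrams(reference), ngrams(generated)
    total = sum(ref.values())
    return sum((ref & gen).values()) / total if total else 0.0


# Toy dataset with hypothetical column names.
data = pd.DataFrame({
    "reference": ["the cat sat on the mat", "dogs bark loudly"],
    "generated": ["the cat sat", "dogs bark"],
})

# Per-row descriptor values: one ROUGE-1 score per row.
data["rouge_1"] = [
    rouge_n_recall(ref, gen, n=1)
    for ref, gen in zip(data["reference"], data["generated"])
]

# Dataset-level summary: a simple mean here; the new Metric could choose
# a different aggregation and add its own visualization.
summary_rouge = data["rouge_1"].mean()
```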

piyushcse29 commented 15 hours ago

Hey @elenasamuylova, I am working on it.

piyushcse29 commented 15 hours ago

[Screenshot attached: 2024-10-02 at 02:45:59]

Do we need to show the text as well for comparison, or is a score per row enough along with the summary score?