BERTScore assesses the quality of generated text by comparing it to a reference (ground-truth) text, measuring the semantic similarity between the two. It leverages BERT's contextual embeddings to capture the meaning and context of words and phrases, which makes it more robust than traditional n-gram-based metrics such as BLEU or ROUGE, which may miss the nuances of language.
It falls under the ModelBasedMetric category as defined in issue #238.
By definition, given the contextual embeddings of the reference tokens $x = \langle x_1, \dots, x_k \rangle$ and the candidate tokens $\hat{x} = \langle \hat{x}_1, \dots, \hat{x}_l \rangle$, BERTScore is computed as:

$$R_{\text{BERT}} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^\top \hat{x}_j, \qquad P_{\text{BERT}} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} x_i^\top \hat{x}_j, \qquad F_{\text{BERT}} = 2\,\frac{P_{\text{BERT}} \cdot R_{\text{BERT}}}{P_{\text{BERT}} + R_{\text{BERT}}}$$

where each token in one sentence is greedily matched to its most similar token in the other, and the similarities are averaged.
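The greedy-matching computation above can be sketched as follows. This is a minimal illustration operating on precomputed token embeddings (here NumPy arrays), not the full pipeline — in practice the embeddings would come from a BERT model, and the function name `bert_score_from_embeddings` is a placeholder for this sketch:

```python
import numpy as np

def bert_score_from_embeddings(cand_emb: np.ndarray, ref_emb: np.ndarray):
    """Compute BERTScore-style precision/recall/F1 from token embeddings.

    cand_emb: (m, d) candidate token embeddings.
    ref_emb:  (n, d) reference token embeddings.
    Rows are L2-normalized so dot products equal cosine similarities.
    """
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # (m, n) pairwise cosine similarity matrix

    # Precision: each candidate token matched to its best reference token.
    precision = sim.max(axis=1).mean()
    # Recall: each reference token matched to its best candidate token.
    recall = sim.max(axis=0).mean()
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With identical candidate and reference embeddings the score is 1.0; a candidate that covers only part of the reference keeps high precision but loses recall, which is exactly the asymmetry the two sums in the definition capture.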