aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

[Feature] Increase model coverage of the BERT Score metric by adding torchmetrics implementation #332

Open achad4 opened 3 weeks ago

achad4 commented 3 weeks ago

The current underlying implementation of BERT score supports a limited set of transformer models, and FMEval further truncates this list to microsoft/deberta-xlarge-mnli and roberta-large-mnli.

Torchmetrics provides a more generic BERT score implementation. The specific request here is to adopt that implementation so that users are not limited in which transformer models they can configure.

Use cases for a broader set of models underlying BERT score:

  1. Monolingual BERT models have been shown to outperform multilingual BERT models on certain tasks (https://aclanthology.org/2021.acl-long.243.pdf).
  2. Customers may fine-tune their own transformers, which can be downloaded into the container running FMEval and passed to the torchmetrics BERT score implementation.