Torchmetrics provides a more generic BERT score implementation. The specific request here is to not limit which transformer models users can configure. The behavior should be as follows:
If microsoft/deberta-xlarge-mnli or roberta-large-mnli is specified, use the bert-score implementation to avoid regressions for existing customers.
Otherwise, use the torchmetrics implementation of BERT score.
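The dispatch described above could be sketched as follows. This is a minimal illustration, not the actual FMEval code; the helper name `get_bertscore_backend` is hypothetical, but the two model identifiers are the ones named in this document.

```python
# Hypothetical sketch of the proposed backend selection.
# The two legacy model names are the ones FMEval currently supports
# via the bert-score library.
LEGACY_MODELS = {"microsoft/deberta-xlarge-mnli", "roberta-large-mnli"}


def get_bertscore_backend(model_name: str) -> str:
    """Return which BERT score backend should handle the given model.

    Legacy models keep using the bert-score library so existing
    customers see no regression; everything else goes through the
    more generic torchmetrics implementation.
    """
    if model_name in LEGACY_MODELS:
        return "bert-score"
    return "torchmetrics"
```

For example, a customer-provided fine-tuned model such as `my-org/my-finetuned-model` (a made-up name) would be routed to torchmetrics, while `roberta-large-mnli` would keep the existing bert-score path.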
Use cases for a broader set of models underlying BERT score:
Customers may fine-tune their own transformer models, which can be downloaded into the container running FMEval and passed into the torchmetrics BERT score implementation.
The current underlying implementation of BERT score supports a limited set of transformer models, and FMEval further truncates this list to microsoft/deberta-xlarge-mnli and roberta-large-mnli.