aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

[Feature] Increase model coverage of the BERT Score metric by adding torchmetrics implementation #332

Open achad4 opened 3 weeks ago

achad4 commented 3 weeks ago

The current underlying implementation of BERT score supports a limited set of transformer models, and FMEval further truncates this list to microsoft/deberta-xlarge-mnli and roberta-large-mnli.

Torchmetrics provides a more generic BERT score implementation. The specific request here is to adopt that implementation so that users are not limited in which transformer models they can configure.

Use cases for a broader set of models underlying BERT score:

  1. Monolingual BERT models have been shown to outperform multilingual BERT models on certain tasks (https://aclanthology.org/2021.acl-long.243.pdf).
  2. Customers may fine-tune their own transformers, which can be downloaded into the container running FMEval and passed to the torchmetrics BERT score implementation.