aws / fmeval

Foundation Model Evaluations Library
http://aws.github.io/fmeval
Apache License 2.0

feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity #184

Closed bilalaws closed 7 months ago

bilalaws commented 7 months ago

Description of changes:

This PR implements the following changes to the GeneralSemanticRobustness eval:

  1. To complement the Word Error Rate metric, which measures syntactic differences, we add the BERTScore Dissimilarity metric, which measures semantic differences. It is defined as BERTScore Dissimilarity = 1 - BERTScore, which turns BERTScore (a similarity metric) into a dissimilarity metric. We use a dissimilarity for consistency with Word Error Rate and with the rest of the SemanticRobustness evals, which all measure dissimilarities.
  2. We normalize the BERTScore Dissimilarity and Word Error Rate when the model is non-deterministic.
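
As a rough illustration of the two metrics named above, here is a minimal, self-contained sketch. It is not the fmeval implementation: the function names `word_error_rate` and `bertscore_dissimilarity` are hypothetical, the Word Error Rate is a plain word-level edit distance divided by the reference length, and the BERTScore is assumed to be computed elsewhere and passed in as a float.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not the fmeval implementation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # drop a reference word
                    curr[j - 1] + 1,                  # extra hypothesis word
                    prev[j - 1] + substitution_cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def bertscore_dissimilarity(bertscore: float) -> float:
    """Convert a BERTScore similarity into a dissimilarity: 1 - BERTScore.

    The BERTScore itself is assumed to come from an external scorer.
    """
    return 1.0 - bertscore


if __name__ == "__main__":
    original_output = "the cat sat on the mat"
    perturbed_output = "the dog sat on the mat"
    print(word_error_rate(original_output, perturbed_output))
    print(bertscore_dissimilarity(0.9))
```

With both metrics expressed as dissimilarities, larger values consistently mean that the output on the perturbed input drifted further from the output on the original input.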