This PR makes the following changes to the `GeneralSemanticRobustness` eval:
- Word Error Rate measures syntactic differences. To complement it, we add the BERTScore Dissimilarity metric, which measures semantic differences. We report `BERTScore Dissimilarity = 1 - BERTScore` (a dissimilarity metric) rather than BERTScore itself (a similarity metric). This keeps it consistent with Word Error Rate and the other `SemanticRobustness` evals, which also measure dissimilarity.
- When the model is non-deterministic, we normalize BERTScore Dissimilarity and Word Error Rate.
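
The following minimal Python sketch illustrates the two metrics and the non-determinism adjustment. It is not the eval's implementation, and several parts are assumptions:
- The function names are hypothetical.
- Word Error Rate tokenizes on plain whitespace.
- The BERTScore similarity is passed in as a number rather than computed, because computing it requires a pretrained model.
- The normalization is assumed. It subtracts a baseline dissimilarity, measured between two generations for the same unperturbed input, and clips the result at zero.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two texts, divided by the reference length.

    Tokenizes on whitespace (an assumption; real implementations may also
    normalize case and punctuation first).
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words
    # (classic Levenshtein dynamic programming, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[-1] / len(ref)


def bert_score_dissimilarity(bert_score: float) -> float:
    """Flip a BERTScore similarity into a dissimilarity: higher means more different."""
    return 1.0 - bert_score


def normalize_for_nondeterminism(perturbed: float, baseline: float) -> float:
    """HYPOTHETICAL normalization for non-deterministic models.

    `baseline` is the dissimilarity between two generations for the same
    unperturbed input, i.e. the change caused by sampling alone. Subtracting
    it (and clipping at 0) keeps only the change attributed to the perturbation.
    """
    return max(0.0, perturbed - baseline)


if __name__ == "__main__":
    original_output = "the cat sat on the mat"
    perturbed_output = "a dog sat on a mat"      # output for the perturbed prompt
    resampled_output = "the cat sat on the rug"  # second output for the original prompt

    wer = word_error_rate(original_output, perturbed_output)           # 3 / 6 = 0.5
    baseline_wer = word_error_rate(original_output, resampled_output)  # 1 / 6
    print(normalize_for_nondeterminism(wer, baseline_wer))             # ≈ 0.333

    # BERTScore itself needs a pretrained model; 0.92 is a made-up value.
    print(bert_score_dissimilarity(0.92))  # ≈ 0.08
```

Expressing both metrics as dissimilarities means that, for each one, a higher score indicates a less robust model. The baseline subtraction is only one plausible way to discount sampling noise, and clipping at zero keeps the adjusted score a valid non-negative dissimilarity.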