It seems that the default values of the keyword arguments in Huggingface's BERTScore API do not get the best out of BERTScore.
idf: Off by default. We should probably turn it on; see "Importance Weighting" on page 4 of the BERTScore paper. However, since we use the same setting for both the traditional and the new approach, I am not sure whether it matters.
model_type: The default language model is roberta-large when lang=en. According to BERTScore's leaderboard, other models have higher correlation with human ratings. However, since we use the same language model for both the traditional/ref-based and the new/DocAsRef approach, I am not sure whether it matters.
use_fast_tokenizer: Off by default. Please turn it on to speed things up. Huggingface's fast tokenizer is implemented in Rust instead of Python.
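For reference, here is a rough sketch of how the three arguments above could be passed. It assumes the metric is loaded through the `evaluate` package with `evaluate.load("bertscore")` and that `bert_score` is installed. The helper names `build_bertscore_kwargs` and `bertscore_tuned` are made up for illustration, and no specific replacement model is suggested; `model_type` is left for the caller to pick from the leaderboard.

```python
def build_bertscore_kwargs(model_type=None):
    """Collect the non-default settings discussed above (hypothetical helper)."""
    kwargs = {
        "lang": "en",
        "idf": True,                 # importance weighting, off by default
        "use_fast_tokenizer": True,  # Rust-based tokenizer, off by default
    }
    if model_type is not None:
        # Only override the default (roberta-large for lang=en) when asked to.
        kwargs["model_type"] = model_type
    return kwargs


def bertscore_tuned(predictions, references, model_type=None):
    """Run BERTScore with the settings above (hypothetical wrapper)."""
    # Assumption: the `evaluate` package provides the "bertscore" metric.
    import evaluate

    metric = evaluate.load("bertscore")
    return metric.compute(
        predictions=predictions,
        references=references,
        **build_bertscore_kwargs(model_type),
    )
```

Calling `bertscore_tuned(preds, refs)` keeps the default model and only switches on idf and the fast tokenizer, so the effect of each change can be checked separately.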
@NKWBTB @lihebi Let me know your thoughts.