AIPHES / emnlp19-moverscore

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
MIT License

Can this lib be used with Chinese? #15

Open hueiyuan opened 3 years ago

hueiyuan commented 3 years ago

I want to check something. I looked at the source code and found that it uses DistilBERT ("distilbert-base-uncased"). Can this lib be used with Chinese? Thanks.

andyweizhao commented 3 years ago

Thanks a lot for your interest! Yes, you can specify a Chinese BERT (e.g., bert-base-chinese) as the model_name. Note that this project is designed for measuring the similarity of monolingual texts. If you are interested in multilingual texts (e.g., the similarity between Chinese and English texts), please refer to our recent project at https://github.com/AIPHES/ACL20-Reference-Free-MT-Evaluation, where we made some modifications to get better results in the multilingual evaluation setting.
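Per the project title, that monolingual similarity is an Earth Mover Distance between contextualized embeddings. As a toy illustration only (not the library's code), the sketch below uses hand-made 2-D vectors in place of BERT token embeddings and gives every token equal weight; the function name `emd_uniform` is hypothetical. With equal-size sets and uniform weights, an optimal transport plan can always be taken to be a one-to-one matching, so brute-forcing all permutations gives the exact EMD for tiny inputs. The real metric also weights tokens by IDF (see `get_idf_dict` later in the thread).

```python
import itertools
import math


def emd_uniform(xs: list[tuple[float, ...]], ys: list[tuple[float, ...]]) -> float:
    """Earth Mover Distance between two equal-size point sets with uniform weights.

    Hypothetical helper for illustration: tries every one-to-one matching,
    so it is only feasible for a handful of points.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size sets")
    n = len(xs)
    # Total Euclidean cost of the cheapest matching of xs onto ys.
    best = min(
        sum(math.dist(xs[i], ys[p[i]]) for i in range(n))
        for p in itertools.permutations(range(n))
    )
    # Each point carries mass 1/n, so scale the total cost accordingly.
    return best / n


# Hypothetical "embeddings" for a reference and a hypothesis sentence.
ref = [(0.0, 0.0), (1.0, 0.0)]
hyp = [(1.0, 0.0), (0.0, 1.0)]
print(emd_uniform(ref, hyp))  # 0.5
print(emd_uniform(ref, ref))  # 0.0
```

The cheapest plan leaves (1, 0) in place and moves (0, 0) to (0, 1), a total cost of 1, i.e. 0.5 per unit of mass; lower values mean the two point sets are closer.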

hueiyuan commented 3 years ago

@andyweizhao Understood! But how do I specify a Chinese BERT (e.g., bert-base-chinese) as the model_name with this lib? I have not seen this parameter in the source code. Thanks for the help.

xhluca commented 2 years ago

@hueiyuan It is now documented in the README:

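import os
# Choose the model before importing moverscore_v2 (e.g. "bert-base-chinese" for Chinese)
os.environ['MOVERSCORE_MODEL'] = "albert-base-v2"

from moverscore_v2 import get_idf_dict

translations = ["the cat sat on the mat"]  # hypothesis sentences to score (placeholder)
idf_dict_hyp = get_idf_dict(translations)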

Set MOVERSCORE_MODEL to the name of the BERT model you want to use, e.g. "bert-base-chinese" for Chinese.
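`get_idf_dict` turns a list of sentences into per-token IDF weights so that frequent tokens count less in the distance. The sketch below assumes the smoothed weighting idf(t) = log((N + 1) / (df(t) + 1)), with N sentences and df(t) the number of sentences containing token t; treat that formula as an assumption and check moverscore_v2.py for the exact definition. The tokenizer `char_tokens` and the function `idf_dict_sketch` are hypothetical: a character-level stand-in chosen because BERT's basic tokenizer (used by bert-base-chinese) splits Chinese text into individual characters, whereas the real function works on the chosen model's subword tokens.

```python
import math
from collections import Counter, defaultdict


def char_tokens(text: str) -> list[str]:
    # Hypothetical stand-in tokenizer: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


def idf_dict_sketch(sentences: list[str]) -> defaultdict:
    num_docs = len(sentences)
    # Document frequency: each token counts at most once per sentence.
    df = Counter()
    for s in sentences:
        df.update(set(char_tokens(s)))
    # Tokens never seen in the corpus get the maximum weight, log(N + 1).
    idf = defaultdict(lambda: math.log(num_docs + 1))
    idf.update({tok: math.log((num_docs + 1) / (c + 1)) for tok, c in df.items()})
    return idf


idf = idf_dict_sketch(["我喜欢猫", "我喜欢狗"])
print(idf["我"])  # 0.0  (appears in both sentences)
print(idf["猫"])  # log(3/2), appears in one sentence
print(idf["鱼"])  # log(3), unseen character
```

A character shared by every sentence contributes nothing, while rarer characters carry more weight in the distance.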