SapienzaNLP / ewiser

A Word Sense Disambiguation system integrating implicit and explicit external knowledge.

multilingual datasets and model #2

Closed by lwmlyy 3 years ago

lwmlyy commented 3 years ago

Hi, nice work! Could you please specify which multilingual dataset version you used ('all' or 'wn'), and which multilingual BERT version (base or large)?

mbevila commented 3 years ago

We used the 'wn' version. The BERT model is the 'bert-base-multilingual-cased' one.
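
For reference, here is a minimal sketch of loading that checkpoint with the Hugging Face transformers library. This is just the generic transformers API for illustration, not EWISER's own training entry point:

```python
from transformers import AutoModel, AutoTokenizer

# The multilingual encoder named above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

# Encode a sample sentence; contextual embeddings like these are
# what a WSD classification head would consume.
inputs = tokenizer("He sat on the bank of the river.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, 768)
```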

lwmlyy commented 3 years ago

I see, thanks. Is there any particular reason the large model was not used?

mbevila commented 3 years ago

There wasn't a multilingual BERT large model when the experiments were performed (I'm not sure there is one now). If you need a stronger multilingual model, you can use XLM-R; it is trivial to train one with our code. I could release a pre-trained checkpoint someday, if I see interest in this.
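
For anyone who wants to try this, a rough sketch of swapping in XLM-R via the same transformers interface; the model names are the only concrete part here, and the actual EWISER training scripts and flags should be checked in the repo:

```python
from transformers import AutoModel, AutoTokenizer

# XLM-R comes in base and large variants, both multilingual.
name = "xlm-roberta-large"  # or "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Il s'est assis sur la rive.", return_tensors="pt")
hidden = model(**inputs).last_hidden_state  # hidden size 1024 for large
```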

lwmlyy commented 3 years ago

OK, I might try it myself with XLM-R. Thanks.