BPYap / BERT-WSD

[EMNLP-Findings 2020] Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences
https://arxiv.org/abs/2009.11795

Multilingual model doesn't work #1

Closed JamesArthurHolland closed 3 years ago

JamesArthurHolland commented 3 years ago

Hi,

I'm trying to use BERT-Base, Multilingual Cased from google-research

But it's taking a very long time to load. I suspect an infinite loop.

The files in the model folder have different names than the models available from BERT-WSD.

How can I make this other model compatible? I need multilingual support.

BPYap commented 3 years ago

Hi James,

Thank you for your interest in our work. You can access the pretrained multilingual model by passing the ID bert-base-multilingual-cased to the --model_name_or_path argument. This will download a PyTorch version of the model from the HuggingFace model repository.

However, as a side note, you might want to fine-tune it on some multilingual WSD datasets other than SemCor for it to work well under multilingual settings.

JamesArthurHolland commented 3 years ago

Hi BPYap,

Thanks for your speedy response.

Where does it download the model to? The model folder?

Can I pass the id to the demo_model script? I'm currently trying to do that but it seems to stall at "loading the model", with nothing appearing in the model folder.

BPYap commented 3 years ago

The model is downloaded to the system's cache directory. In my case (Windows 10), it was C:\Users\<your_username>\.cache\torch\transformers\.
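To check whether anything has landed in that cache, here is a minimal Python sketch. It assumes the cache location reported above (~/.cache/torch/transformers), which may differ depending on the installed transformers version and your environment. The helper name list_cached_files is hypothetical and not part of BERT-WSD.

```python
from pathlib import Path

# Assumption: cache location as reported in this thread; other
# transformers versions or environments may store downloads elsewhere.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "transformers"


def list_cached_files(cache_dir=DEFAULT_CACHE):
    """Return (file name, size in bytes) pairs for files in cache_dir, sorted by name.

    Returns an empty list if the directory does not exist yet.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(
        (p.name, p.stat().st_size) for p in cache_dir.iterdir() if p.is_file()
    )


if __name__ == "__main__":
    # Print each cached file with its size so large model weights stand out.
    for name, size in list_cached_files():
        print(f"{size:>12,d}  {name}")
```

An empty listing right after starting the script would suggest the download has not begun writing to this location yet, or that the cache lives somewhere else on your system.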

As for the demo_model script, it does work with a model ID (though that was not the intended usage). Here's my console output:

python script\demo_model.py "bert-base-multilingual-cased"
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Loading model...

Enter a sentence with an ambiguous word surrounded by [TGT] tokens
> He caught a [TGT] bass [TGT] yesterday.
Progress: 100%|██████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 13.14it/s]

Predictions:
  No.  Sense key            Definition                                                                                             Score
-----  -------------------  ---------------------------------------------------------------------------------------------------  -------
    1  bass%1:18:00::       an adult male singer with the lowest voice                                                           0.11432
    2  bass%1:10:01::       the lowest part in polyphonic music                                                                  0.11414
    3  bass%1:10:00::       the lowest adult male singing voice                                                                  0.11288
    4  bass%5:00:00:low:03  having or denoting a low vocal or instrumental range                                                 0.11221
    5  bass%1:07:01::       the lowest part of the musical range                                                                 0.11164
    6  bass%1:06:02::       the member with the lowest range of a family of musical instruments                                  0.10972
    7  bass%1:13:02::       the lean flesh of a saltwater fish of the family Serranidae                                          0.10837
    8  bass%1:05:00::       nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes               0.10836
    9  bass%1:13:01::       any of various North American freshwater fish with lean flesh (especially of the genus Micropterus)  0.10836

Enter a sentence with an ambiguous word surrounded by [TGT] tokens
>

You might need to give it a few minutes to download completely. It appears stuck because the download progress bar is not displayed for some reason.
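Since no progress bar is shown, one rough way to tell a slow download from a real hang is to sample the size of the cache directory a few times and see whether it grows. The following is a minimal sketch, assuming the cache directory mentioned earlier in this thread. The function names dir_size_bytes and watch_download are hypothetical helpers, not part of BERT-WSD or transformers.

```python
import time
from pathlib import Path


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path (0 if path is missing)."""
    root = Path(path)
    if not root.is_dir():
        return 0
    total = 0
    for p in root.rglob("*"):
        try:
            if p.is_file():
                total += p.stat().st_size
        except OSError:
            # A temporary file may vanish between listing and stat; skip it.
            continue
    return total


def watch_download(path, interval=5.0, checks=3):
    """Sample the directory size `checks` times, `interval` seconds apart."""
    sizes = []
    for i in range(checks):
        sizes.append(dir_size_bytes(path))
        if i < checks - 1:
            time.sleep(interval)
    return sizes


if __name__ == "__main__":
    # Assumption: cache location reported earlier in this thread.
    cache = Path.home() / ".cache" / "torch" / "transformers"
    print(watch_download(cache))
```

If the sampled sizes increase between checks, the download is still in progress. Constant sizes over a longer period would point to a stall or to a cache stored in a different directory.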

JamesArthurHolland commented 3 years ago

Thanks, that worked.