TharinduDR / TransQuest

Transformer based translation quality estimation
Apache License 2.0

[BUG] Tokenizer files missing for TransQuest/monotransquest-hter-en_any when using huggingface transformers #35

Closed varvarvarvar closed 2 years ago

varvarvarvar commented 2 years ago

Describe the bug

Thanks for maintaining the library.

I cannot download the tokenizer files for TransQuest/monotransquest-hter-en_any when using the Hugging Face transformers library.

To Reproduce

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TransQuest/monotransquest-hter-en_any")
  File "/Users/user/Library/Caches/pypoetry/virtualenvs/venv/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 108, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: No such file or directory (os error 2)
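One way to narrow this down is to check which tokenizer files the Hub repository actually contains. The sketch below is illustrative only: `missing_tokenizer_files` and `fetch_repo_files` are hypothetical helper names, and the expected file list assumes the checkpoint uses an XLM-RoBERTa tokenizer (suggested by the "xlmroberta" argument used further down, but not confirmed in this thread). `list_repo_files` is part of huggingface_hub and needs network access.

```python
# Files an XLM-RoBERTa tokenizer is normally loaded from.
# Assumption: the checkpoint is XLM-R based; adjust the tuple for other architectures.
XLMR_TOKENIZER_FILES = ("sentencepiece.bpe.model", "tokenizer.json")


def missing_tokenizer_files(repo_files, expected=XLMR_TOKENIZER_FILES):
    """Return the expected tokenizer files that are absent from a file listing."""
    present = set(repo_files)
    return [name for name in expected if name not in present]


def fetch_repo_files(repo_id):
    """List the files of a Hub repository (requires network access)."""
    # Imported lazily so the pure helper above works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id)
```

Calling `missing_tokenizer_files(fetch_repo_files("TransQuest/monotransquest-hter-en_any"))` would then show whether the repository lacks the files the tokenizer loader looks for.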

Expected behaviour

Pretrained tokenizer is instantiated.

Additional context

Downloading the pretrained model worked fine:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("TransQuest/monotransquest-hter-en_any")

Instantiating the model through the transquest library also worked:

import torch
from transquest.algo.sentence_level.monotransquest.run_model import MonoTransQuestModel

model = MonoTransQuestModel("xlmroberta", "TransQuest/monotransquest-hter-en_any", num_labels=1, use_cuda=torch.cuda.is_available())
predictions, raw_outputs = model.predict([["Reducerea acestor conflicte este importantă pentru conservare.", "Reducing these conflicts is not important for preservation."]])
print(predictions)
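A possible workaround, sketched here under assumptions rather than confirmed by the maintainers, is to fall back to the tokenizer of the base checkpoint when the fine-tuned repository cannot provide one. `load_tokenizer` is a hypothetical helper, and "xlm-roberta-large" as the base checkpoint is an assumption inferred from the "xlmroberta" model type above. The broad `except Exception` mirrors the bare `Exception` shown in the traceback.

```python
MODEL_ID = "TransQuest/monotransquest-hter-en_any"
FALLBACK_ID = "xlm-roberta-large"  # assumption: base checkpoint of the fine-tuned model


def load_tokenizer(model_id=MODEL_ID, fallback_id=FALLBACK_ID, loader=None):
    """Try the fine-tuned repo first, then fall back to the base checkpoint."""
    if loader is None:
        # Imported lazily so the helper can be exercised with a stub loader.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    try:
        return loader(model_id)
    except Exception:
        # The failure in this report surfaces as a plain Exception,
        # so a narrower except clause would not catch it.
        return loader(fallback_id)
```

With transformers installed and network access, `tokenizer = load_tokenizer()` would return whichever tokenizer loads first. Whether the fallback tokenizer matches the one used during fine-tuning should be verified before relying on its predictions.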

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.