Closed chicuong209 closed 4 years ago
Could you please provide more details (data, scripts, as much as you can)? Did you perhaps save a dictionary and then try to reload it?
Then you should skip step 2.
Download the dictionary and BPE files from https://huggingface.co/vinai/phobert-base#list-files
and load the tokenizer using: tokenizer = PhoBertTokenizer(path-to-dictionary-file, path-to-bpe-file)
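For anyone following the suggestion above, here is a rough sketch of loading the tokenizer from local files. It assumes the two downloaded files are named vocab.txt and bpe.codes (check the file listing and adjust if yours differ). The helper names locate_phobert_files and load_phobert_tokenizer are made up for illustration and are not part of transformers.

```python
from pathlib import Path

# Assumed file names for the dictionary and BPE files; change them if your
# downloaded copies are named differently.
VOCAB_NAME = "vocab.txt"
BPE_NAME = "bpe.codes"


def locate_phobert_files(directory):
    """Return (vocab_path, bpe_path) as strings, or raise if either file is missing."""
    folder = Path(directory)
    vocab_path = folder / VOCAB_NAME
    bpe_path = folder / BPE_NAME
    missing = [str(p) for p in (vocab_path, bpe_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing PhoBERT tokenizer files: " + ", ".join(missing))
    return str(vocab_path), str(bpe_path)


def load_phobert_tokenizer(directory):
    """Build the tokenizer from the two local files (hypothetical helper)."""
    # Imported here so the path check above works without transformers installed.
    from transformers import PhoBertTokenizer

    vocab_path, bpe_path = locate_phobert_files(directory)
    return PhoBertTokenizer(vocab_path, bpe_path)
```

Calling load_phobert_tokenizer on the folder holding both files should give a tokenizer without going through the saved dictionary at all. Checking the paths first means a missing file shows up as a clear FileNotFoundError instead of a harder-to-read loading error.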
Why would you need to save and reload the dictionary? It's pretty weird :|
ok. I'll try it now
I have the same issue. The PhoBERT model loads fine, but the tokenizer was not found. The error is as below:
OSError: Model name 'vinai/phobert-base' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed 'vinai/phobert-base' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
I think many people will run into this issue, so I'm posting it here :D Thanks for your kind response :D
Please install transformers from its latest source:
git clone https://github.com/huggingface/transformers.git
cd transformers
pip3 install --upgrade .
Also remove your transformers folder in ~/.cache/torch, so it'd automatically re-download PhoBERT properly. It should work.
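If you'd rather clear the cache from Python, a small sketch is below. It assumes the cached files sit in ~/.cache/torch/transformers, going by the wording above; the location may differ on your machine or with other transformers versions. clear_transformers_cache is a made-up helper name, not a library function.

```python
import shutil
from pathlib import Path

# Assumption: the cache folder is ~/.cache/torch/transformers.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "transformers"


def clear_transformers_cache(cache_dir=DEFAULT_CACHE):
    """Delete the cache folder if it exists; return True when something was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing cached at this location, so there is nothing to clean.
        return False
    shutil.rmtree(cache_dir)
    return True
```

Run clear_transformers_cache() once after reinstalling; the next attempt to load vinai/phobert-base should then download the files again.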
@chicuong209 if you run into any problem, you might want to follow the instructions above. I'm pretty sure PhoBERT would work without any loading issue.
Thank you, it works for me.
I think the problem was that I expected PhoBERT to be downloaded automatically when I installed transformers from pip.
As in the title: I get the error below when using PhoBertTokenizer for a Vietnamese question answering task. Could you please help me fix it? Thank you.
f"Non-consecutive added token '{token}' found. "
AssertionError: Non-consecutive added token '<mask>' found. Should have index 5 but has index 64000 in saved vocabulary.
By the way, I tried setting self.encoder[self.mask_token] = 4, and the training process then runs normally, but that doesn't seem like the right way to fix it.
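To make the message easier to read, here is a simplified illustration of the kind of check that seems to produce it: each added token must take the next free index after the base vocabulary. This is only a sketch and not the actual transformers code; check_added_tokens and its parameters are hypothetical.

```python
def check_added_tokens(base_vocab_size, added_tokens):
    """Check that added tokens continue the base vocabulary without gaps.

    added_tokens maps token -> saved index. Raises AssertionError with a
    message in the style of the one reported above when an index does not
    match the next free slot.
    """
    expected = base_vocab_size
    # Walk the added tokens in order of their saved index.
    for token, index in sorted(added_tokens.items(), key=lambda item: item[1]):
        assert index == expected, (
            f"Non-consecutive added token '{token}' found. "
            f"Should have index {expected} but has index {index} in saved vocabulary."
        )
        expected += 1
```

Under this reading, check_added_tokens(64000, {"<mask>": 64000}) passes, while check_added_tokens(5, {"<mask>": 64000}) fails with the reported wording. That would suggest the base dictionary was not fully loaded when the saved tokenizer was reloaded, which is why forcing the index of the mask token only hides the symptom.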