NormXU / ERNIE-Layout-Pytorch

An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP.
http://arxiv.org/abs/2210.06155
MIT License

List index out of range #1

Closed by karndeb 1 year ago

karndeb commented 1 year ago

Hi NormXU, I am getting this error.

Traceback (most recent call last):
  File "inference.py", line 16, in <module>
    tokenizer = ErnieLayoutTokenizer.from_pretrained(pretrained_model_name_or_path=pretrain_torch_model_or_path)
  File "D:\startup-resources\ERNIE-Layout-Pytorch\Ernie-Layout\lib\site-packages\transformers\tokenization_utils_base.py", line 1708, in from_pretrained
    file_id = list(cls.vocab_files_names.keys())[0]
IndexError: list index out of range

Please help
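
For context, the failing line in the traceback takes the first key of cls.vocab_files_names, so an IndexError there implies that this mapping was empty when from_pretrained ran. A minimal sketch of that mechanism follows. It uses a hypothetical stand-in class (TokenizerWithoutVocabFiles) and a hypothetical helper (first_vocab_file_id), not the real ErnieLayoutTokenizer or any transformers internals beyond the one expression shown in the traceback:

```python
class TokenizerWithoutVocabFiles:
    # Hypothetical stand-in: a class whose vocab_files_names mapping is empty.
    vocab_files_names = {}


class TokenizerWithVocabFiles:
    # Hypothetical stand-in: the mapping declares at least one vocab file id.
    vocab_files_names = {"vocab_file": "vocab.txt"}


def first_vocab_file_id(cls):
    # Same expression as the failing line in the traceback.
    return list(cls.vocab_files_names.keys())[0]


try:
    first_vocab_file_id(TokenizerWithoutVocabFiles)
except IndexError as err:
    error_message = str(err)
    print(f"IndexError: {error_message}")  # IndexError: list index out of range

# With a non-empty mapping the same expression succeeds.
file_id = first_vocab_file_id(TokenizerWithVocabFiles)
print(file_id)  # vocab_file
```

This only reproduces the error message; it does not show why the mapping was empty in this particular setup.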

NormXU commented 1 year ago

Hi, sorry for the late reply.

I guess this happens because the vocab_file path is not declared in tokenizer_config.yml.

The tokenizer_config.yml should look like this:

{
  "do_tokenize_postprocess": false,
  "sep_token": "[SEP]",
  "cls_token": "[CLS]",
  "unk_token": "[UNK]",
  "pad_token": "[PAD]",
  "mask_token": "[MASK]",
  "do_lower_case": true,
  "model_max_length": 512,
  "vocab_file": "/path/to/vocab.txt",  # replace with the path to your vocab.txt
  "sentencepiece_model_file": "/path/to/sentencepiece.bpe.model"  # replace with the path to your sentencepiece.bpe.model
}
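
Before loading the tokenizer, it may help to confirm that both paths in the config point to real files. Below is a minimal sketch, assuming the config has already been loaded into a dict. The names find_missing_files and REQUIRED_PATH_KEYS are hypothetical helpers for illustration, not part of this repo or of transformers:

```python
import os
import tempfile

# Keys taken from the config shown above.
REQUIRED_PATH_KEYS = ("vocab_file", "sentencepiece_model_file")


def find_missing_files(config):
    """Return the required keys whose value is absent or is not an existing file."""
    missing_keys = []
    for key in REQUIRED_PATH_KEYS:
        path = config.get(key)
        if not path or not os.path.isfile(path):
            missing_keys.append(key)
    return missing_keys


# Demonstration with a temporary directory: vocab.txt exists,
# the sentencepiece model file does not.
with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as handle:
        handle.write("[PAD]\n[UNK]\n")

    config = {
        "vocab_file": vocab_path,
        "sentencepiece_model_file": os.path.join(tmp_dir, "sentencepiece.bpe.model"),
    }
    missing = find_missing_files(config)

print(missing)  # ['sentencepiece_model_file']
```

An empty result means both entries are declared and the files exist; any key in the result should be fixed in the config before calling from_pretrained.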

NormXU commented 1 year ago

I have updated the repo and the corresponding tokenizer_config.yml on the model hub. Please pull the latest version and try again.