Thanks for the great repo.
I converted a Hugging Face model to Paddle successfully. After that, I tried to load the model:
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer
tokenizer = LayoutXLMTokenizer.from_pretrained("hugginface_model/phobert_paddle/model_state.pdparams")
print(tokenizer)
I get this error:
[2022-12-23 15:42:45,311] [ WARNING] - file<https://bj.bcebos.com/paddlenlp/models/community//hugginface_model/phobert_paddle/model_state.pdparams/tokenizer_config.json> not exist
Traceback (most recent call last):
File "train_layoutml.py", line 22, in <module>
tokenizer = LayoutXLMTokenizer.from_pretrained("hugginface_model/phobert_paddle/model_state.pdparams")
File "/home/tupk/tupk/ocr-digits/PaddleNLP/paddlenlp/transformers/tokenizer_utils_base.py", line 1576, in from_pretrained
tokenizer = cls(*init_args, **init_kwargs)
File "/home/tupk/tupk/ocr-digits/PaddleNLP/paddlenlp/transformers/utils.py", line 170, in __impl__
init_func(self, *args, **kwargs)
File "/home/tupk/tupk/ocr-digits/PaddleNLP/paddlenlp/transformers/layoutxlm/tokenizer.py", line 93, in __init__
self.sp_model.Load(vocab_file)
File "/home/tupk/anaconda3/envs/ocr/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "/home/tupk/anaconda3/envs/ocr/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
I think the error comes from the conversion to PaddleNLP not keeping the tokenizer config, so loading fails.
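To check this, I listed the files that I believe the tokenizer looks for in the converted folder (the file names below are only my assumption of what LayoutXLMTokenizer needs; the weights file alone does not seem to be enough):
import os

model_dir = "hugginface_model/phobert_paddle"
# File names are my guess at the tokenizer resources expected next to the weights.
for fname in ["sentencepiece.bpe.model", "tokenizer_config.json"]:
    print(fname, "->", os.path.exists(os.path.join(model_dir, fname)))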
After that, I thought I should load the model from the folder instead. I did:
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMTokenizer
tokenizer = LayoutXLMTokenizer.from_pretrained("hugginface_model/phobert_paddle/")
print(tokenizer)
I get this error:
/home/tupk/anaconda3/envs/ocr/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2022-12-23 15:44:45,091] [ WARNING] - Detected that datasets module was imported before paddlenlp. This may cause PaddleNLP datasets to be unavalible in intranet. Please import paddlenlp before datasets module to avoid download issues
Traceback (most recent call last):
File "train_layoutml.py", line 22, in <module>
tokenizer = LayoutXLMTokenizer.from_pretrained("hugginface_model/phobert_paddle/")
File "/home/tupk/tupk/ocr-digits/PaddleNLP/paddlenlp/transformers/tokenizer_utils_base.py", line 1576, in from_pretrained
tokenizer = cls(*init_args, **init_kwargs)
File "/home/tupk/tupk/ocr-digits/PaddleNLP/paddlenlp/transformers/utils.py", line 170, in __impl__
init_func(self, *args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'vocab_file'
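From this traceback it looks like from_pretrained finds no vocab file in the folder and then calls __init__ without the required vocab_file argument. I guess the converted folder needs the sentencepiece vocab file next to the weights, so that something like the following would work (the file name sentencepiece.bpe.model is my assumption; it would have to be copied from the original Hugging Face checkpoint):
from paddlenlp.transformers import LayoutXLMTokenizer

# Assumption: the sentencepiece model from the original Hugging Face
# checkpoint was copied into the converted folder under this name.
tokenizer = LayoutXLMTokenizer(
    vocab_file="hugginface_model/phobert_paddle/sentencepiece.bpe.model"
)
print(tokenizer)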
I would be very thankful for your help: is there a way to keep the tokenizer config when converting, or to load it from a local folder?