dimitristaufer / Semi-Automated-Text-Sanitization

Mitigating the Risk of Whistleblower Re-Identification with Semi-automated Text Sanitization

Issue with tokenizer #1

Closed: groemi closed this issue 1 week ago

groemi commented 2 weeks ago

I like the project and wanted to try it out. Unfortunately, I encountered the following error when starting it with Docker Compose:

Setting model to: Backend/chatgpt_paraphrases_out_100000_xl
web-1  | Traceback (most recent call last):
web-1  |   File "/app/Backend/server.py", line 10, in <module>
web-1  |     import utils
web-1  |   File "/app/Backend/utils.py", line 195, in <module>
web-1  |     init_models()
web-1  |   File "/app/Backend/utils.py", line 183, in init_models
web-1  |     set_language_model(default_model)
web-1  |   File "/app/Backend/utils.py", line 121, in set_language_model
web-1  |     tokenizer = AutoTokenizer.from_pretrained(
web-1  |   File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 768, in from_pretrained
web-1  |     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
web-1  |   File "/usr/local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
web-1  |     return cls._from_pretrained(
web-1  |   File "/usr/local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2056, in _from_pretrained
web-1  |     slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
web-1  |   File "/usr/local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
web-1  |     tokenizer = cls(*init_inputs, **init_kwargs)
web-1  |   File "/usr/local/lib/python3.9/site-packages/transformers/models/t5/tokenization_t5.py", line 166, in __init__
web-1  |     self.sp_model.Load(vocab_file)
web-1  |   File "/usr/local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
web-1  |     return self.LoadFromFile(model_file)
web-1  |   File "/usr/local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
web-1  |     return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
web-1  | TypeError: not a string

Do you have any idea where this might come from and how it can be fixed? Thank you in advance!

dimitristaufer commented 1 week ago

Hi @groemi, glad to see that you seem to have figured it out. If you have any further questions, feel free to contact me directly at staufer@tu-berlin.de. I'm curious to know what you're planning to use the project for.
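
The thread does not record what resolved the error. For anyone hitting the same traceback: SentencePiece raises "TypeError: not a string" when it is handed None instead of a file path, which typically happens when the model directory contains neither a spiece.model nor a tokenizer.json file (for example, because the model was not downloaded or not mounted into the container). The sketch below is only a diagnostic under that assumption, not the project's own code; the file names are the standard Hugging Face ones, and check_tokenizer_files is a hypothetical helper.

```python
from pathlib import Path

# Assumption: standard Hugging Face file names for a T5-style tokenizer.
SENTENCEPIECE_FILE = "spiece.model"      # read by the slow (SentencePiece) tokenizer
FAST_TOKENIZER_FILE = "tokenizer.json"   # read by the fast tokenizer


def check_tokenizer_files(model_dir):
    """Report which tokenizer files are present in model_dir (hypothetical helper)."""
    path = Path(model_dir)
    return {
        "dir_exists": path.is_dir(),
        "has_spiece": (path / SENTENCEPIECE_FILE).is_file(),
        "has_tokenizer_json": (path / FAST_TOKENIZER_FILE).is_file(),
    }


if __name__ == "__main__":
    # Path taken from the log line "Setting model to: ..." in the report above.
    status = check_tokenizer_files("Backend/chatgpt_paraphrases_out_100000_xl")
    print(status)
    if not (status["has_spiece"] or status["has_tokenizer_json"]):
        print("No tokenizer files found; the tokenizer cannot be loaded from this directory.")
```

Running this inside the container (from /app) shows whether the directory named in the log actually holds the tokenizer files. If both are missing, the likely remedy is to place the model files there before starting the service.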