ProsusAI / finBERT

Financial Sentiment Analysis with BERT
Apache License 2.0

UnicodeDecodeError when trying to load a model saved locally #35

Closed: Verena96 closed this issue 3 years ago

Verena96 commented 3 years ago

Hello, I trained the model with my own parameters, and saved it. However, whenever I try to use it, I get the following error:

UnicodeDecodeError                        Traceback (most recent call last)

      4 tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
      5
----> 6 model = AutoModelForSequenceClassification.from_pretrained("C:/Users/Verena/Documents/finbert_new/models/classifier_model/finbert-sentiment.bin")
      7 label_list = label_list=['positive','negative','neutral']

~\anaconda3\envs\finbert\lib\site-packages\transformers-4.0.1-py3.8.egg\transformers\models\auto\modeling_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1237         if not isinstance(config, PretrainedConfig):
   1238             config, kwargs = AutoConfig.from_pretrained(
-> 1239                 pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
   1240             )
   1241

~\anaconda3\envs\finbert\lib\site-packages\transformers-4.0.1-py3.8.egg\transformers\models\auto\configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    339         {'foo': False}
    340         """
--> 341         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    342
    343         if "model_type" in config_dict:

~\anaconda3\envs\finbert\lib\site-packages\transformers-4.0.1-py3.8.egg\transformers\configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    387             )
    388             # Load config dict
--> 389             config_dict = cls._dict_from_json_file(resolved_config_file)
    390
    391         except EnvironmentError as err:

~\anaconda3\envs\finbert\lib\site-packages\transformers-4.0.1-py3.8.egg\transformers\configuration_utils.py in _dict_from_json_file(cls, json_file)
    470     def _dict_from_json_file(cls, json_file: str):
    471         with open(json_file, "r", encoding="utf-8") as reader:
--> 472             text = reader.read()
    473         return json.loads(text)
    474

~\anaconda3\envs\finbert\lib\codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The same happens when I try to load the language model, even though both models are downloaded locally. I was only able to use finbert through transformers. Can you please help me? Thanks!

doguaraci commented 3 years ago

Hi, the problem is that you're passing the .bin file itself as the input to .from_pretrained.
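To illustrate why a weights file produces this particular error: the traceback shows the path ending up in `_dict_from_json_file`, which opens it as UTF-8 text. A binary checkpoint is not text, and byte 0x80 at position 0 is consistent with the start of a pickle stream, which is likely what the .bin file contains. The snippet below is only a sketch that mimics that one step with a two-byte stand-in file; `read_as_config` is a hypothetical helper, not a transformers function.

```python
import json
import os
import tempfile

# Stand-in for a weights file: the first two bytes of a pickle stream
# (protocol 2). This is an assumption about what the real .bin starts with.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    f.write(b"\x80\x02")
    weights_path = f.name


def read_as_config(json_file):
    # Hypothetical helper repeating the steps shown in the
    # _dict_from_json_file frame of the traceback.
    with open(json_file, "r", encoding="utf-8") as reader:
        text = reader.read()
    return json.loads(text)


try:
    read_as_config(weights_path)
    error_message = None
except UnicodeDecodeError as err:
    # Binary data cannot be decoded as UTF-8 text.
    error_message = str(err)
finally:
    os.remove(weights_path)

print(error_message)
```

The failure happens before any JSON parsing, so the content of the weights is irrelevant; the file is simply being read as the wrong kind of file.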

There should be a folder containing the model weights (.bin file) and config.json, and you should pass the folder path as the input.
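A minimal sketch of that advice, assuming the folder holds `config.json` plus weights named `pytorch_model.bin` (the default weights filename transformers looks for in a directory in the 4.0.1 release shown in the traceback; a file with another name, such as finbert-sentiment.bin, would need renaming). `resolve_model_dir` and `load_classifier` are hypothetical helper names introduced only for this illustration.

```python
from pathlib import Path

# File names assumed to be required inside the model folder.
REQUIRED_FILES = ("config.json", "pytorch_model.bin")


def resolve_model_dir(path):
    """Return the folder to hand to from_pretrained.

    If `path` points at a file (for example the .bin weights), its parent
    folder is used instead. Raises FileNotFoundError if the folder lacks
    any of the assumed required files.
    """
    p = Path(path)
    model_dir = p.parent if p.is_file() else p
    missing = [name for name in REQUIRED_FILES
               if not (model_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"{model_dir} is missing: {', '.join(missing)}"
        )
    return model_dir


def load_classifier(path):
    # Imported lazily so the path check above works without transformers.
    from transformers import AutoModelForSequenceClassification

    model_dir = resolve_model_dir(path)
    return AutoModelForSequenceClassification.from_pretrained(str(model_dir))
```

With the files in place, calling `load_classifier` on the folder (or on a file inside it) passes the folder path to `from_pretrained`, so the config is read from config.json rather than from the weights file.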