benob / recasepunc

Model for recasing and repunctuating ASR transcripts
BSD 3-Clause "New" or "Revised" License

RuntimeError when predicting with the french models #9

Open maelchiotti opened 2 years ago

maelchiotti commented 2 years ago

I tried to use the French models (both fr.22000 and fr-txt.large.19000) on a very simple text:

j'aime les fleurs les olives et la raclette

When running python3 recasepunc.py predict fr.22000 < input.txt > output.txt (or with the other model), I get the following RuntimeError:

Traceback (most recent call last):
  File "/home/mael/charly/recasepunc/recasepunc.py", line 733, in <module>
    main(config, config.action, config.action_args)
  File "/home/mael/charly/recasepunc/recasepunc.py", line 707, in main
    generate_predictions(config, *args)
  File "/home/mael/charly/recasepunc/recasepunc.py", line 336, in generate_predictions
    model.load_state_dict(loaded['model_state_dict'])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Model:
    Unexpected key(s) in state_dict: "bert.position_ids".

I tried the same with the English model, and it worked perfectly. Looks like something is broken with the French ones?

benob commented 2 years ago

Did you install the version of transformers listed in requirements.txt? This kind of error is typical of a mismatch in the BERT model class structure.
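For readers who cannot install the pinned transformers version, a commonly used workaround is to drop the offending key from the checkpoint before loading it. The following is only a minimal sketch under that assumption, not the fix confirmed in this thread: the helper name drop_keys is hypothetical, and whether the remaining weights load cleanly with a given transformers version is untested here.

```python
from collections import OrderedDict


def drop_keys(state_dict, keys=("bert.position_ids",)):
    """Return a copy of state_dict without the given keys.

    The default key is the one named in the traceback above; the
    original mapping is left unmodified.
    """
    return OrderedDict(
        (name, value) for name, value in state_dict.items() if name not in keys
    )


# Hypothetical usage inside generate_predictions, replacing the direct load:
#     model.load_state_dict(drop_keys(loaded['model_state_dict']))
```

Filtering the one reported key keeps the load strict for everything else, so any other structural mismatch would still raise an error instead of being silently ignored.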

maelchiotti commented 2 years ago

That was indeed the issue, and installing the required version of transformers fixed it. Thanks a lot!

I may as well add that I did not use requirements.txt because, when running pip3 install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html, I got the following error:

ERROR: Could not find a version that satisfies the requirement torch==1.9.0+cu111 (from versions: 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1)

Breizhux commented 1 year ago

I had the same problem for the French language.

I tested several variants of the requirements file. With the following content in requirements.txt, the installation completes without problems:

git+https://github.com/benob/mosestokenizer.git
numpy
regex
torch
tqdm
transformers==4.10.0

Then I can install recasepunc with a simple pip install -r requirements.txt.

The version of each library in my current installation is :

I would add that installing the PyTorch version required by the original requirements.txt file crashes my computers, which don't have enough RAM (8 GB + 6 GB swap). Installing the latest version doesn't seem to use more than 3 GB of RAM.

With this requirements.txt file the installation works very well (I have only tested the French models and the English model). Maybe it deserves a commit ;)
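When comparing setups like the ones in this thread, it can help to print the versions actually installed. The sketch below uses the standard library's importlib.metadata. The package names are taken from the requirements.txt content above; that the Git dependency installs under the distribution name mosestokenizer is an assumption.

```python
from importlib import metadata

# Names from the requirements.txt posted above; "mosestokenizer" as the
# installed distribution name of the Git dependency is an assumption.
PACKAGES = ["mosestokenizer", "numpy", "regex", "torch", "tqdm", "transformers"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running the script in the environment used for recasepunc shows at a glance whether transformers matches the pinned 4.10.0.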