alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Can't get attribute 'Trie' using punctuation RU model #1459

Closed Lorenzoncina closed 6 months ago

Lorenzoncina commented 8 months ago

Hello. I'm trying to use the punctuation and case restoration model trained using https://github.com/benob/recasepunc and downloaded from here: https://alphacephei.com/vosk/models/vosk-recasepunc-ru-0.22.zip

I can use the English model, but the Russian and German ones fail when I run the following command:

python3 recasepunc.py predict checkpoint < de-test.txt > output.txt

I get the following error:

python3 ../../recasepunc/recasepunc.py predict checkpoint < de-test.txt > output.txt
Traceback (most recent call last):
  File "../../recasepunc/recasepunc.py", line 752, in <module>
    main(config, config.action, config.action_args)
  File "../../recasepunc/recasepunc.py", line 723, in main
    generate_predictions(config, *args)
  File "../../recasepunc/recasepunc.py", line 346, in generate_predictions
    loaded = torch.load(checkpoint_path, map_location=config.device if torch.cuda.is_available() else 'cpu')
  File "/raid/data/s2t/speech_tools/recasepunc/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/raid/data/s2t/speech_tools/recasepunc/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/raid/data/s2t/speech_tools/recasepunc/lib/python3.8/site-packages/torch/serialization.py", line 875, in find_class
    return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'Trie' on <module 'transformers.tokenization_utils' from '/raid/data/s2t/speech_tools/recasepunc/lib/python3.8/site-packages/transformers/tokenization_utils.py'>

It looks like a dependency issue; maybe the version of some package installed in my environment is outdated. Does anyone have an idea?
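For context on the traceback above: pickle stores a class as a module path plus a name and looks it up again at load time, so loading fails if the installed module does not define that name. Here is a minimal sketch of that mechanism using only the standard library. The module name `fake_tokenization_utils` and the stand-in `Trie` class are made up for illustration; this is not the recasepunc or transformers code.

```python
import pickle
import sys
import types


class Trie:
    """Stand-in for a class that a checkpoint references by name."""


# Register a throwaway module (hypothetical name) that exposes Trie.
fake = types.ModuleType("fake_tokenization_utils")
Trie.__module__ = fake.__name__
fake.Trie = Trie
sys.modules[fake.__name__] = fake


def load_after_removal():
    """Pickle a Trie, remove the class from its module, then try to load it."""
    payload = pickle.dumps(Trie())  # the class is recorded by module and name
    del fake.Trie                   # simulate a release that does not define it
    try:
        pickle.loads(payload)
    except AttributeError as err:
        return err                  # same kind of error as in the traceback
    finally:
        fake.Trie = Trie            # restore so the demo can be re-run
    return None


if __name__ == "__main__":
    print(load_after_removal())
```

In the real case the checkpoint file plays the role of `payload`, and the installed `transformers.tokenization_utils` plays the role of the module that lacks the name.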

nshmyrev commented 8 months ago

You need transformers==4.25.1; that version defines Trie in tokenization_utils:

https://github.com/huggingface/transformers/blob/v4.25.1/src/transformers/tokenization_utils.py#L51

Other versions might be broken.
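A quick way to see whether the current environment matches this advice is to check the installed version and whether `Trie` is importable from `transformers.tokenization_utils`. The helper below is only a sketch; the function name `check_transformers` and the printed messages are my own, and it assumes Python 3.8 or later for `importlib.metadata`.

```python
import importlib
from importlib import metadata

PINNED_VERSION = "4.25.1"  # the version recommended in this thread


def check_transformers():
    """Return (installed_version, has_trie); version is None if not installed."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None, False
    try:
        module = importlib.import_module("transformers.tokenization_utils")
    except ImportError:
        return installed, False
    return installed, hasattr(module, "Trie")


if __name__ == "__main__":
    installed, has_trie = check_transformers()
    if installed is None:
        print("transformers is not installed")
    elif has_trie:
        print(f"transformers {installed}: Trie found in tokenization_utils")
    else:
        print(f"transformers {installed}: Trie missing, "
              f"try: pip install transformers=={PINNED_VERSION}")
```

Run it with the same interpreter (the same virtual environment) that runs recasepunc.py, otherwise it may inspect a different installation.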

nshmyrev commented 6 months ago

Let us know if you have other questions.