coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Updated missing checkpoint path error message #2330

Closed wasertech closed 1 year ago

wasertech commented 1 year ago

Addresses https://github.com/coqui-ai/STT/issues/2329

generate_scorer_package --lm /mnt/lm/lm.binary --vocab /mnt/lm/vocab-500000.txt --package /mnt/lm/kenlm.scorer --default_alpha 0 --default_beta 0
500000 unique words read from vocabulary file.
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
No --checkpoint path specified, not using bytes output mode, can't continue.
Checkpoint path must contain an alphabet.
Start by creating an alphabet for your models using coqui_stt_training.util.check_characters if needed.

    python -m coqui_stt_training.util.check_characters \
                                --csv-files ... \
                                --alphabet-format | grep -v '^#' | sort -n > models/alphabet.txt

This will create an alphabet models/alphabet.txt.
Now rerun this script by giving models/ as the checkpoint path.

    generate_scorer_package  \
                --checkpoint models/ \
                ...
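
For readers who want to see what the alphabet step in the message boils down to, here is a minimal Python sketch. It is not the implementation of coqui_stt_training.util.check_characters. The helper names `collect_alphabet` and `write_alphabet` are hypothetical, and the sketch assumes the training CSVs have a `transcript` column and that the alphabet file holds one character per line.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def collect_alphabet(csv_paths: Iterable[str], column: str = "transcript") -> List[str]:
    """Return the sorted unique characters found in the given CSV column.

    Hypothetical helper: the column name "transcript" is an assumption
    about the layout of the training CSV files.
    """
    chars = set()
    for csv_path in csv_paths:
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Updating a set with a string adds each character individually.
                chars.update(row[column])
    return sorted(chars)


def write_alphabet(chars: Iterable[str], out_path: str) -> None:
    """Write one character per line, creating the parent directory if needed.

    Hypothetical helper: one character per line is the assumed file layout.
    """
    target = Path(out_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(f"{char}\n" for char in chars), encoding="utf-8")
```

Writing the result to models/alphabet.txt and then passing models/ as --checkpoint matches the two steps described in the message. For real data, the check_characters command shown above is still the safer choice.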

Full logs

EDIT: I changed the message a bit (mainly typo fixes) in this commit: https://github.com/coqui-ai/STT/commit/a694187be4817870e53f5e14b24e16b57dfaa581

wasertech commented 1 year ago

https://github.com/coqui-ai/STT/actions/runs/3863367591/jobs/6585517337#step:7:120 In my defense, it's not my fault that we cannot tap homebrew/core because of invalid syntax in the tap!

wasertech commented 1 year ago

I've updated the code to be more readable and tested it by building generate_scorer_package locally. You can check out my logs here. I'll merge this PR now, since it works as expected functionally. If you want to improve or change the error message, let me know.