Closed poohsen closed 1 year ago
You are probably using an older example here. The --alphabet flag in generate_scorer_package.py has been replaced with the --checkpoint flag. It does not actually rely on checkpoint data, but the checkpoint directory contains the alphabet, and the tool reads it from there.
Please see here: https://stt.readthedocs.io/en/latest/playbook/SCORER.html
Closing as it’s not an issue and @HarikalarKutusu pointed out the error in op’s command flow
Hi, sorry for the late reply. Switching to the --checkpoint flag indeed helped me out. (I was previously ignoring that option because I didn't have any checkpoint files and the language model itself is passed separately, so it felt like it didn't apply.)
So there's indeed no bug. Note, however, that the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful:
No --alphabet file specified, not using bytes output mode, can't continue.
How about "No alphabet file found and bytes output mode is off, can't continue. Did you pass a checkpoint directory?"
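To make the suggestion concrete, here is a minimal Python sketch of the check as I understand it from this thread. The function name and signature are hypothetical, and this is not the actual C++ logic in generate_scorer_package.cpp; it only assumes that an alphabet is needed whenever bytes output mode is off.

```python
from typing import Optional


def check_alphabet_source(bytes_output_mode: bool,
                          checkpoint_dir: Optional[str]) -> Optional[str]:
    """Return an error message, or None when the options are consistent.

    Hypothetical helper that mirrors the behaviour described in this thread.
    It is not taken from the generate_scorer_package source.
    """
    if bytes_output_mode:
        # Assumption: bytes output mode needs no alphabet, so no checkpoint.
        return None
    if not checkpoint_dir:
        # Wording proposed above for the improved error message.
        return (
            "No alphabet file found and bytes output mode is off, "
            "can't continue. Did you pass a checkpoint directory?"
        )
    return None
```

With this wording, someone who passes --force_bytes_output_mode off and nothing else is pointed at the checkpoint option instead of the removed --alphabet flag.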
... the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful ...
I've updated the error message like so:
generate_scorer_package --lm /mnt/lm/lm.binary --vocab /mnt/lm/vocab-500000.txt --package /mnt/lm/kenlm.scorer --default_alpha 0 --default_beta 0
500000 unique words read from vocabulary file.
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false
No --checkpoint path specified, not using bytes output mode, can't continue.
Checkpoint path must contain an alphabet.
Start by creating an alphabet for your models using coqui_stt_training.util.check_characters if needed.
python -m coqui_stt_training.util.check_characters \
--csv-files ... \
--alphabet-format | grep -v '^#' | sort -n > models/alphabet.txt
This will create an alphabet models/alphabet.txt.
Now rerun this script by giving models/ as the checkpoint path.
generate_scorer_package \
--checkpoint models/ \
...
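If you call the tool from a script, the same check can be done up front. The following is a rough sketch, not part of the project: build_scorer_command is a hypothetical helper, and it assumes the alphabet in the checkpoint directory is named alphabet.txt, as in the message above. The flags are the ones shown in the command earlier in this comment.

```python
import os
from typing import List


def build_scorer_command(checkpoint_dir: str, lm: str, vocab: str,
                         package: str,
                         alphabet_name: str = "alphabet.txt") -> List[str]:
    """Build the generate_scorer_package argument list.

    Hypothetical helper: fails early if the checkpoint directory has no
    alphabet file (the file name is an assumption based on this thread).
    """
    alphabet_path = os.path.join(checkpoint_dir, alphabet_name)
    if not os.path.isfile(alphabet_path):
        raise FileNotFoundError(
            f"{alphabet_path} not found; the checkpoint path must contain "
            "an alphabet"
        )
    return [
        "generate_scorer_package",
        "--checkpoint", checkpoint_dir,
        "--lm", lm,
        "--vocab", vocab,
        "--package", package,
        "--default_alpha", "0",
        "--default_beta", "0",
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True), provided generate_scorer_package is on your PATH.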
It's already on main but won't be introduced into the stable code base before version 1.5.0.
For those who want this patch early, you'll need to build generate_scorer_package manually since we pull the pre-built binary file from the latest release.
Check out the docs to build binaries, or this comment I made under my logs for #2330, which introduced the reprog.
Describe the bug
Running ./generate_scorer_package --alphabet results in an error, yet running it with --force_bytes_output_mode off produces an error that refers to that very flag being required.
To Reproduce
Steps to reproduce the behavior:
./generate_scorer_package --alphabet
Expected behavior
As with older versions, the --alphabet option should be accepted and used.
Environment (please complete the following information):
Additional context
generate_scorer_package --help doesn't mention the --alphabet option either, even though the generate_scorer_package.cpp code does list it as an option in its main method.