coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Update `generate_scorer_package` error message when not given any `checkpoint` #2329

Closed: wasertech closed this issue 1 year ago

wasertech commented 1 year ago

Hi, sorry for the late reply. Switching to the --checkpoint flag did indeed help. (I had been ignoring that option because I didn't have any checkpoint files, and since the language model itself is passed separately, it felt like it didn't apply.)

So there is indeed no bug. Note, however, that the error message you get when using --force_bytes_output_mode off without passing the checkpoint option is not very helpful:

No --alphabet file specified, not using bytes output mode, can't continue.

How about "No alphabet file found and bytes output mode is off, can't continue. Did you pass a checkpoint directory?"

Originally posted by @poohsen in https://github.com/coqui-ai/STT/issues/2327#issuecomment-1373400498

Replace native_client/generate_scorer_package.cpp#L54: https://github.com/coqui-ai/STT/blob/2c81b0469a5cba2b27ea98e098adf53903fbc45d/native_client/generate_scorer_package.cpp#L54

With something like:

No --checkpoint path specified, not using bytes output mode, can't continue.
Checkpoint path must contain an alphabet.
Start by creating an alphabet using coqui_stt_training.util.check_characters if needed.

    python -m coqui_stt_training.util.check_characters \
                --csv-files ... \
                --alphabet-format | grep -v '^#' | sort -n > /mnt/models/alphabet.txt

This will create an alphabet file at `/mnt/models/alphabet.txt`.
Now rerun this script, passing `/mnt/models/` as the checkpoint path.

    generate_scorer_package  \
        --checkpoint /mnt/models/ \
        --lm /mnt/lm/lm.binary \
        --vocab /mnt/lm/vocab-${LM_TOP_K}.txt \
        --package /mnt/lm/kenlm.scorer \
        --default_alpha .. \
        --default_beta ..
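
For illustration only, the check behind this message could look roughly like the sketch below. It is not the code at `generate_scorer_package.cpp#L54`: `check_alphabet_source` and `missing_checkpoint_message` are hypothetical names, an empty string stands in for "no `--checkpoint` given", and the sketch does not verify that the checkpoint directory actually contains an alphabet file.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the STT sources): builds the error text
// proposed in this issue.
std::string missing_checkpoint_message() {
    std::ostringstream msg;
    msg << "No --checkpoint path specified, not using bytes output mode, "
           "can't continue.\n"
        << "Checkpoint path must contain an alphabet.\n"
        << "Start by creating an alphabet using "
           "coqui_stt_training.util.check_characters if needed.\n";
    return msg.str();
}

// Hypothetical validation step. Returns 0 when packaging may proceed and
// 1 when an alphabet is required but no checkpoint path was given.
// Assumption: an empty checkpoint_path means the flag was not passed.
int check_alphabet_source(const std::string& checkpoint_path,
                          bool bytes_output_mode,
                          std::ostream& err) {
    if (bytes_output_mode) {
        return 0;  // bytes output mode does not need an alphabet file
    }
    if (checkpoint_path.empty()) {
        err << missing_checkpoint_message();
        return 1;
    }
    return 0;
}
```

Calling `check_alphabet_source("", false, std::cerr)` would print the proposed message and return 1, which corresponds to running with `--force_bytes_output_mode off` and no `--checkpoint`.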

wasertech commented 1 year ago

Fixed in #2330.