coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug?] TTS of "10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished" seems to clog the system #3972

Open thomasf1 opened 1 month ago

thomasf1 commented 1 month ago

Describe the bug

I'm trying to get TTS to speak a countdown, but the command seems to run forever, while a similar prompt finishes in a reasonable time.

Works as expected: tts --text "How is the weather today?" --model_name "tts_models/en/ek1/tacotron2" --out_path test2.wav

Runs forever on my system: tts --text "10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished" --model_name "tts_models/en/ek1/tacotron2" --out_path test3.wav

To Reproduce

Run: tts --text "10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished" --model_name "tts_models/en/ek1/tacotron2" --out_path test3.wav
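
To make "runs forever" measurable, the same command can be wrapped in a timeout. This is a minimal sketch using only the Python standard library. The helper name `run_with_timeout` and the constant `COUNTDOWN_CMD` are hypothetical names chosen for illustration, not part of the TTS package, and any time limit passed in is an arbitrary choice.

```python
import subprocess
import time

# The exact command from this report, split into arguments.
COUNTDOWN_CMD = [
    "tts",
    "--text", "10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished",
    "--model_name", "tts_models/en/ek1/tacotron2",
    "--out_path", "test3.wav",
]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, elapsed_seconds).

    finished is False if the process was still running after
    timeout_s seconds and had to be killed.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once the timeout is exceeded.
        subprocess.run(cmd, check=False, timeout=timeout_s)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.monotonic() - start
```

Calling `run_with_timeout(COUNTDOWN_CMD, 300)` would then report whether the countdown prompt finished within five minutes and how long it took, which can be compared against the same call for the weather prompt.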

Expected behavior

Reasonable execution time

Logs

tts --text "10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished" --model_name "tts_models/en/ek1/tacotron2" --out_path test3.wav

 > tts_models/en/ek1/tacotron2 is already downloaded.
 > vocoder_models/en/ek1/wavegrad is already downloaded.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-10
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:0
 | > fft_size:1024
 | > power:1.8
 | > preemphasis:0.99
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000.0
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 2
 > Vocoder Model: wavegrad
 > Text: 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished
 > Text splitted to sentences.
['10. 9.', '8.', '7.', '6.', '5.', '4.', '3.', '2.', '1.', 'Finished']
(still running)

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.2",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.10.14",
        "version": "Darwin Kernel Version 23.5.0: Wed May  1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103"
    }
}

Additional context

No response

thomasf1 commented 1 month ago

Not sure if it's a bug, but it sure as hell seems strange.
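
The log above shows the text being split into very short sentences such as '8.' and '7.'. One possible workaround, untested here and resting on the assumption that those number-only sentences are what makes synthesis slow, is to spell out the numbers and turn the periods into commas before passing the text to `tts`. The helper `spell_out_countdown` below is hypothetical and not part of the TTS package; it only handles the numbers 0 to 10.

```python
import re

# Words for the numbers that appear in the countdown (0 to 10 only).
_NUMBER_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "10": "ten",
}

# A standalone number 0-10 followed by a period and then whitespace
# or the end of the string. Decimals such as "3.5" are left alone.
_COUNT_ITEM = re.compile(r"\b(10|[0-9])\.(?=\s|$)")


def spell_out_countdown(text: str) -> str:
    """Replace items like '9.' with 'nine,' so the text stays one sentence."""
    return _COUNT_ITEM.sub(lambda m: _NUMBER_WORDS[m.group(1)] + ",", text)


print(spell_out_countdown("10. 9. 8. 7. 6. 5. 4. 3. 2. 1. Finished"))
# → ten, nine, eight, seven, six, five, four, three, two, one, Finished
```

The rewritten string can be passed to `--text` in place of the original. Whether it actually avoids the long run time with tts_models/en/ek1/tacotron2 has not been verified.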