Bug: F5-TTS generates speech at an unnaturally high speed

FrnklyN commented 1 week ago

When using the F5-TTS engine in Audiobook Maker to convert text to speech, the generated audio plays at an unnaturally high speed, making it difficult to understand. This issue occurs regardless of the text length or settings used, including whether or not the “Use Duration Prediction Model” option is enabled.

Steps to Reproduce:

1. Open Audiobook Maker and set F5-TTS as the engine.
2. Input any text into the text field.
3. Toggle the “Use Duration Prediction Model” option (both enabled and disabled).
4. Generate speech using the default settings.
5. Play back the generated audio.

Expected Behavior: The speech should play back at a natural speed, comparable to other TTS engines.

Observed Behavior: The generated audio plays back at an excessively high speed in all cases, significantly affecting clarity and usability.

Environment:
• Audiobook Maker version: v3.1
• Operating System: Windows 11
• GPU: NVIDIA RTX 4090
• CUDA version: 12.1

Additional Context:
• I tested this issue both with and without the “Use Duration Prediction Model” option enabled, but the problem persists in both cases.
• I am using the standard F5-TTS configuration as described in the installation guide.
• The standalone F5-TTS repository works as expected without any speed issues.

If you need further information or testing, feel free to let me know!

pointave commented 5 days ago

I believe your WAV files are too long; I ran into similar issues with 30-second tracks. I suggest reducing their duration to 14 seconds, since even at 15 seconds additional text was being inserted. I figured this out when using the Gradio F5-TTS: adding a 30-second transcript produced similarly fast speech. Try splitting your clips and re-transcribing them with Whisper, and see if that fixes it for you.
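A minimal sketch of the check suggested above, assuming the clips are uncompressed PCM WAV files that Python's standard `wave` module can read. The helper names `wav_duration_seconds` and `trim_wav` and the `MAX_SECONDS` constant are made up for illustration; the 14-second figure comes from this comment and is not an official F5-TTS or Audiobook Maker limit.

```python
import wave

# Limit suggested in the comment above; an assumption, not an official constant.
MAX_SECONDS = 14.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def trim_wav(src_path, dst_path, max_seconds=MAX_SECONDS):
    """Copy at most max_seconds of audio from src_path into dst_path.

    Returns the duration (in seconds) of the audio that was written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_to_keep = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width, and sample rate as the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return frames_to_keep / params.framerate
```

Calling `wav_duration_seconds("my_clip.wav")` (a placeholder file name) shows whether a clip exceeds the limit. Note that `trim_wav` makes a hard cut that can land mid-word, and the transcript has to be shortened to match whatever audio remains, so splitting at a pause and re-transcribing is likely the safer route.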

JarodMica commented 3 days ago

This is part bug, part enhancement. I'll be adding the F5-TTS speed setting soon; I did it yesterday but introduced some bugs, so I rolled it back. Also, a little bit of what pointave said above ^

JarodMica commented 1 day ago

The speed control present in the main repo is now available from this commit onwards: 651bc1cefd89491456e64cbe11cc4610abf6c571

Please let me know if you run into any issues!

JarodMica / audiobook_maker

Bug: F5-TTS generates speech at an unnaturally high speed #85