coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] The voice-cloned speaker continues with garbage after to-be-spoken text was finished or mid-sentence #3572

Open Bardo-Konrad opened 5 months ago

Bardo-Konrad commented 5 months ago

Describe the bug

Sometimes the speech pauses and then the speaker continues with audio that is neither in the input text nor in any recognizable language, although it is clearly the same speaker's voice. Unless you want to create a horror movie with a disturbingly familiar voice, this behaviour is undesired. I think bark has the same issue.

To Reproduce

import torch
from TTS.api import TTS

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
tts = TTS(model_name=model_name).to(device)
tts.tts_to_file(text="Some longer text", speaker_wav="some.wav", language="de", file_path="some-output.wav")

Expected behavior

Only speak what's being written.

kaveenkumar commented 4 months ago

Does anyone have a workaround for this?

I tried ending all my text with a period ".", but that does not stop the synthesizer from continuing past the end of the text. There are often artifacts alongside the input text.

Bardo-Konrad commented 4 months ago

Probably the only way around it is to generate the speech, run speech-to-text on it, compare the transcript to the input to get the timestamps of the gibberish, cut those parts out, and save the file again.

Kinda dumb, but what the heck.
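
For what it's worth, the compare-and-cut step could look roughly like the sketch below. It is only an illustration under some assumptions: the speech-to-text tool is assumed to return word-level timestamps as (word, start_sec, end_sec) tuples, and the helper names `spans_to_keep` and `cut_audio` are made up here, not part of TTS or any STT library. The alignment uses Python's standard `difflib`.

```python
import difflib
import re


def normalize(word):
    """Lowercase and strip punctuation so 'Hello,' matches 'hello'."""
    return re.sub(r"[^\w]", "", word.lower())


def spans_to_keep(expected_text, stt_words, pad=0.05):
    """Return (start, end) time ranges whose words match the input text.

    stt_words is a list of (word, start_sec, end_sec) tuples, an assumed
    format for the output of a speech-to-text tool with word timestamps.
    """
    expected = [w for w in map(normalize, expected_text.split()) if w]
    heard = [normalize(w) for w, _, _ in stt_words]
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        if block.size == 0:
            continue  # skip the zero-length sentinel block
        start = max(0.0, stt_words[block.b][1] - pad)
        end = stt_words[block.b + block.size - 1][2] + pad
        if spans and start <= spans[-1][1]:
            # Padded ranges touch or overlap: merge them
            spans[-1] = (spans[-1][0], end)
        else:
            spans.append((start, end))
    return spans


def cut_audio(samples, sample_rate, spans):
    """Keep only the samples that fall inside the given time ranges."""
    kept = []
    for start, end in spans:
        kept.extend(samples[int(start * sample_rate):int(end * sample_rate)])
    return kept
```

One caveat: any word the speech-to-text step mishears gets cut as well, so a real version would probably need fuzzy matching or a minimum gap length before treating a stretch as gibberish.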

stale[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

Bardo-Konrad commented 2 weeks ago

I want to draw attention to this.