coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Portuguese TTS model on XTTS is pronouncing the "." (dot) character when it appears in the text #2952

Status: Closed (Subarasheese closed this issue 10 months ago)

Subarasheese commented 1 year ago

Describe the bug

Hello,

It seems a bit of an "oopsie" was made when handling the Portuguese dataset: the PT-BR model now pronounces the "." character as "ponto" every time we enter sentences like:

"Olá, sou seu novo clone de voz. Faça o possível para carregar um áudio de qualidade." ("Hello, I'm your new voice clone. Do your best to upload good-quality audio.")

Here is the output: https://vocaroo.com/1404xnr0Vkmc

It was not supposed to say "ponto"...

It goes like:

"Olá, sou seu novo clone de voz ponto Faça o possível para carregar um áudio de qualidade ponto"

But it should not be like that.

To Reproduce

Set the client to Portuguese (pt), then type any text that contains a "." (dot).

Expected behavior

The dot should not be pronounced. The purpose of "." is to mark the end of a declarative sentence or to separate certain elements in written text.

Logs

None

Environment

git clone https://huggingface.co/spaces/coqui/xtts
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py

Additional context

No response

Subarasheese commented 1 year ago

@Edresson

Edresson commented 1 year ago

Hi @Subarasheese, thanks for reporting this bug. We plan to fix this issue soon. As a workaround, I noticed that adding a space between the word and the period fixes the issue.
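A minimal sketch of this workaround as a preprocessing step, assuming the text is cleaned in Python before it is passed to `tts.tts(...)` or `tts.tts_to_file(...)`. The helper name `space_before_period` is hypothetical and not part of Coqui TTS. It only touches periods that follow a word and precede whitespace or the end of the text, so numbers such as "3.5" are left alone. Note that later in this thread the workaround is reported not to work on xtts v2.

```python
import re


def space_before_period(text: str) -> str:
    """Hypothetical helper: put a space between a word and a following period.

    Only periods that follow a word character and precede whitespace or the
    end of the text are changed, so "3.5" stays intact.
    """
    return re.sub(r"(?<=\w)\.(?=\s|$)", " .", text)


print(space_before_period("Olá, sou seu novo clone de voz. Faça o possível."))
# Olá, sou seu novo clone de voz . Faça o possível .
```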

Subarasheese commented 1 year ago

Thank you. I have a question, out of curiosity: can the dataset used to train the Portuguese model be found online, or did Coqui use a private/internal dataset for Portuguese?

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

Inc44 commented 10 months ago

The same error occurs in other languages, such as French, Russian, and Japanese. The problem appears with model xtts_v1.1, Coqui TTS 0.19.0, and Python 3.11.5.

Subarasheese commented 10 months ago

@Edresson The workaround (space before the dot) is not working on xtts v2... It still says "dot" ("ponto"). If I recall correctly, the workaround previously worked every time.

erogol commented 10 months ago

We don't actually know why it happens. If anyone has any ideas, let us know.

Dhrog commented 10 months ago

I experienced the same problem with xtts-v2 using the German language.

Subarasheese commented 10 months ago

Are you guys sure there isn't an issue with the dataset? What were your sources?

brambox commented 10 months ago

I'm also getting "ponto" when fine-tuning.

Dhrog commented 10 months ago

I used the example code and read the text from a file. I installed Coqui TTS yesterday, so it is all still overwhelming. The sound file is attached. At one point you can hear "Punkt dot" ("Punkt" is German for "period"). Long gaps between sentences also happen quite often; I'm not sure whether that is connected to this issue.


# -*- coding: utf-8 -*-
import sys
from pathlib import Path

import torch
from TTS.api import TTS

# Read the input text file (first CLI argument) as raw bytes.
raw = Path(sys.argv[1]).read_bytes()
# Resolve backslash escape sequences, then reinterpret the bytes as UTF-8
# so that characters such as German umlauts survive.
text = raw.decode("unicode_escape").encode("latin-1").decode("utf-8")
print(text)

# Output WAV path (second CLI argument)
file_output = sys.argv[2]

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models
# print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this model is a multilingual voice-cloning model, we must set the target speaker_wav and language
# Text to speech: list of amplitude values as output
wav = tts.tts(text=text, speaker_wav="Data/RefClips/4.wav", language="de")
# Text to speech to a file
tts.tts_to_file(text=text, speaker_wav="Data/RefClips/4.wav", language="de", file_path=file_output)

umlaut.zip
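The decoding line in the script above is the least obvious part. A small sketch of what the `unicode_escape` → `latin-1` → `utf-8` chain does, assuming the input file is UTF-8 encoded: literal escape sequences written in the file (such as a backslash followed by `n`) become real characters, while non-ASCII letters like "ü" are kept intact. The function name `decode_input` is hypothetical; it only wraps the same expression.

```python
def decode_input(raw: bytes) -> str:
    """Hypothetical wrapper around the decoding chain used in the script above."""
    # "unicode_escape" resolves backslash escapes (e.g. the two characters "\n")
    # and maps every other byte to the code point with the same value.
    escaped = raw.decode("unicode_escape")
    # Encoding as Latin-1 turns those code points back into the original bytes,
    # which are then decoded as UTF-8 so multi-byte characters like "ü" survive.
    return escaped.encode("latin-1").decode("utf-8")


print(decode_input("Grüße.".encode("utf-8")))  # Grüße.
print(repr(decode_input(b"Hallo\\nWelt.")))   # 'Hallo\nWelt.'
```

One caveat of this design: an escape such as `\u00fc` written in the file decodes to a code point that Latin-1 re-encodes as a single byte, which is not valid UTF-8 on its own, so the final `decode("utf-8")` raises `UnicodeDecodeError`. When the file contains no escape sequences, `Path(...).read_text(encoding="utf-8")` is simpler.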

Inc44 commented 10 months ago

As a temporary fix, you can replace periods "." with exclamation marks "!".

Edresson commented 10 months ago

In general, using ".." instead of "." also works for Portuguese.

wonka929 commented 9 months ago

Italian has the same issue. Apart from workarounds, have you found a stable fix?

The ".." method does not work, and neither does "!".

Thanks

PS: for Italian, replacing "." with "\n" works.

fcrescio commented 7 months ago

This bug is still present, at least for Italian. Another workaround is to replace "." with ";".

Fgabz commented 6 months ago

We have the same issue in French.

danielmzak commented 5 months ago

In Czech (xtts_v2 model), try replacing "." with ";\n", which makes the ends of sentences sound more natural.
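The replacement workarounds reported in this thread ("!", "..", "\n", ";" and ";\n") all substitute sentence-final periods with something else before synthesis. A minimal sketch, assuming the text is preprocessed in Python: the helper name `replace_sentence_periods` and the `REPORTED_REPLACEMENTS` table are hypothetical, and the table only records what commenters reported, not a fix confirmed by the maintainers. Results differ by language and model version (for example, ".." and "!" were reported not to work for Italian).

```python
import re

# Hypothetical table of replacements reported by commenters in this thread.
# These are user reports, not official fixes.
REPORTED_REPLACEMENTS = {
    "exclamation": "!",          # Inc44
    "double_period": "..",       # Edresson, Portuguese
    "newline": "\n",             # wonka929, Italian
    "semicolon": ";",            # fcrescio, Italian
    "semicolon_newline": ";\n",  # danielmzak, Czech
}


def replace_sentence_periods(text: str, replacement: str) -> str:
    """Replace each sentence-final period with `replacement`.

    A period counts as sentence-final when it is not preceded by another
    period and is followed by whitespace or the end of the text, so "3.5"
    and "..." are left alone.
    """
    # A callable replacement avoids re.sub interpreting backslashes in it.
    return re.sub(r"(?<!\.)\.(?=\s|$)", lambda _match: replacement, text)


print(replace_sentence_periods("Custa 3.5 euros. Espere...", ";"))
# Custa 3.5 euros; Espere...
```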

lincoln157nascimento commented 4 months ago

Does anyone have a solution to the problem?

abhisirka2001 commented 1 month ago

Solution: replacing the full stops (.) in the text with "|" works for Portuguese, and it also adds a pause after each sentence. Using a space instead of a full stop doesn't add a pause. However, text with "|" instead of full stops won't work for longer inputs, so keep prompts that use "|" shorter than 400 tokens.
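A minimal sketch of this last suggestion, assuming Python preprocessing: the text is split into sentences, each final "." becomes "|", and sentences are grouped into short chunks that can be synthesized one at a time. The helper name `split_into_chunks` is hypothetical, and `max_words` is only a rough stand-in for the 400-token limit mentioned above, since the model's tokens are not the same as words.

```python
import re


def split_into_chunks(text: str, max_words: int = 250) -> list[str]:
    """Hypothetical helper: group sentences into chunks ending in "|".

    max_words approximates the ~400-token limit reported above; word counts
    are not token counts, so treat it as a rough budget.
    """
    # Split after a period that is followed by whitespace ("3.5" is not split).
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", text.strip()) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # Swap the trailing period for the "|" workaround.
        if sentence.endswith("."):
            sentence = sentence[:-1] + "|"
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


for chunk in split_into_chunks("Olá, sou seu novo clone de voz. Faça o possível.", max_words=8):
    print(chunk)
```

Each chunk would then go through its own `tts.tts_to_file(...)` (or `tts.tts(...)`) call, and the resulting audio would be concatenated. A single sentence longer than `max_words` is kept whole in its own chunk rather than being cut mid-sentence.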