coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

Can't read Bengali year ১৯৫৪ সাল। কালো রাত। [Bug] #3815

Open khandakershahi opened 3 weeks ago

khandakershahi commented 3 weeks ago

Describe the bug

I was testing the Bengali voice model, and it skips Bengali numerals instead of pronouncing them. The Bengali digits ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯ correspond to 0 1 2 3 4 5 6 7 8 9.

১৯৫৪ সাল। কালো রাত। This should be read in Bengali as the year 1954.

log:

['১৯৫৪ সাল। কালো রাত।']
১৯৫৪ সাল। কালো রাত।
 [!] Character '৯' not found in the vocabulary. Discarding it.
 > Processing time: 1.444657564163208
 > Real-time factor: 0.46246659828395376

Log shows [!] Character '৯' not found in the vocabulary. Discarding it.
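
To confirm that the discarded character is a Bengali digit, a quick standard-library check like the one below may help. This is only a sketch: `find_bengali_digits` is a hypothetical helper name, not part of Coqui TTS, and it does not inspect the model's actual vocabulary.

```python
import unicodedata

# Bengali digits occupy the code points U+09E6 (zero) through U+09EF (nine).
BENGALI_DIGITS = "".join(chr(0x09E6 + i) for i in range(10))


def find_bengali_digits(text):
    """Return (character, numeric value) for every Bengali digit in text.

    Hypothetical helper for illustration only.
    """
    return [(ch, unicodedata.digit(ch)) for ch in text if ch in BENGALI_DIGITS]


sample = "১৯৫৪ সাল। কালো রাত।"
found = find_bengali_digits(sample)
print(found)
```

Any character this reports would need to be converted to words before synthesis if the model's vocabulary has no digits.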

To Reproduce

pip install TTS

main.py

import torch
from TTS.api import TTS
import gradio as gr

device = "cuda" if torch.cuda.is_available() else "cpu"

def generate_audio(text="তুমি কেমন আছো?"):
    tts = TTS(model_name='tts_models/bn/custom/vits-male').to(device)
    tts.tts_to_file(text=text, file_path="outputs/output.wav")
    return "outputs/output.wav"

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Text(label="Text"),],
    outputs=[gr.Audio(label="Audio"),],
    )

demo.launch()

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.3.1+cu121",
        "TTS": "0.22.0",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.11.2",
        "version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30)"
    }
}

Additional context

No response

saifulislam79 commented 2 weeks ago

@khandakershahi You can use the pybangla normalizer to normalize the numbers:

https://pypi.org/project/pybangla/#description

eginhard commented 2 weeks ago

@khandakershahi A Bengali phonemizer/normalizer is also included directly in Coqui TTS. You can use it as follows:

from TTS.tts.utils.text.phonemizers import BN_Phonemizer
bn = BN_Phonemizer()
bn.phonemize("১৯৫৪ সাল। কালো রাত।")

(resulting in এক হাজার নয় শত চুয়ান্ন সাল।কালো রাত।।)

khandakershahi commented 2 weeks ago

@saifulislam79 Thank you. I am just a regular user and don't know Python coding or TTS.

I tried, but I wasn't able to figure out how to use it with my main.py code. Could you give me an updated version of my main.py that works with your package, or anything else that lets me keep using the GUI interface?

@eginhard Thank you. I wasn't able to figure out how to use it with my main.py code. Could you give me an updated version of my main.py that works with the BN_Phonemizer, or anything else that lets me keep using the GUI interface?

eginhard commented 2 weeks ago

@khandakershahi Try this:

import torch
from TTS.api import TTS
from TTS.tts.utils.text.phonemizers import BN_Phonemizer
import gradio as gr

device = "cuda" if torch.cuda.is_available() else "cpu"
bn = BN_Phonemizer()
tts = TTS(model_name='tts_models/bn/custom/vits-male').to(device)

def generate_audio(text="তুমি কেমন আছো?"):
    text = bn.phonemize(text)
    tts.tts_to_file(text=text, file_path="outputs/output.wav")
    return "outputs/output.wav"

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Text(label="Text"),],
    outputs=[gr.Audio(label="Audio"),],
    )

demo.launch()

khandakershahi commented 2 weeks ago

@eginhard Thank you. It works now. One remaining issue: it reads the number as a plain cardinal, not as a year. For example, "১৯৫৪ সাল" should be read as "উনিশশ চুয়ান্ন সাল" or "উনিশশত চুয়ান্ন সাল।" The form "এক হাজার নয় শত চুয়ান্ন সাল।" is not used in the Bengali language.

Many many thanks.
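
One possible workaround for the year reading is to rewrite four-digit years before the text reaches the phonemizer, splitting "১৯৫৪ সাল" into "১৯শ ৫৪ সাল" so that each half is read as a two-digit number. The sketch below does only that rewriting step. `split_year` is a hypothetical helper, not part of Coqui TTS. It is an untested assumption that `BN_Phonemizer` then reads the result as "উনিশশ চুয়ান্ন সাল", and years whose first two digits end in zero (such as 2005) are deliberately left unchanged.

```python
import re

# Exactly four Bengali digits (U+09E6..U+09EF), not preceded by another digit,
# and followed by the word "সাল" (year).
YEAR_PATTERN = re.compile(
    r"(?<![\u09E6-\u09EF])([\u09E6-\u09EF]{2})([\u09E6-\u09EF]{2})(?=\s*সাল)"
)
ZERO = "\u09E6"  # Bengali digit zero


def split_year(text):
    """Rewrite e.g. '১৯৫৪ সাল' as '১৯শ ৫৪ সাল' (hypothetical helper)."""

    def rewrite(match):
        century, rest = match.group(1), match.group(2)
        if century.endswith(ZERO):
            # Years such as 1000 or 2005 are not read in the "hundreds" style.
            return match.group(0)
        rest = rest.lstrip(ZERO)  # drop leading zeros, e.g. 1905 -> "5"
        return f"{century}শ {rest}".rstrip()

    return YEAR_PATTERN.sub(rewrite, text)


print(split_year("১৯৫৪ সাল। কালো রাত।"))
```

In the main.py above, this would be called as `text = bn.phonemize(split_year(text))`. Numbers that are not followed by "সাল" pass through untouched.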