coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

Can't read Bengali year ১৯৫৪ সাল। কালো রাত। [Bug] #3815

Open khandakershahi opened 4 months ago

khandakershahi commented 4 months ago

Describe the bug

I was testing the Bengali voice model and it failed to pronounce Bengali numerals. The Bengali digits ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯ correspond to 0 1 2 3 4 5 6 7 8 9.

১৯৫৪ সাল। কালো রাত। (roughly "The year 1954. A dark night.") This should be read aloud in Bengali, with ১৯৫৪ pronounced as the year 1954.

log:

['১৯৫৪ সাল। কালো রাত।']
১৯৫৪ সাল। কালো রাত।
 [!] Character '৯' not found in the vocabulary. Discarding it.
 > Processing time: 1.444657564163208
 > Real-time factor: 0.46246659828395376

The log shows the warning: [!] Character '৯' not found in the vocabulary. Discarding it.
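A quick way to see which characters in an input are digits, and so likely need normalizing to words before synthesis, is to check their Unicode category. This is only a minimal sketch: `find_digits` is a hypothetical helper, not part of Coqui TTS, and it assumes the model's vocabulary contains no digit characters at all (the log above only confirms this for '৯').

```python
import unicodedata

def find_digits(text):
    """Return the distinct decimal-digit characters in text, in order of first appearance."""
    seen = []
    for ch in text:
        # "Nd" is the Unicode category for decimal digits, which covers
        # both ASCII 0-9 and the Bengali digits ০-৯.
        if unicodedata.category(ch) == "Nd" and ch not in seen:
            seen.append(ch)
    return seen

print(find_digits("১৯৫৪ সাল। কালো রাত।"))  # → ['১', '৯', '৫', '৪']
```

If the returned list is non-empty, the text probably needs to go through a number normalizer before it reaches the model.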

To Reproduce

pip install TTS

main.py

import torch
from TTS.api import TTS
import gradio as gr

device = "cuda" if torch.cuda.is_available() else "cpu"

def generate_audio(text="তুমি কেমন আছো?"):
    tts = TTS(model_name='tts_models/bn/custom/vits-male').to(device)
    tts.tts_to_file(text=text, file_path="outputs/output.wav")
    return "outputs/output.wav"

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Text(label="Text"),],
    outputs=[gr.Audio(label="Audio"),],
    )

demo.launch()

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.3.1+cu121",
        "TTS": "0.22.0",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.11.2",
        "version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30)"
    }
}

Additional context

No response

saifulislam79 commented 4 months ago

@khandakershahi use pybangla normalizer for your number normalization

https://pypi.org/project/pybangla/#description

eginhard commented 4 months ago

@khandakershahi A Bengali phonemizer/normalizer is also included directly in Coqui TTS, you can use it as follows:

from TTS.tts.utils.text.phonemizers import BN_Phonemizer
bn = BN_Phonemizer()
bn.phonemize("১৯৫৪ সাল। কালো রাত।")

(resulting in এক হাজার নয় শত চুয়ান্ন সাল।কালো রাত।।)

khandakershahi commented 4 months ago

@saifulislam79 Thank you. I am just a regular user and don't know Python coding or TTS.

I tried, but I wasn't able to figure out how to use it with my main.py code. Would it be possible to give me an updated version of my main.py that works with your package? Or anything else that allows me to keep using the GUI interface.

@eginhard Thank you. I wasn't able to figure out how to use it with my main.py code. Would it be possible to give me an updated version of my main.py that works with the BN_Phonemizer? Or anything else that allows me to keep using the GUI interface.

eginhard commented 4 months ago

@khandakershahi Try this:

import torch
from TTS.api import TTS
from TTS.tts.utils.text.phonemizers import BN_Phonemizer
import gradio as gr

device = "cuda" if torch.cuda.is_available() else "cpu"
bn = BN_Phonemizer()
tts = TTS(model_name='tts_models/bn/custom/vits-male').to(device)

def generate_audio(text="তুমি কেমন আছো?"):
    text = bn.phonemize(text)
    tts.tts_to_file(text=text, file_path="outputs/output.wav")
    return "outputs/output.wav"

demo = gr.Interface(
    fn=generate_audio,
    inputs=[gr.Text(label="Text"),],
    outputs=[gr.Audio(label="Audio"),],
    )

demo.launch()

khandakershahi commented 4 months ago

@eginhard Thank you, it works now. One remaining issue: it reads the number as a plain cardinal, not as a year. For example, "১৯৫৪ সাল" should be read as "উনিশশ চুয়ান্ন সাল" or "উনিশশত চুয়ান্ন সাল". The form "এক হাজার নয় শত চুয়ান্ন সাল" is not used in the Bengali language.

Many many thanks.
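One possible workaround for the year reading is to rewrite year-like numbers before they reach the normalizer, splitting "১৯৫৪ সাল" into a century part, the word "শ", and a two-digit remainder. The sketch below is an assumption-laden illustration, not something confirmed in this thread: `rewrite_years` and `split_year` are hypothetical helpers, the rule "a four-digit number directly before সাল is a year" is a heuristic, and whether BN_Phonemizer then reads "১৯ শ ৫৪" as "উনিশ শ চুয়ান্ন" has not been verified. Years whose century part is a multiple of ten (such as ২০২৪) are left unchanged so they are still read in the "thousand" style.

```python
import re

# Translation table from ASCII digits to Bengali digits.
TO_BN = str.maketrans("0123456789", "০১২৩৪৫৬৭৮৯")

# A standalone four-digit number followed by the word "সাল" (year).
# In Python 3, \d also matches Bengali digits.
YEAR_RE = re.compile(r"(?<!\d)(\d{4})(?=\s*সাল)")

def split_year(match):
    year = int(match.group(1))  # int() accepts Bengali digits
    century, rest = divmod(year, 100)
    if century % 10 == 0:
        # e.g. 2024: keep as-is so it is read in the "thousand" style
        return match.group(1)
    parts = [str(century).translate(TO_BN), "শ"]
    if rest:
        parts.append(str(rest).translate(TO_BN))
    return " ".join(parts)

def rewrite_years(text):
    """Rewrite four-digit years before 'সাল' into century + 'শ' + remainder."""
    return YEAR_RE.sub(split_year, text)

print(rewrite_years("১৯৫৪ সাল। কালো রাত।"))  # → ১৯ শ ৫৪ সাল। কালো রাত।
```

In the main.py above, this would be applied as `text = bn.phonemize(rewrite_years(text))` inside `generate_audio`, assuming the normalizer converts the remaining two-digit numbers to words.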

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.