coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Get Error with Azeri characters #3577

Closed ulvi95 closed 4 months ago

ulvi95 commented 8 months ago

Describe the bug

Dear Community and Developers!

I get an error when I try to train on a dataset that contains Azeri letters. Even though I added the Azeri letters in

character_config = CharactersConfig(
    characters_class= "TTS.tts.models.vits.VitsCharacters",
    characters= r"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890ÇƏĞİÖŞÜçəğıöşü",
    punctuations=" ,.:;?",
    pad= "<PAD>",
    eos= "<EOS>",
    bos= "<BOS>",
    blank= "<BLNK>",
)

and debugging shows that the characters are read from the file correctly, I still get the following errors:

_ctimaiyy_t_ a__qlamas_n_ t_l_b edirl_r.
[!] Character _' not found in the vocabulary. Discarding it.
xarici __l_r nazirliyind_n bildirilib.
[!] Character _' not found in the vocabulary. Discarding it.
___i qruplar_n yarad_lmas_ n_z_rd_ tutulub.
 [!] Character _' not found in the vocabulary. Discarding it.

_ttihamlara sumqay_t __h_r _cra hakimiyy_tinin memarl_q v_ __h_rsalma _dar_sind_n ayd_nl_q g_tirilib. ad___kil_n qurumun m_h_ndisi k_nan babayev mtk n_n onlara laz_mi s_n_di t_qdim etm_diyini deyir.
 [!] Character _' not found in the vocabulary. Discarding it.
g_nc_d_ yeniyetm_ q_z_n altm__ iki ya_l_ qon_usundan hamil_ qald___ iddia olunur. hadis_ __h_rin m_kt_b k___si iyirmi __ _nvan_nda qeyd_ al_n_b. _lkin m_lumata g_r_, iki min be_inci il t_v_ll_dl_ _smay_lova g_nel _lham q_z_n_n.
 [!] Character _' not found in the vocabulary. Discarding it.
n_hay_t, bir g_nd_n sonra idar_nin m_dir m_avinin_, suallar_m_z_ _nvanlaya bildik. ___il_r problem _lind_n t_ng_ g_ldikl_rini des_l_r d_, idar_ r_hb_rliyi b_t_n naraz_l_qlar_n yoluna qoyulaca__n_ v_d el_di.
 [!] Character _' not found in the vocabulary. Discarding it.
n_hay_t, bir g_nd_n sonra idar_nin m_dir m_avinin_, suallar_m_z_ _nvanlaya bildik. ___il_r problem _lind_n t_ng_ g_ldikl_rini des_l_r d_, idar_ r_hb_rliyi b_t_n naraz_l_qlar_n yoluna qoyulaca__n_ v_d el_di.
 [!] Character _' not found in the vocabulary. Discarding it.
 ! Run is removed from folder_for_models/vits_vctk-February-13-2024_01+27PM-0000000
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1643, in eval_epoch
    for cur_step, batch in enumerate(self.eval_loader):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  [Previous line repeated 44 more times]
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 263, in __getitem__
    item = self.samples[idx]
IndexError: list index out of range

N.B. The original texts are:

İctimaiyyətə açıqlamasını tələb edirlər.
Xarici İşlər Nazirliyindən bildirilib.
İşçi qrupların yaradılması nəzərdə tutulub.
İttihamlara Sumqayıt Şəhər İcra Hakimiyyətinin Memarlıq və Şəhərsalma İdarəsindən aydınlıq gətirilib. Adıçəkilən qurumun mühəndisi Kənan Babayev Mtk nın onlara lazımi sənədi təqdim etmədiyini deyir.
Gəncədə yeniyetmə qızın altmış iki yaşlı qonşusundan hamilə qaldığı iddia olunur. Hadisə şəhərin məktəb küçəsi iyirmi üç ünvanında qeydə alınıb. İlkin məlumata görə, iki min beşinci il təvəllüdlü İsmayılova Günel İlham qızının.
Nəhayət, bir gündən sonra idarənin müdir müavininə, suallarımızı ünvanlaya bildik. İşçilər problem əlindən təngə gəldiklərini desələr də, idarə rəhbərliyi bütün narazılıqların yoluna qoyulacağını vəd elədi.
Nəhayət, bir gündən sonra idarənin müdir müavininə, suallarımızı ünvanlaya bildik. İşçilər problem əlindən təngə gəldiklərini desələr də, idarə rəhbərliyi bütün narazılıqların yoluna qoyulacağını vəd elədi.
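
One quick check that may help narrow this down is to compare the characters in the transcripts against the configured set after lowercasing. The sketch below is not taken from the TTS code base: `find_missing_chars` is a hypothetical helper, and it assumes the text cleaner lowercases with Python's `str.lower()`. In Python, `"İ".lower()` produces `i` followed by a combining dot above (U+0307), so a capital `İ` can introduce a character that the configured list does not contain.

```python
import unicodedata

# Character set copied from the CharactersConfig in this report.
CHARACTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890ÇƏĞİÖŞÜçəğıöşü"
PUNCTUATIONS = " ,.:;?"


def find_missing_chars(texts, characters=CHARACTERS, punctuations=PUNCTUATIONS, lowercase=True):
    """Return the sorted characters that occur in `texts` but not in the configured set.

    Hypothetical helper, not part of TTS. `lowercase=True` mimics a cleaner
    that calls str.lower() (an assumption about the cleaner in use).
    """
    allowed = set(characters) | set(punctuations)
    seen = set()
    for text in texts:
        seen.update(text.lower() if lowercase else text)
    return sorted(seen - allowed)


if __name__ == "__main__":
    samples = [
        "İctimaiyyətə açıqlamasını tələb edirlər.",
        "Xarici İşlər Nazirliyindən bildirilib.",
    ]
    for ch in find_missing_chars(samples):
        # Print the code point and its Unicode name for each missing character.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

If this reports characters such as U+0307, adding them to `characters` (or normalizing the text before training) is one thing to try; whether it resolves the warnings above is not confirmed in this thread.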

To Reproduce

import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig, CharactersConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs, VitsAudioConfig, VitsCharacters
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

def formatter(root_path, manifest_file, **kwargs):  # pylint: disable=unused-argument
    """Assumes each line as <filename>|<transcription>"""
    txt_file = manifest_file
    items = []
    speaker_name = "my_speaker"

    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            extension = ".wav"
            wav_file = f"{cols[0].split('.')[-2]+extension}"
            path_to_file = os.path.join(wav_file_path, wav_file)
            text = cols[1].strip()
            # print(text)
            #item = {"text":text, "audio_file":wav_file, "speaker_name":speaker_name, "root_path": root_path}
            #print(item)
            global raw_list
            raw_list.append({"text":text, "audio_file":path_to_file, "speaker_name":speaker_name, "root_path": root_path})
            items.append({"text":text, "audio_file":path_to_file, "speaker_name":speaker_name, "root_path": root_path})
    return items

if __name__ == "__main__":
    output_path = "folder_for_models"
    raw_list = []
    EPOCHS = 500
    BATCH_SIZE = 16
    EVAL_BATCH_SIZE = 4
    NUM_LOADER_WORKERS = 4
    NUM_EVAL_LOADER_WORKERS = 4

    root_path = ""
    os.makedirs(output_path, exist_ok=True)

    global wav_file_path
    training_text_path = os.path.join(root_path, "f2_3h_train - Copy.tsv")
    validation_text_path = os.path.join(root_path, "f2_3h_val - Copy.tsv")
    wav_file_path = os.path.join(root_path, "f2_3h_22050")

    dataset_config = BaseDatasetConfig(
        formatter="ljspeech", meta_file_train=training_text_path, meta_file_val=validation_text_path, path=wav_file_path
    )

    audio_config = VitsAudioConfig(
        sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
    )

    character_config = CharactersConfig(
        characters_class= "TTS.tts.models.vits.VitsCharacters",
        characters= r"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890ÇƏĞİÖŞÜçəğıöşü",
        punctuations=" ,.:;?",
        pad= "<PAD>",
        eos= "<EOS>",
        bos= "<BOS>",
        blank= "<BLNK>",
    )

    config = VitsConfig(
        audio=audio_config,
        characters=character_config,
        run_name="vits_vctk",
        batch_size=BATCH_SIZE,
        eval_batch_size=EVAL_BATCH_SIZE,
        num_loader_workers=NUM_LOADER_WORKERS,
        num_eval_loader_workers=NUM_EVAL_LOADER_WORKERS,
        run_eval=True,
        test_delay_epochs=0,
        epochs=EPOCHS,
        text_cleaner="basic_cleaners",
        use_phonemes=False,
        phoneme_language=None,
        phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
        compute_input_seq_cache=True,
        print_step=25,
        print_eval=True,
        save_best_after=1,
        save_checkpoints=True,
        save_all_best=True,
        mixed_precision=True,
        max_text_len=250,  # change this if you have a larger VRAM than 16GB
        output_path=output_path,
        datasets=[dataset_config],
        cudnn_benchmark=False,
        test_sentences=[
            ["Bu, səs testidir"],
            ["Modeli öyrədərkən nəzərə alınmalı olan bəzi şeylər parametrlərdir."],
            ["Sabahınız xeyir"]
        ]
    )

    ap = AudioProcessor.init_from_config(config)

    tokenizer, config = TTSTokenizer.init_from_config(config)

    train_samples, eval_samples = load_tts_samples(
        dataset_config,
        eval_split=True,
        formatter=formatter,
    )

    model = Vits(config, ap, tokenizer, speaker_manager=None)

    trainer = Trainer(
        TrainerArgs(),
        config=config,
        output_path=output_path,
        model=model,
        train_samples=train_samples,
        eval_samples=eval_samples,
    )

    trainer.fit()

Expected behavior

No response

Logs

_ctimaiyy_t_ a__qlamas_n_ t_l_b edirl_r.
 [!] Character _' not found in the vocabulary. Discarding it.
xarici __l_r nazirliyind_n bildirilib.
 [!] Character _' not found in the vocabulary. Discarding it.
___i qruplar_n yarad_lmas_ n_z_rd_ tutulub.
 [!] Character _' not found in the vocabulary. Discarding it.

_ttihamlara sumqay_t __h_r _cra hakimiyy_tinin memarl_q v_ __h_rsalma _dar_sind_n ayd_nl_q g_tirilib. ad___kil_n qurumun m_h_ndisi k_nan babayev mtk n_n onlara laz_mi s_n_di t_qdim etm_diyini deyir.
 [!] Character _' not found in the vocabulary. Discarding it.
g_nc_d_ yeniyetm_ q_z_n altm__ iki ya_l_ qon_usundan hamil_ qald___ iddia olunur. hadis_ __h_rin m_kt_b k___si iyirmi __ _nvan_nda qeyd_ al_n_b. _lkin m_lumata g_r_, iki min be_inci il t_v_ll_dl_ _smay_lova g_nel _lham q_z_n_n.
 [!] Character _' not found in the vocabulary. Discarding it.
n_hay_t, bir g_nd_n sonra idar_nin m_dir m_avinin_, suallar_m_z_ _nvanlaya bildik. ___il_r problem _lind_n t_ng_ g_ldikl_rini des_l_r d_, idar_ r_hb_rliyi b_t_n naraz_l_qlar_n yoluna qoyulaca__n_ v_d el_di.
 [!] Character _' not found in the vocabulary. Discarding it.
n_hay_t, bir g_nd_n sonra idar_nin m_dir m_avinin_, suallar_m_z_ _nvanlaya bildik. ___il_r problem _lind_n t_ng_ g_ldikl_rini des_l_r d_, idar_ r_hb_rliyi b_t_n naraz_l_qlar_n yoluna qoyulaca__n_ v_d el_di.
 [!] Character _' not found in the vocabulary. Discarding it.
 ! Run is removed from folder_for_models/vits_vctk-February-13-2024_01+27PM-0000000
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1643, in eval_epoch
    for cur_step, batch in enumerate(self.eval_loader):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  [Previous line repeated 44 more times]
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 263, in __getitem__
    item = self.samples[idx]
IndexError: list index out of range

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.2.0+cpu",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Windows",
        "architecture": [
            "64bit",
            "WindowsPE"
        ],
        "processor": "Intel64 Family 6 Model 158 Stepping 10, GenuineIntel",
        "python": "3.10.11",
        "version": "10.0.19044"
    }
}

Additional context

No response

ulvi95 commented 8 months ago

Update: I haven't tried training yet, but increasing max_text_len to 300 prevented the audio samples from being discarded. The next step is to train on a Linux server.
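
The effect of `max_text_len` can be estimated before starting a run. This is a minimal sketch under two assumptions that are not confirmed in this thread: that a sample is skipped when its token sequence is longer than `max_text_len`, and that a tokenizer with blank insertion enabled turns `n` characters into `2n + 1` tokens. `token_length` and `too_long` are hypothetical helpers, not TTS functions.

```python
def token_length(text, add_blank=True):
    """Estimate the token count of `text`: one token per character, plus
    blanks interleaved around every character when `add_blank` is True."""
    n = len(text)
    return 2 * n + 1 if add_blank else n


def too_long(texts, max_text_len=250, add_blank=True):
    """Return (index, estimated_length) for every text over the limit."""
    return [
        (i, token_length(t, add_blank))
        for i, t in enumerate(texts)
        if token_length(t, add_blank) > max_text_len
    ]


if __name__ == "__main__":
    # Synthetic transcripts of 100, 130 and 260 characters.
    texts = ["a" * 100, "b" * 130, "c" * 260]
    print(too_long(texts, max_text_len=250))                   # with blanks
    print(too_long(texts, max_text_len=250, add_blank=False))  # without blanks
```

If every sample in the evaluation split ends up over the limit, the loader would have nothing to fall back on. That would be consistent with the repeated `rescue_item_idx` frames in the traceback, but the link is a guess.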

Update2: Both the character-discarding warnings and the index error still persist.

Update3: Somehow the list and dictionaries in characters.py (_vocab, _char_to_id, _id_to_char) still contain only the default characters.
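
To make the symptom concrete, here is a simplified stand-in for a character tokenizer. It is not the code in characters.py, and `build_vocab` and `encode` are hypothetical names. It builds a character-to-ID table from the configured symbols and drops anything unknown, which is the behaviour the warnings in the logs describe. If the real `_char_to_id` holds only default characters, every Azeri letter would take the discard path shown here.

```python
def build_vocab(characters, punctuations, pad="<PAD>", eos="<EOS>", bos="<BOS>", blank="<BLNK>"):
    """Return (vocab list, char->id dict). The symbol order is illustrative only."""
    vocab = [pad, eos, bos, blank] + list(characters) + list(punctuations)
    char_to_id = {ch: i for i, ch in enumerate(vocab)}
    return vocab, char_to_id


def encode(text, char_to_id):
    """Map characters to IDs, collecting the ones that are not in the vocabulary."""
    ids, discarded = [], []
    for ch in text:
        if ch in char_to_id:
            ids.append(char_to_id[ch])
        else:
            discarded.append(ch)
    return ids, discarded


if __name__ == "__main__":
    default_chars = "abcdefghijklmnopqrstuvwxyz"
    azeri_chars = default_chars + "çəğıöşü"
    text = "tələb edirlər."

    # Compare a default-only table against one that includes the Azeri letters.
    for name, chars in [("default", default_chars), ("azeri", azeri_chars)]:
        _, char_to_id = build_vocab(chars, " ,.:;?")
        ids, discarded = encode(text, char_to_id)
        print(name, len(ids), sorted(set(discarded)))
```

Printing the live tokenizer's table in the same way, right after it is created, would show whether the custom characters ever reach it.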

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.