Closed: ulvi95 closed this issue 4 months ago.
Update: I haven't tried training yet, but increasing max_text_len to 300 prevented audio samples from being discarded. The next step is training on a Linux server.
Update 2: The character-discarding warnings and the IndexError still persist.
Update 3: For some reason, the vocabulary list and lookup dictionaries (_vocab, _char_to_id, _id_to_char) in characters.py still contain only the default characters.
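To make the Update 3 check concrete, here is a minimal sketch of how a vocabulary list and its two lookup dictionaries are typically derived from a character set. The function build_vocab and the AZERI_LOWER and PUNCTUATIONS constants are hypothetical illustrations, not Coqui TTS code, and the library's own placement of pad/eos/bos/blank symbols may differ. If the custom letters actually reached the character class, each of them should appear in all three structures.

```python
# Hypothetical, simplified stand-in (not Coqui TTS code): derive a vocabulary
# list and its two lookup dictionaries from a character set.


def build_vocab(characters, punctuations, pad="_"):
    """Return (vocab, char_to_id, id_to_char) for the given symbols."""
    vocab = [pad] + list(characters) + list(punctuations)
    char_to_id = {char: idx for idx, char in enumerate(vocab)}
    id_to_char = {idx: char for idx, char in enumerate(vocab)}
    return vocab, char_to_id, id_to_char


# Hypothetical custom set: the 32 lowercase Azeri letters, plus punctuation and space.
AZERI_LOWER = "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"
PUNCTUATIONS = "!'(),-.:;? "

vocab, char_to_id, id_to_char = build_vocab(AZERI_LOWER, PUNCTUATIONS)

# Every Azeri-specific letter should be present in the lookup tables.
missing = [char for char in "çəğıöşü" if char not in char_to_id]
print(missing)  # → []
print(len(vocab))  # → 44
```

If the dictionaries built inside the actual training run show only the default letters, that would suggest the custom characters string is not reaching the character class at all, rather than a problem with how the dataset is read.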
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Describe the bug
Dear Community and Developers!
I get errors when I try to train on my dataset, which contains Azeri letters. Although I added the Azeri letters in
and debugging shows that the characters are read from the file correctly, I get the following errors:
_ttihamlara sumqay_t h_r _cra hakimiyy_tinin memarlq v h_rsalma _dar_sind_n ayd_nl_q gtirilib. adkil_n qurumun m_h_ndisi k_nan babayev mtk n_n onlara laz_mi s_n_di t_qdim etmdiyini deyir.
[!] Character ' not found in the vocabulary. Discarding it.
g_ncd yeniyetm_ q_z_n altm iki yal qonusundan hamil qald iddia olunur. hadis h_rin m_kt_b k_si iyirmi __ _nvannda qeyd al_n_b. _lkin m_lumata gr, iki min be_inci il t_v_lldl _smay_lova g_nel _lham q_z_nn.
[!] Character ' not found in the vocabulary. Discarding it.
n_hay_t, bir g_nd_n sonra idar_nin m_dir mavinin, suallar_mz nvanlaya bildik. il_r problem _lind_n tng g_ldikl_rini des_lr d, idar_ r_hb_rliyi b_t_n naraz_l_qlar_n yoluna qoyulaca_n v_d eldi.
[!] Character ' not found in the vocabulary. Discarding it.
n_hay_t, bir g_nd_n sonra idar_nin m_dir mavinin, suallar_mz _nvanlaya bildik. _il_r problem _lind_n tng g_ldikl_rini des_lr d, idar_ r_hb_rliyi b_t_n naraz_l_qlar_n yoluna qoyulacan_ v_d eldi.
[!] Character ' not found in the vocabulary. Discarding it.
! Run is removed from folder_for_models/vits_vctk-February-13-2024_01+27PM-0000000
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1833, in fit
    self._fit()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "/opt/conda/lib/python3.10/site-packages/trainer/trainer.py", line 1643, in eval_epoch
    for cur_step, batch in enumerate(self.eval_loader):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 280, in __getitem__
    return self.__getitem__(self.rescue_item_idx)
  [Previous line repeated 44 more times]
  File "/opt/conda/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 263, in __getitem__
    item = self.samples[idx]
IndexError: list index out of range
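The repeated frame at vits.py line 280 points to a skip-and-retry pattern: when an item is rejected, __getitem__ calls itself with self.rescue_item_idx, and once that index runs past the end of self.samples, line 263 raises IndexError. Below is a hypothetical, simplified stand-in (ToyDataset and filter_samples are illustrative names, not library code) that reproduces this failure mode, assuming the rejection is a length check against max_text_len, which is consistent with Update 1. For simplicity it measures the raw character count; the real check may operate on token IDs, which can be longer than the text itself.

```python
class ToyDataset:
    """Hypothetical, simplified stand-in for the skip-and-retry logic seen in
    the traceback (vits.py lines 263 and 280); not the library's implementation."""

    def __init__(self, samples, max_text_len):
        self.samples = samples
        self.max_text_len = max_text_len
        self.rescue_item_idx = 1

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        item = self.samples[idx]  # IndexError once idx passes the last sample
        if len(item["text"]) > self.max_text_len:
            # Rejected item: retry with the next rescue index instead.
            self.rescue_item_idx += 1
            return self.__getitem__(self.rescue_item_idx)
        return item


def filter_samples(samples, max_text_len):
    """Drop over-long samples up front so the retry path is never taken."""
    return [s for s in samples if len(s["text"]) <= max_text_len]


too_long = [{"text": "x" * 400}] * 3
try:
    ToyDataset(too_long, max_text_len=300)[0]
except IndexError as err:
    print("IndexError:", err)  # → IndexError: list index out of range

kept = filter_samples(too_long + [{"text": "qısa mətn"}], max_text_len=300)
print(len(kept))  # → 1
```

Filtering the sample list before the dataset is built means that, in this toy model, every remaining item passes the check and the rescue index is never advanced past the end of the list.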
N.B. The original texts are:
İctimaiyyətə açıqlamasını tələb edirlər. Xarici İşlər Nazirliyindən bildirilib. İşçi qrupların yaradılması nəzərdə tutulub. İttihamlara Sumqayıt Şəhər İcra Hakimiyyətinin Memarlıq və Şəhərsalma İdarəsindən aydınlıq gətirilib. Adıçəkilən qurumun mühəndisi Kənan Babayev Mtk nın onlara lazımi sənədi təqdim etmədiyini deyir. Gəncədə yeniyetmə qızın altmış iki yaşlı qonşusundan hamilə qaldığı iddia olunur. Hadisə şəhərin məktəb küçəsi iyirmi üç ünvanında qeydə alınıb. İlkin məlumata görə, iki min beşinci il təvəllüdlü İsmayılova Günel İlham qızının. Nəhayət, bir gündən sonra idarənin müdir müavininə, suallarımızı ünvanlaya bildik. İşçilər problem əlindən təngə gəldiklərini desələr də, idarə rəhbərliyi bütün narazılıqların yoluna qoyulacağını vəd elədi. Nəhayət, bir gündən sonra idarənin müdir müavininə, suallarımızı ünvanlaya bildik. İşçilər problem əlindən təngə gəldiklərini desələr də, idarə rəhbərliyi bütün narazılıqların yoluna qoyulacağını vəd elədi.
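One way to see which characters in these sentences would be missing from a given character set is to scan them after the same normalization the text cleaner applies. This sketch assumes a cleaner that lowercases the text (the config is not shown, so that is an assumption). Python's str.lower() maps "İ" (U+0130) to "i" followed by U+0307 COMBINING DOT ABOVE, a character that is not in a typical alphabet list and that is easy to miss when printed inside quotes, so it could account for a warning whose character looks blank. The helpers find_missing_chars and azeri_lower and the character set are hypothetical.

```python
import unicodedata

# Hypothetical character set: lowercase Azeri letters plus punctuation and space.
AZERI_LOWER = "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"
PUNCTUATIONS = "!'(),-.:;? "
VOCAB = set(AZERI_LOWER + PUNCTUATIONS)


def find_missing_chars(texts, vocab, lower=str.lower):
    """Return the set of characters that are not in vocab after lowercasing."""
    missing = set()
    for text in texts:
        missing.update(char for char in lower(text) if char not in vocab)
    return missing


def azeri_lower(text):
    """Hypothetical Azeri-aware lowercasing: İ -> i and I -> ı before str.lower()."""
    return text.replace("İ", "i").replace("I", "ı").lower()


sample = ["İctimaiyyətə açıqlamasını tələb edirlər."]

# Plain str.lower() leaves a combining dot behind after "İ".
default_missing = find_missing_chars(sample, VOCAB)
print([unicodedata.name(c) for c in sorted(default_missing)])  # → ['COMBINING DOT ABOVE']

# With the Azeri-aware mapping, nothing is missing from this sentence.
print(find_missing_chars(sample, VOCAB, lower=azeri_lower))  # → set()
```

Note that plain str.lower() also maps the dotless capital "I" to "i" rather than "ı"; that case produces no warning because "i" is in the vocabulary, but it silently changes the word.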
To Reproduce
import os
from trainer import Trainer, TrainerArgs
from TTS.tts.configs.shared_configs import BaseDatasetConfig, CharactersConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs, VitsAudioConfig, VitsCharacters
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

def formatter(root_path, manifest_file, **kwargs):  # pylint: disable=unused-argument
    """Assumes each line as
    <filename>|<transcription>
    """
    txt_file = manifest_file
    items = []
    speaker_name = "my_speaker"
    # ... (rest of the formatter body not shown)

if __name__ == "__main__":
    output_path = "folder_for_models"
    raw_list = []
    EPOCHS = 500
    BATCH_SIZE = 16
    EVAL_BATCH_SIZE = 4
    NUM_LOADER_WORKERS = 4
    NUM_EVAL_LOADER_WORKERS = 4
    # ... (rest of the script not shown)
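For reference, a complete formatter for the <filename>|<transcription> layout described in the docstring could look like the sketch below. It keeps txt_file = manifest_file from the script above, reads the manifest explicitly as UTF-8 so Azeri letters are not mangled on platforms whose default encoding differs, and returns dicts with text, audio_file, speaker_name and root_path keys, which, to my knowledge, is the shape used by the custom-formatter example in the Coqui TTS documentation. The wavs subfolder is a hypothetical assumption about the dataset layout.

```python
import os


def formatter(root_path, manifest_file, **kwargs):  # pylint: disable=unused-argument
    """Assumes each line as <filename>|<transcription>."""
    txt_file = manifest_file
    items = []
    speaker_name = "my_speaker"
    # Read explicitly as UTF-8 so letters such as ə, ğ, ı, ş survive intact.
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            filename, text = line.split("|", maxsplit=1)
            items.append(
                {
                    "text": text,
                    # Hypothetical layout: audio files live in <root_path>/wavs/.
                    "audio_file": os.path.join(root_path, "wavs", filename),
                    "speaker_name": speaker_name,
                    "root_path": root_path,
                }
            )
    return items
```

Splitting with maxsplit=1 keeps any further "|" characters inside the transcription instead of silently dropping them.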
Expected behavior
No response
Logs
Environment
Additional context
No response