I have the same problem: an attempt is made to synthesize a `None` object. This happens both when finetuning and when continuing training!

Edit: the problem arises when loading the test sentences. Inside `vits.py`, in `@torch.no_grad() def test_run(self, assets) -> Tuple[Dict, Dict]`, the line `test_sentences = self.config.test_sentences` returns not the required array of strings but an array of arrays holding the individual letters of each sentence (like `[['H', 'e', 'l', 'l', 'o'], ...]`). The sentences are already written incorrectly in the new config (inside the trained-model dir), but simply correcting them in `config.json` does not seem to change anything.
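To make the mismatch concrete, here is a small illustration of the two shapes (a sketch based on the description above, not copied from an actual config):

```python
# What Vits.test_run() expects config.test_sentences to contain: plain strings.
expected = ["Hello world.", "Another test sentence."]

# What shows up after finetuning / resuming: each sentence split into characters.
observed = [["H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d", "."]]

# The downstream text pipeline handles the first form but chokes on the second.
print(expected[0], observed[0])
```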
I am facing the same issue. Any update on this problem? How can I fix it?
As a workaround, you can go to the function I mentioned and turn the character arrays back into strings (that works for me). Otherwise we can only hope that the devs read this as soon as possible, or try finetuning / continuing training themselves =)
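A minimal sketch of that workaround; `rejoin_test_sentences` is a hypothetical helper name, and the idea is simply to re-join any sentence that arrives as a list of characters before it reaches the tokenizer (e.g. right after `test_sentences = self.config.test_sentences` in `Vits.test_run()`):

```python
def rejoin_test_sentences(test_sentences):
    # Re-join sentences that were stored as lists of single characters,
    # leaving entries that are already strings untouched.
    return ["".join(s) if isinstance(s, (list, tuple)) else s for s in test_sentences]

print(rejoin_test_sentences([["H", "e", "l", "l", "o"], "Already a string."]))
# -> ['Hello', 'Already a string.']
```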
@RobinE89 After solving this error as you mentioned, I faced another warning during training:
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
WARNING:tensorboardX.x2num:NaN or Inf found in input tensor.
(repeated many times)
followed by NaN loss
--> STEP: 29/104 -- GLOBAL_STEP: 217050
| > loss_disc: nan (nan)
| > loss_disc_real_0: nan (nan)
| > loss_disc_real_1: nan (nan)
| > loss_disc_real_2: nan (nan)
| > loss_disc_real_3: nan (nan)
| > loss_disc_real_4: nan (nan)
| > loss_disc_real_5: nan (nan)
| > loss_0: nan (nan)
| > grad_norm_0: 0.00000 (0.00000)
| > loss_gen: nan (nan)
| > loss_kl: nan (nan)
| > loss_feat: nan (nan)
| > loss_mel: 15.34929 (16.04639)
| > loss_duration: nan (nan)
| > amp_scaler: 0.00000 (0.00000)
| > loss_1: nan (nan)
| > grad_norm_1: 0.00000 (0.00000)
| > current_lr_0: 0.00015
| > current_lr_1: 0.00015
| > step_time: 1.05040 (0.99310)
| > loader_time: 0.01950 (0.02812)
Note that my dataset contains no NaNs. I used the same settings to train from scratch and didn't face this issue, but after finetuning / continuing training I am hitting NaNs. Another sad story: best_model.pth got overwritten with a NaN loss, so when I try to retrain it starts from a NaN loss again. Any help please? @erogol
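One thing that may help with the overwritten checkpoint: check whether a saved checkpoint already contains NaN/Inf weights before resuming from it. This is a hypothetical helper, not part of Coqui TTS; it assumes the trainer checkpoint stores the model weights under a "model" key, which is worth verifying for your version:

```python
import torch

def checkpoint_is_finite(path: str) -> bool:
    """Return True if every floating-point tensor in the checkpoint's model state is finite."""
    ckpt = torch.load(path, map_location="cpu")
    state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    return all(
        torch.isfinite(t).all().item()
        for t in state.values()
        if torch.is_tensor(t) and t.is_floating_point()
    )

# Example: only resume from best_model.pth if its weights are still finite.
print(checkpoint_is_finite("best_model.pth"))
```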
I don't think you should split the strings. Try keeping the whole string.
@erogol @RobinE89 I was able to train VITS, but due to some technical difficulties my training stopped, and now when I try to resume the model from the last saved best checkpoint I get this error:
> Restoring best loss from best_model_11648.pth ...
> Starting with loaded last best loss 16.256176
> EPOCH: 0/2000
--> /home/ansary/Shabab/vits_20_october
> TRAINING (2022-10-25 23:40:32)
> DataLoader initialization
| > Tokenizer:
| > add_blank: True
| > use_eos_bos: False
| > use_phonemes: False
| > Number of instances : 6126
| > Preprocessing samples
| > Max text length: 114
| > Min text length: 16
| > Avg text length: 64.72233104799217
|
| > Max audio length: 276757.0
| > Min audio length: 49474.0
| > Avg audio length: 129107.80362389814
| > Num. instances discarded samples: 0
| > Batch group size: 0.
['<BLNK>', 'ত', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'এ', '<BLNK>', 'ক', '<BLNK>', 'ট', '<BLNK>', 'া', '<BLNK>', ' ', '<BLNK>', 'ক', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', 'ণ', '<BLNK>', 'ও', '<BLNK>', ' ', '<BLNK>', 'ছ', '<BLNK>', 'ি', '<BLNK>', 'ল', '<BLNK>', '\n', '<BLNK>']['<BLNK>', 'এ', '<BLNK>', 'র', '<BLNK>', 'ি', '<BLNK>', 'ক', '<BLNK>', 'া', '<BLNK>', ' ', '<BLNK>', 'ক', '<BLNK>', 'ো', '<BLNK>', 'হ', '<BLNK>', 'ু', '<BLNK>', 'ট', '<BLNK>', ' ', '<BLNK>', 'য', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'ত', '<BLNK>', 'ু', '<BLNK>', 'ম', '<BLNK>', 'ি', '<BLNK>', ',', '<BLNK>', ' ', '<BLNK>', 'স', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'আ', '<BLNK>', 'ম', '<BLNK>', 'ি', '<BLNK>', ' ', '<BLNK>', 'ট', '<BLNK>', 'ে', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'প', '<BLNK>', 'া', '<BLNK>', 'ই', '<BLNK>', '\n', '<BLNK>']['<BLNK>', 'দ', '<BLNK>', 'ে', '<BLNK>', 'খ', '<BLNK>', 'ে', '<BLNK>', ',', '<BLNK>', ' ', '<BLNK>', 'ও', '<BLNK>', 'ঁ', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'ক', '<BLNK>', 'া', '<BLNK>', 'ছ', '<BLNK>', ' ', '<BLNK>', 'থ', '<BLNK>', 'ে', '<BLNK>', 'ক', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'ফ', '<BLNK>', '্', '<BLNK>', 'র', '<BLNK>', 'া', '<BLNK>', 'ঞ', '<BLNK>', '্', '<BLNK>', 'চ', '<BLNK>', 'া', '<BLNK>', 'ই', '<BLNK>', 'জ', '<BLNK>', 'ি', '<BLNK>', ' ', '<BLNK>', 'ন', '<BLNK>', 'ি', '<BLNK>', 'য়', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'ক', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', 'ব', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'শ', '<BLNK>', 'ু', '<BLNK>', 'র', '<BLNK>', 'ু', '<BLNK>', ' ', '<BLNK>', 'ক', '<BLNK>', 'র', '<BLNK>', 'ি', '<BLNK>', '\n', '<BLNK>']
[!] Character '\n' not found in the vocabulary. Discarding it. [!] Character '\n' not found in the vocabulary. Discarding it. [!] Character '\n' not found in the vocabulary. Discarding it.
['<BLNK>', 'ক', '<BLNK>', 'ি', '<BLNK>', 'ন', '<BLNK>', '্', '<BLNK>', 'ত', '<BLNK>', '্', '<BLNK>', 'ত', '<BLNK>', ' ', '<BLNK>', 'ক', '<BLNK>', '্', '<BLNK>', 'ষ', '<BLNK>', 'ম', '<BLNK>', 'ত', '<BLNK>', 'া', '<BLNK>', 'য়', '<BLNK>', ',', '<BLNK>', 'আ', '<BLNK>', 'স', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'প', '<BLNK>', 'র', '<BLNK>', 'ে', '<BLNK>', ',', '<BLNK>', 'ত', '<BLNK>', 'া', '<BLNK>', 'ঁ', '<BLNK>', 'ক', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'উ', '<BLNK>', 'দ', '<BLNK>', '্', '<BLNK>', 'ভ', '<BLNK>', '্', '<BLNK>', 'র', '<BLNK>', 'া', '<BLNK>', 'ন', '<BLNK>', '্', '<BLNK>', 'ত', '<BLNK>', ' ', '<BLNK>', 'ম', '<BLNK>', 'ন', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'হ', '<BLNK>', 'চ', '<BLNK>', '্', '<BLNK>', 'ছ', '<BLNK>', 'ে', '<BLNK>', '\n', '<BLNK>']
[!] Character '\n' not found in the vocabulary. Discarding it.
/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/torch/functional.py:606: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484810403/work/aten/src/ATen/native/SpectralOps.cpp:800.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
--> STEP: 49/64 -- GLOBAL_STEP: 288050
| > loss_disc: 2.38776 (2.39514)
| > loss_disc_real_0: 0.12978 (0.13257)
| > loss_disc_real_1: 0.19328 (0.19674)
| > loss_disc_real_2: 0.23138 (0.22002)
| > loss_disc_real_3: 0.20934 (0.22139)
| > loss_disc_real_4: 0.21563 (0.21840)
| > loss_disc_real_5: 0.22416 (0.22545)
| > loss_0: 2.38776 (2.39514)
| > grad_norm_0: 18.32785 (16.97663)
| > loss_gen: 2.49931 (2.45184)
| > loss_kl: 1.21337 (1.24123)
| > loss_feat: 8.46184 (8.30167)
| > loss_mel: 15.62020 (15.57008)
| > loss_duration: 1.48671 (1.45953)
| > amp_scaler: 512.00000 (1107.59184)
| > loss_1: 29.28143 (29.02435)
| > grad_norm_1: 349.82025 (237.90034)
| > current_lr_0: 0.00011
| > current_lr_1: 0.00011
| > step_time: 1.67390 (2.26524)
| > loader_time: 0.02980 (0.02500)
> EVALUATION
> DataLoader initialization
| > Tokenizer:
| > add_blank: True
| > use_eos_bos: False
| > use_phonemes: False
| > Number of instances : 61
| > Preprocessing samples
| > Max text length: 92
| > Min text length: 40
| > Avg text length: 64.52459016393442
|
| > Max audio length: 171374.0
| > Min audio length: 68524.0
| > Avg audio length: 126232.5081967213
| > Num. instances discarded samples: 0
| > Batch group size: 0.
['<BLNK>', 'ত', '<BLNK>', 'া', '<BLNK>', 'ঁ', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'জ', '<BLNK>', 'ন', '<BLNK>', '্', '<BLNK>', 'য', '<BLNK>', ' ', '<BLNK>', 'ম', '<BLNK>', 'ে', '<BLNK>', 'ক', '<BLNK>', ' ', '<BLNK>', 'আ', '<BLNK>', 'প', '<BLNK>', 'ে', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'ল', '<BLNK>', 'ো', '<BLNK>', 'ক', '<BLNK>', ' ', '<BLNK>', 'এ', '<BLNK>', 'স', '<BLNK>', 'ে', '<BLNK>', 'ছ', '<BLNK>', 'ে', '<BLNK>', 'ন', '<BLNK>', ',', '<BLNK>', 'ই', '<BLNK>', 'ং', '<BLNK>', 'ল', '<BLNK>', 'ণ', '<BLNK>', '্', '<BLNK>', 'ড', '<BLNK>', ' ', '<BLNK>', 'থ', '<BLNK>', 'ে', '<BLNK>', 'ক', '<BLNK>', 'ে', '<BLNK>', '\n', '<BLNK>']['<BLNK>', 'স', '<BLNK>', '্', '<BLNK>', 'ক', '<BLNK>', 'ু', '<BLNK>', 'ল', '<BLNK>', 'ে', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'প', '<BLNK>', '্', '<BLNK>', 'র', '<BLNK>', 'ধ', '<BLNK>', 'া', '<BLNK>', 'ন', '<BLNK>', ' ', '<BLNK>', 'শ', '<BLNK>', 'ি', '<BLNK>', 'ক', '<BLNK>', '্', '<BLNK>', 'ষ', '<BLNK>', 'ক', '<BLNK>', ',', '<BLNK>', ' ', '<BLNK>', 'প', '<BLNK>', '্', '<BLNK>', 'র', '<BLNK>', 'ণ', '<BLNK>', 'য়', '<BLNK>', 'চ', '<BLNK>', 'ন', '<BLNK>', '্', '<BLNK>', 'দ', '<BLNK>', '্', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'ভ', '<BLNK>', 'ট', '<BLNK>', '্', '<BLNK>', 'ট', '<BLNK>', 'া', '<BLNK>', 'চ', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', '্', '<BLNK>', 'য', '<BLNK>', ',', '<BLNK>', ' ', '<BLNK>', 'ম', '<BLNK>', 'া', '<BLNK>', 'র', '<BLNK>', 'ধ', '<BLNK>', 'র', '<BLNK>', 'ে', '<BLNK>', 'র', '<BLNK>', ' ', '<BLNK>', 'অ', '<BLNK>', 'ভ', '<BLNK>', 'ি', '<BLNK>', 'য', '<BLNK>', 'ো', '<BLNK>', 'গ', '<BLNK>', ' ', '<BLNK>', 'ম', '<BLNK>', 'ে', '<BLNK>', 'ন', '<BLNK>', 'ে', '<BLNK>', ' ', '<BLNK>', 'ন', '<BLNK>', 'ি', '<BLNK>', 'য়', '<BLNK>', 'ে', '<BLNK>', 'ছ', '<BLNK>', 'ে', '<BLNK>', 'ন', '<BLNK>', '\n', '<BLNK>']
[!] Character '\n' not found in the vocabulary. Discarding it.
[!] Character '\n' not found in the vocabulary. Discarding it.
! Run is kept in /home/ansary/Shabab/vits_20_october
| > Synthesizing test sentences.
Traceback (most recent call last):
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py", line 1533, in fit
self._fit()
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py", line 1521, in _fit
self.test_run()
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py", line 1439, in test_run
test_outputs = self.model.test_run(self.training_assets)
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/models/vits.py", line 1435, in test_run
wav, alignment, _, _ = synthesis(
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/utils/synthesis.py", line 180, in synthesis
model.tokenizer.text_to_ids(text, language=language_id),
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/utils/text/tokenizer.py", line 111, in text_to_ids
text = self.intersperse_blank_char(text, True)
File "/home/ansary/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/utils/text/tokenizer.py", line 130, in intersperse_blank_char
result = [char_to_use] * (len(char_sequence) * 2 + 1)
TypeError: object of type 'NoneType' has no len()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py:1533, in Trainer.fit(self)
1532 try:
-> 1533 self._fit()
1534 if self.args.rank == 0:
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py:1521, in Trainer._fit(self)
1520 if epoch >= self.config.test_delay_epochs and self.args.rank <= 0:
-> 1521 self.test_run()
1522 self.c_logger.print_epoch_end(
1523 epoch,
1524 self.keep_avg_eval.avg_values if self.config.run_eval else self.keep_avg_train.avg_values,
1525 )
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py:1439, in Trainer.test_run(self)
1438 else:
-> 1439 test_outputs = self.model.test_run(self.training_assets)
1440 elif hasattr(self.model, "test") or (self.num_gpus > 1 and hasattr(self.model.module, "test")):
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
26 with self.clone():
---> 27 return func(*args, **kwargs)
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/models/vits.py:1435, in Vits.test_run(self, assets)
1434 aux_inputs = self.get_aux_input_from_test_sentences(s_info)
-> 1435 wav, alignment, _, _ = synthesis(
1436 self,
1437 aux_inputs["text"],
1438 self.config,
1439 "cuda" in str(next(self.parameters()).device),
1440 speaker_id=aux_inputs["speaker_id"],
1441 d_vector=aux_inputs["d_vector"],
1442 style_wav=aux_inputs["style_wav"],
1443 language_id=aux_inputs["language_id"],
1444 use_griffin_lim=True,
1445 do_trim_silence=False,
1446 ).values()
1447 test_audios["{}-audio".format(idx)] = wav
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/utils/synthesis.py:180, in synthesis(model, text, CONFIG, use_cuda, speaker_id, style_wav, style_text, use_griffin_lim, do_trim_silence, d_vector, language_id)
178 # convert text to sequence of token IDs
179 text_inputs = np.asarray(
--> 180 model.tokenizer.text_to_ids(text, language=language_id),
181 dtype=np.int32,
182 )
183 # pass tensors to backend
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/utils/text/tokenizer.py:111, in TTSTokenizer.text_to_ids(self, text, language)
110 if self.add_blank:
--> 111 text = self.intersperse_blank_char(text, True)
112 if self.use_eos_bos:
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/TTS/tts/utils/text/tokenizer.py:130, in TTSTokenizer.intersperse_blank_char(self, char_sequence, use_blank_char)
129 char_to_use = self.characters.blank if use_blank_char else self.characters.pad
--> 130 result = [char_to_use] * (len(char_sequence) * 2 + 1)
131 result[1::2] = char_sequence
TypeError: object of type 'NoneType' has no len()
During handling of the above exception, another exception occurred:
SystemExit Traceback (most recent call last)
File <timed eval>:1
File ~/anaconda3/envs/mobassir/lib/python3.8/site-packages/trainer/trainer.py:1554, in Trainer.fit(self)
1552 remove_experiment_folder(self.output_path)
1553 traceback.print_exc()
-> 1554 sys.exit(1)
SystemExit: 1
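The last frame of the traceback shows what is going wrong: `intersperse_blank_char()` receives `None` as the character sequence, so taking its length fails. A minimal standalone reproduction of that failure (illustrative only; here the sequence is `None` presumably because the test sentence itself ended up as `None` / malformed after resuming):

```python
char_sequence = None
result = ["<BLNK>"] * (len(char_sequence) * 2 + 1)  # TypeError: object of type 'NoneType' has no len()
```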
Any help please?
It seems the devs are not using --continue_path with the VITS model, because it doesn't work; there is always an error:
text = re.sub(_comma_number_re, _remove_commas, text)
File "/opt/conda/lib/python3.7/re.py", line 194, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
I can confirm that "TypeError: expected string or bytes-like object" occurs when resuming training via --continue_path while finetuning a VITS model. Any solution?
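For what it's worth, the error itself is easy to reproduce outside of TTS: `re.sub` needs a `str`, and instead it gets a test sentence that has been split into a list of characters. A small illustration (the regex is written here from memory to mirror `number_norm.py`, so treat it as approximate):

```python
import re

_comma_number_re = re.compile(r"([0-9][0-9\,]+[0-9])")
_remove_commas = lambda m: m.group(1).replace(",", "")

print(re.sub(_comma_number_re, _remove_commas, "1,000 test sentences"))   # fine: '1000 test sentences'
re.sub(_comma_number_re, _remove_commas, list("1,000 test sentences"))    # TypeError: expected string or bytes-like object
```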
Has this problem been solved?
I added a `test_sentences_file` line to `train_vits.py` that points to a text file containing my test sentences. It works. I saw that it copied an empty `test_sentences_file` into the config.json, so I struggled a bit, but managed:
```python
import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsAudioConfig
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor


def main():
    output_path = os.path.dirname(os.path.abspath(__file__))
    dataset_config = BaseDatasetConfig(
        formatter="ljspeech", meta_file_train="metadata.csv", path=os.path.join(output_path, "./davis/")
    )
    audio_config = VitsAudioConfig(
        sample_rate=22050, win_length=1024, hop_length=256, num_mels=80, mel_fmin=0, mel_fmax=None
    )

    config = VitsConfig(
        audio=audio_config,
        run_name="vits_ljspeech",
        batch_size=2,
        eval_batch_size=1,
        batch_group_size=5,
        num_loader_workers=8,
        num_eval_loader_workers=4,
        run_eval=True,
        test_delay_epochs=-1,
        epochs=1000,
        text_cleaner="phoneme_cleaners",
        use_phonemes=True,
        phoneme_language="en-us",
        phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
        compute_input_seq_cache=True,
        print_step=10,
        print_eval=True,
        mixed_precision=True,
        test_sentences_file="./test.txt",
        output_path=output_path,
        datasets=[dataset_config],
    )

    # INITIALIZE THE AUDIO PROCESSOR
    # Audio processor is used for feature extraction and audio I/O.
    # It mainly serves to the dataloader and the training loggers.
    ap = AudioProcessor.init_from_config(config)

    # INITIALIZE THE TOKENIZER
    # Tokenizer is used to convert text to sequences of token IDs.
    # config is updated with the default characters if not defined in the config.
    tokenizer, config = TTSTokenizer.init_from_config(config)

    # LOAD DATA SAMPLES
    # Each sample is a list of [text, audio_file_path, speaker_name]
    # You can define your custom sample loader returning the list of samples.
    # Or define your custom formatter and pass it to the `load_tts_samples`.
    # Check `TTS.tts.datasets.load_tts_samples` for more details.
    train_samples, eval_samples = load_tts_samples(
        dataset_config,
        eval_split=True,
        eval_split_max_size=config.eval_split_max_size,
        eval_split_size=config.eval_split_size,
    )

    # init model
    model = Vits(config, ap, tokenizer, speaker_manager=None)

    # init the trainer and 🚀
    trainer = Trainer(
        TrainerArgs(),
        config,
        output_path,
        model=model,
        train_samples=train_samples,
        eval_samples=eval_samples,
    )
    trainer.fit()


if __name__ == "__main__":
    from multiprocessing import freeze_support

    freeze_support()  # needed for Windows
    main()
```
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\trainer\trainer.py", line 1591, in fit
self._fit()
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\trainer\trainer.py", line 1548, in _fit
self.test_run()
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\trainer\trainer.py", line 1466, in test_run
test_outputs = self.model.test_run(self.training_assets)
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\TTS\tts\models\vits.py", line 1442, in test_run
wav, alignment, _, _ = synthesis(
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\TTS\tts\utils\synthesis.py", line 186, in synthesis
model.tokenizer.text_to_ids(text, language=language_name),
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\TTS\tts\utils\text\tokenizer.py", line 108, in text_to_ids
text = self.text_cleaner(text)
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\TTS\tts\utils\text\cleaners.py", line 125, in phoneme_cleaners
text = en_normalize_numbers(text)
File "E:\KL2.0\CODEZ\Coqui\tts-coqui\TTS\venv\lib\site-packages\TTS\tts\utils\text\english\number_norm.py", line 92, in normalize_numbers
text = re.sub(_comma_number_re, _remove_commas, text)
File "C:\Users\shada\AppData\Local\Programs\Python\Python38\lib\re.py", line 208, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object```
How can I solve this error?
Describe the bug
When using `TrainerArgs(continue_path="")` to resume training, the test sentences fail to generate, giving an unexpected input error as listed below.
To be clear, the first run is from scratch; in the second run I then add the continue_path, and it trains fine until it tries to generate the test sentences.
I was getting the same error when training from scratch if I used `test_sentences = [[""],[""]]`, hence the ugly list of test sentences below.
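For reference, a sketch of `test_sentences` given as plain strings, which is the shape `Vits.test_run()` iterates over; whether this alone avoids the continue_path problem is a separate question, and in multi-speaker setups an entry can instead be a list like `[text, speaker_name, style_wav, language]`:

```python
from TTS.tts.configs.vits_config import VitsConfig

# Sketch: test sentences as plain strings, one entry per sentence.
config = VitsConfig(
    test_sentences=[
        "It took me quite a long time to develop a voice.",
        "Be a voice, not an echo.",
    ],
)
print(config.test_sentences)
```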
To Reproduce
Expected behavior
To resume training
Logs
Environment
Additional context
No response