Following the steps in Tutorial_2_train_your_first_TTS_model.ipynb, I recorded a set of .wav files (16 in total) and created the metadata.csv file. Training starts a run, appears to do some preliminary analysis, and then fails on an assertion without a clear error message.
I'm at a loss: I can't find anything wrong with my (small) dataset or my metadata.
Any pointers?
training.py looks like this:
```python
import os

from TTS.tts.configs.shared_configs import BaseDatasetConfig

# Dataset: an LJSpeech-style metadata.csv with the clips under data/wavs/.
dataset_config = BaseDatasetConfig(
    formatter="ljspeech", meta_file_train="metadata.csv", path="data"
)

output_path = "/home/voice/output/"
if not os.path.exists(output_path):
    os.makedirs(output_path)

# GlowTTSConfig: all model related values for training, validating and testing.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    eval_split_size=0.0625,
    num_loader_workers=2,
    num_eval_loader_workers=2,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=100,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    print_eval=False,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
    save_step=1000,
)
```
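With only 16 clips, the split and batch settings above leave very little data per step. A small sketch of the arithmetic, assuming a fractional `eval_split_size` is turned into a clip count by truncating `num_clips * eval_split_size`, and assuming the last partial batch is kept rather than dropped:

```python
import math

# Values taken from the dataset and GlowTTSConfig above.
num_clips = 16
eval_split_size = 0.0625
batch_size = 32

# Assumption: the fractional split is truncated to a whole number of clips.
num_eval = int(num_clips * eval_split_size)
num_train = num_clips - num_eval

# Training batches per epoch, assuming the final partial batch is not dropped.
batches_per_epoch = math.ceil(num_train / batch_size)

print(num_eval, num_train, batches_per_epoch)  # 1 15 1
```

Under those assumptions, each epoch consists of a single batch holding all 15 training clips, so one malformed clip is enough to make every training step fail inside `collate_fn`.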
Backtrace is as follows:
```
/home/voice/.local/lib/python3.8/site-packages/librosa/core/spectrum.py:256: UserWarning: n_fft=1024 is too large for input signal of length=2
  warnings.warn(
! Run is removed from /home/voice/output/run-May-04-2023_08+51PM-0000000
Traceback (most recent call last):
  File "/home/voice/.local/lib/python3.8/site-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/home/voice/.local/lib/python3.8/site-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/home/voice/.local/lib/python3.8/site-packages/trainer/trainer.py", line 1308, in train_epoch
    for cur_step, batch in enumerate(self.train_loader):
  File "/home/voice/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/voice/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/voice/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/voice/.local/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/voice/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/voice/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/voice/.local/lib/python3.8/site-packages/TTS/tts/datasets/dataset.py", line 464, in collate_fn
    mel = prepare_tensor(mel, self.outputs_per_step)
  File "/home/voice/.local/lib/python3.8/site-packages/TTS/tts/utils/data.py", line 29, in prepare_tensor
    return np.stack([_pad_tensor(x, pad_len) for x in inputs])
  File "/home/voice/.local/lib/python3.8/site-packages/TTS/tts/utils/data.py", line 29, in <listcomp>
    return np.stack([_pad_tensor(x, pad_len) for x in inputs])
  File "/home/voice/.local/lib/python3.8/site-packages/TTS/tts/utils/data.py", line 20, in _pad_tensor
    assert x.ndim == 2
AssertionError
```
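The `UserWarning` at the top of the backtrace says librosa saw an input signal of length 2. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that some recordings are two-channel (stereo): if a clip is loaded as an array of shape (samples, 2), a transform along the last axis sees only two values per row, and the resulting spectrogram would have three dimensions instead of the two that `_pad_tensor` asserts. A minimal stdlib-only sketch for checking the channel count and sample rate of every clip; the expected rate of 22050 Hz is an assumption, so substitute the value from your own audio config:

```python
import wave
from pathlib import Path


def inspect_wavs(wav_dir, expected_rate=22050):
    """Return (filename, problem) pairs for clips that look suspicious.

    Files the stdlib `wave` module cannot parse (for example non-PCM WAVs,
    or extensible-format WAVs on older Pythons) are reported as unreadable
    rather than inspected.
    """
    problems = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                channels = wav.getnchannels()
                rate = wav.getframerate()
                frames = wav.getnframes()
        except (wave.Error, EOFError) as exc:
            problems.append((path.name, f"unreadable: {exc}"))
            continue
        if channels != 1:
            problems.append((path.name, f"{channels} channels (expected mono)"))
        if rate != expected_rate:
            problems.append((path.name, f"sample rate {rate} Hz (expected {expected_rate})"))
        if frames == 0:
            problems.append((path.name, "no audio frames"))
    return problems
```

For the dataset above, `inspect_wavs("data/wavs")` would be the call to try; an empty list means every clip is readable mono PCM at the expected rate. If clips are reported as stereo, converting them to mono (and resampling if needed) before training, for example with `ffmpeg -i in.wav -ac 1 -ar 22050 out.wav`, is a common remedy.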
### To Reproduce
1. Record 16 .wav files.
2. Use the config above.
3. Go through the tutorial notebook.
4. Use this metadata file:

   ```
   LJ001-0001|Click the red Star Record button above to start recording.|Click the red Star Record button above to start recording.
   LJ001-0002|While recording, you can pause and resume recording by clicking the appropiate button.|While recording, you can pause and resume recording by clicking the appropiate button.
   LJ001-0003|When you are finished recording, click the Stop Recording button.|When you are finished recording, click the Stop Recording button.
   LJ001-0004|You can save recoridng sound to your computer, or you can choose to cut and edit sound.|You can save recoridng sound to your computer, or you can choose to cut and edit sound.
   LJ001-0005|If you choose to edit sound, go to the editing page.|If you choose to edit sound, go to the editing page.
   LJ001-0006|After the modification is completed, you can save to the computer.|After the modification is completed, you can save to the computer.
   LJ001-0007|The saved format can be MP3, WAV, OGG etcetera.|The saved format can be MP3, WAV, OGG etcetera.
   LJ001-0008|Features include start recording, pause recording, resume recording, stop recording and real-time display of recording time, waveform, data size and other information.|Features include start recording, pause recording, resume recording, stop recording and real-time display of recording time, waveform, data size and other information.
   LJ001-0009|Based on the standard interface of bootstrap recording can be done in 3 easy steps.|Based on the standard interface of bootstrap recording can be done in 3 easy steps.
   LJ001-0010|There are no complicated settings and options so click with the mouse to complete.|There are no complicated settings and options so click with the mouse to complete.
   LJ001-0011|The combination of recording and editing integrates the functions of recording and editing.|The combination of recording and editing integrates the functions of recording and editing.
   LJ001-0012|Your computer device needs a microphone and sound card.|Your computer device needs a microphone and sound card.
   LJ001-0013|This program can be used under any operating system, including Windows, Mac, Linux, etcetera.|This program can be used under any operating system, including Windows, Mac, Linux, etcetera.
   LJ001-0014|You need to allow your browser to use microphone device.|You need to allow your browser to use microphone device.
   LJ001-0015|HTML is the latest technical standard for web browsers and it supports input, processing, and saving of audio directly in the browser.|HTML is the latest technical standard for web browsers and it supports input, processing, and saving of audio directly in the browser.
   LJ001-0016|This program provides complete editing functions that include: cut, fade in, fade out, change volume, and many other things.|This program provides complete editing functions that include: cut, fade in, fade out, change volume, and many other things.
   ```

5. Run `training.py`.
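Since the metadata is one of the suspects, a consistency check between `metadata.csv` and the audio folder may help narrow things down. This is a sketch under the assumption that the `ljspeech` formatter expects three pipe-separated columns (`id|text|normalized text`) and one `wavs/<id>.wav` file per row under the dataset `path`; the function name is hypothetical:

```python
import csv
from pathlib import Path


def check_ljspeech_metadata(dataset_dir, meta_file="metadata.csv"):
    """Return human-readable problems found in an LJSpeech-style dataset."""
    root = Path(dataset_dir)
    problems = []
    seen_ids = set()
    with open(root / meta_file, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps any quotation marks in transcripts as plain text.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, cols in enumerate(reader, start=1):
            if not cols:
                continue  # skip blank lines
            if len(cols) != 3:
                problems.append(f"line {line_no}: expected 3 columns, got {len(cols)}")
                continue
            clip_id, text, normalized = (c.strip() for c in cols)
            if clip_id in seen_ids:
                problems.append(f"line {line_no}: duplicate id {clip_id}")
            seen_ids.add(clip_id)
            if not text or not normalized:
                problems.append(f"line {line_no}: empty transcript for {clip_id}")
            if not (root / "wavs" / f"{clip_id}.wav").is_file():
                problems.append(f"line {line_no}: missing wavs/{clip_id}.wav")
    # Audio files that no metadata row refers to.
    for wav in sorted((root / "wavs").glob("*.wav")):
        if wav.stem not in seen_ids:
            problems.append(f"unreferenced audio file wavs/{wav.name}")
    return problems
```

With `path="data"` from the dataset config, `check_ljspeech_metadata("data")` returns an empty list when every row has three columns and a matching clip.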
### Expected behavior
Training runs to completion without an assertion error.
### Logs
See the backtrace in the bug description above.