coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] #1849

Closed: determo13 closed this issue 2 years ago

determo13 commented 2 years ago

Describe the bug

I have Debian running in a VM and installed TTS. I tested text synthesis and everything works fine, but as soon as I try to train a model following the training guide for beginners, https://tts.readthedocs.io/en/latest/training_a_model.html, I get an error at epoch 1.

To Reproduce

Follow the training guide for beginners: https://tts.readthedocs.io/en/latest/training_a_model.html

Expected behavior

No response

Logs

--> STEP: 212/406 -- GLOBAL_STEP: 1025
    | > loss: 0.12504  (0.18646)
    | > log_mle: -0.19896  (-0.15915)
    | > loss_dur: 0.32400  (0.34561)
    | > grad_norm: 1.96832  (2.41354)
    | > current_lr: 0.00026
    | > step_time: 24.56490  (20.01145)
    | > loader_time: 0.00590  (0.00676)

 > Run is kept in run-August-09-2022_05+58PM-d46fbc24/
Traceback (most recent call last):
  File "/home/user1/.local/lib/python3.9/site-packages/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/home/user1/.local/lib/python3.9/site-packages/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/home/user1/.local/lib/python3.9/site-packages/trainer/trainer.py", line 1282, in train_epoch
    _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/home/user1/.local/lib/python3.9/site-packages/trainer/trainer.py", line 1114, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/home/user1/.local/lib/python3.9/site-packages/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/home/user1/.local/lib/python3.9/site-packages/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/home/user1/TTS/TTS/tts/models/glow_tts.py", line 408, in train_step
    outputs = self.forward(
  File "/home/user1/TTS/TTS/tts/models/glow_tts.py", line 240, in forward
    z, logdet = self.decoder(y, y_mask, g=g, reverse=False)
  File "/home/user1/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user1/TTS/TTS/tts/layers/glow_tts/decoder.py", line 131, in forward
    x, logdet = f(x, x_mask, g=g, reverse=reverse)
  File "/home/user1/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user1/TTS/TTS/tts/layers/glow_tts/glow.py", line 214, in forward
    x = self.wn(x, x_mask, g)
  File "/home/user1/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user1/TTS/TTS/tts/layers/generic/wavenet.py", line 108, in forward
    res_skip_acts = self.res_skip_layers[i](acts)
  File "/home/user1/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/user1/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 307, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/user1/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
  File "/home/user1/.local/lib/python3.9/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 53930) is killed by signal: Killed.

Environment

TTS version 0.71

No CUDA

PyTorch installed via pip

Additional context

No response

p0p4k commented 2 years ago

debian on VM

Has to be a memory issue. Try setting num_workers = 0 for the dataloader.
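For reference, a minimal sketch of where that change goes in the beginner recipe, assuming the `GlowTTSConfig` fields used in the linked guide (`num_loader_workers` / `num_eval_loader_workers` control the DataLoader worker counts; the other values are just the guide's defaults and may differ in your config):

```python
# Minimal sketch based on the GlowTTS recipe from the beginner training guide.
# Setting the worker counts to 0 keeps data loading in the main process, which
# avoids the per-worker memory overhead that can get a worker OOM-killed on a
# small VM.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=0,       # no separate DataLoader worker processes
    num_eval_loader_workers=0,  # same for the eval loader
    run_eval=True,
    epochs=1000,
    output_path="output/",      # wherever the recipe writes its runs
)
```

The rest of the recipe (dataset config, Trainer setup) stays the same; the worker counts can be raised again once the VM has more memory.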

determo13 commented 2 years ago

Right now it's at epoch 6 and going. It looks like it was a memory issue after all. Any suggestions on the minimum memory requirement per worker?

p0p4k commented 2 years ago

Right now it's at epoch 6 and going. It looks like it was a memory issue after all. Any suggestions on the minimum memory requirement per worker?

No idea, tbh. Also, I would suggest using Google Colab; at least a weak GPU is better than no GPU for training.