OOM tacotron2

PandaForce1 commented 3 years ago

Hey guys, when trying to run the Tacotron 2 training I run into the following error:


Traceback (most recent call last):
  File "train_tacotron2.py", line 514, in <module>
    main()
  File "train_tacotron2.py", line 502, in main
    trainer.fit(
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 1010, in fit
    self.run()
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 104, in run
    self._train_epoch()
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 126, in _train_epoch
    self._train_step(batch)
  File "train_tacotron2.py", line 109, in _train_step
    self.one_step_forward(batch)
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
    outputs = execute.execute(
  File "C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError:  No algorithm worked!
         [[node gradients/tacotron2/post_net/tf_tacotron_conv_batch_norm_9/conv_._4/conv1d_grad/Conv2DBackpropFilter (defined at C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow_tts\trainers\base_trainer.py:820) ]] [Op:__inference__one_step_forward_22874]

Errors may have originated from an input operation.
Input Source operations connected to node gradients/tacotron2/post_net/tf_tacotron_conv_batch_norm_9/conv_._4/conv1d_grad/Conv2DBackpropFilter:
 tacotron2/post_net/tf_tacotron_conv_batch_norm_9/conv_._4/conv1d/ExpandDims (defined at C:\Users\supre\anaconda3\envs\tts4-tensorflow\lib\site-packages\tensorflow_tts\models\tacotron2.py:110)

Function call stack:
_one_step_forward

[train]:   0%|

My setup: GPU = RTX 3080, tensorflow = 2.4.0, tensorflow-gpu = 2.5.0, CUDA = 11.0, cuDNN = 8.0
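The setup above lists two different versions for `tensorflow` and `tensorflow-gpu`. The official TensorFlow 2.4 builds target CUDA 11.0 / cuDNN 8.0, while the 2.5 builds target CUDA 11.2 / cuDNN 8.1, so it may be worth confirming which package versions are actually installed in the environment. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `conflicting` are made up for illustration, and nothing in this thread confirms that a version mismatch caused the error.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflow-gpu")):
    """Return {package_name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def conflicting(versions):
    """Return True when more than one distinct version is installed."""
    present = {v for v in versions.values() if v is not None}
    return len(present) > 1


if __name__ == "__main__":
    versions = installed_versions()
    print(versions)
    if conflicting(versions):
        # Assumption: differing versions are worth investigating,
        # not proof that they caused the error in this thread.
        print("tensorflow and tensorflow-gpu report different versions")
```

Run it inside the same conda environment that is used for training, so the reported versions match what `train_tacotron2.py` imports.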

My dataset: ljspeech(preprocessed)

My training command line:

python train_tacotron2.py --train-dir ./dump_ljspeech/train/ --dev-dir ./dump_ljspeech/valid/ --outdir C://Users//supre//Documents//TensorflowTTS//main_trained_models//trained_tacatron_models//taco.v3 --config ./conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0

Hopefully someone is able to help me out with this issue :) Thanks in advance for every suggestion and any help :)

PandaForce1 commented 3 years ago

Fixed it:)

Aksh97 commented 2 years ago

How did you fix it?

p0p4k commented 2 years ago

@PandaForce1 Can you help me to fix the same OOM issue, please? Thanks
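The thread does not record what the fix was. One mitigation that is often tried for cuDNN "No algorithm worked!" failures and GPU out-of-memory errors is to let TensorFlow allocate GPU memory on demand instead of reserving it all at startup. The sketch below shows that approach under stated assumptions: the function name `enable_gpu_memory_growth` is hypothetical, and this is not confirmed to be the solution the original poster used.

```python
def enable_gpu_memory_growth():
    """Turn on memory growth for every visible GPU.

    Returns the list of GPU device names that were configured,
    or None if TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except RuntimeError:
            # Raised when the GPU has already been initialized;
            # memory growth must be set before any op runs on it.
            pass
    return configured


if __name__ == "__main__":
    print(enable_gpu_memory_growth())
```

A call like this would have to run before any model or tensor is created on the GPU. Whether `train_tacotron2.py` already does something similar is not shown in this thread.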

TensorSpeech / TensorFlowTTS

OOM tacotron2 #619