Evaluation issues for Tacotron2 during training

neng668 commented 3 years ago

I'm currently trying to train a voice using Tacotron2, basically following the Tacotron2 example. I'm using the LJSpeech pre-trained model as a base model and additional data (150 English sentences in LJSpeech format) to adapt the voice to that of a target speaker. At the moment I'm running it on Google Colab because of personal hardware restrictions, so I sometimes need to reinstall TensorFlowTTS with pip.

The command I'm executing is this:

!CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py \
  --train-dir ./dump_akl_nz_sh/train/ \
  --dev-dir ./dump_akl_nz_sh/valid/ \
  --outdir ./examples/tacotron2/exp/train.tacotron2.v1/ \
  --config ./examples/tacotron2/conf/tacotron2.v1.yaml \
  --use-norm 1 \
  --mixed_precision 0 \
  --pretrained ./examples/tacotron2/exp/train.tacotron2.v1/checkpoints/model-120000_LJSpeech.h5

The command was working earlier in the month. However, over the last few days, an UnboundLocalError is thrown during the evaluation that runs after 500 steps:

[train]:   0% 485/200000 [41:17<277:26:54,  5.01s/it]2021-01-26 02:13:11,548 (base_trainer:140) INFO: (Steps: 485) Finished 97 epoch training (5 steps per epoch).
[train]:   0% 490/200000 [41:42<275:16:32,  4.97s/it]2021-01-26 02:13:36,405 (base_trainer:140) INFO: (Steps: 490) Finished 98 epoch training (5 steps per epoch).
[train]:   0% 495/200000 [42:07<276:56:32,  5.00s/it]2021-01-26 02:14:01,579 (base_trainer:140) INFO: (Steps: 495) Finished 99 epoch training (5 steps per epoch).
[train]:   0% 500/200000 [42:32<276:38:55,  4.99s/it]2021-01-26 02:14:26,457 (base_trainer:883) INFO: (Steps: 500) Start evaluation.
[eval]: 0it [00:00, ?it/s]2021-01-26 02:14:26.620295: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: cond/branch_executed/_8
[eval]: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 513, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 505, in main
    resume=args.resume,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 999, in fit
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 103, in run
    self._train_epoch()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 129, in _train_epoch
    self._check_eval_interval()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 166, in _check_eval_interval
    self._eval_epoch()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/trainers/base_trainer.py", line 897, in _eval_epoch
    f"(Steps: {self.steps}) Finished evaluation "
UnboundLocalError: local variable 'eval_steps_per_epoch' referenced before assignment
[train]:   0% 500/200000 [42:32<282:55:56,  5.11s/it]

Any help to resolve this would be greatly appreciated!
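The `[eval]: 0it` line just before the traceback suggests the evaluation loop never executed. In Python, a `for` loop target is only assigned when the loop body runs at least once, so reading it after iterating over an empty iterable raises exactly this `UnboundLocalError`. A minimal standalone reproduction of that pattern (the function `finish_evaluation` is hypothetical and only mimics the structure implied by the traceback, not the actual TensorFlowTTS source):

```python
def finish_evaluation(eval_batches):
    """Hypothetical stand-in: the counter is only assigned inside the loop body."""
    for eval_steps_per_epoch, batch in enumerate(eval_batches, 1):
        pass  # one evaluation step would run here
    # If eval_batches was empty, eval_steps_per_epoch was never assigned.
    return f"Finished evaluation ({eval_steps_per_epoch} steps per epoch)."


print(finish_evaluation([["batch-1"], ["batch-2"]]))  # Finished evaluation (2 steps per epoch).

try:
    finish_evaluation([])  # an empty loader, like "[eval]: 0it"
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError
```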

dathudeptrai commented 3 years ago

@neng668 I do not know why it happened. See the code here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L897); I don't think there is a bug?
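If, as the traceback implies, `eval_steps_per_epoch` is only assigned inside the evaluation loop in the linked code, then that code behaves correctly for a non-empty evaluation loader and fails only when the loader yields nothing. A hypothetical guarded variant (a sketch, not the upstream implementation; `evaluate_epoch` and its error message are made up for illustration) binds the counter before the loop and fails with a clearer message:

```python
import logging


def evaluate_epoch(eval_batches, steps):
    """Hypothetical guarded evaluation loop; returns the number of eval steps."""
    eval_steps_per_epoch = 0  # bound even if the loader yields nothing
    for eval_steps_per_epoch, batch in enumerate(eval_batches, 1):
        pass  # one evaluation step would run here
    if eval_steps_per_epoch == 0:
        raise ValueError(
            "Evaluation loader produced no batches; check that the validation "
            "set has at least batch_size examples."
        )
    logging.info(
        f"(Steps: {steps}) Finished evaluation "
        f"({eval_steps_per_epoch} steps per epoch)."
    )
    return eval_steps_per_epoch


print(evaluate_epoch([["b1"], ["b2"], ["b3"]], steps=500))  # 3
```

Raising here fails fast with an actionable message; logging a warning and skipping the evaluation summary would be an equally reasonable design choice.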

AdityaJain1030 commented 3 years ago

@neng668 This may be an issue with your batch size. Make sure that the batch size in the config file is not greater than the number of audio clips in your dataset (in particular, in the validation split, since the error happens during evaluation). See #498
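Assuming the validation loader batches with the config's batch size and drops an incomplete final batch (the `drop_remainder=True` option of `tf.data` batching), a validation split smaller than the batch size yields no batches at all, which would match the `[eval]: 0it` line in the log. A pure-Python sketch of that arithmetic, plus a hypothetical pre-flight check (`num_batches` and `check_eval_split` are illustrative names, and the numbers below are made up, not taken from this dataset):

```python
def num_batches(num_examples, batch_size, drop_remainder=True):
    """Number of batches a loader yields for num_examples items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(num_examples, batch_size)
    # With drop_remainder, a partial last batch is discarded entirely.
    return full if drop_remainder or remainder == 0 else full + 1


def check_eval_split(num_valid_clips, batch_size):
    """Hypothetical check to run before training starts."""
    batches = num_batches(num_valid_clips, batch_size, drop_remainder=True)
    if batches == 0:
        raise ValueError(
            f"{num_valid_clips} validation clips < batch size {batch_size}: "
            "evaluation would see 0 batches. Lower the batch size or add clips."
        )
    return batches


print(num_batches(8, 32))                        # 0
print(num_batches(8, 32, drop_remainder=False))  # 1
print(check_eval_split(64, 32))                  # 2
```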

akashicMarga commented 3 years ago

@dathudeptrai It seems like the pip package is not up to date. I am also getting the above error.
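The traceback shows the trainer being loaded from `/usr/local/lib/python3.6/dist-packages/tensorflow_tts/`, i.e. the pip-installed copy rather than the cloned repository, so its line numbers need not match master. A minimal sketch for checking which copy and version a Colab session will use after reinstalling, using only the standard library (`importlib.metadata` needs Python 3.8+; the distribution name `TensorFlowTTS` is an assumption based on the PyPI package name):

```python
import importlib.util
from importlib import metadata


def describe_install(dist_name="TensorFlowTTS", module_name="tensorflow_tts"):
    """Return (installed version, file the module resolves to); None if absent."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates the module without importing it.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return version, location


print(describe_install())
```

If the reported location is under `dist-packages` and the version lags behind the repository, installing from source with pip's VCS URL form (`pip install git+https://github.com/TensorSpeech/TensorFlowTTS.git`) is one option.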

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

TensorSpeech / TensorFlowTTS, issue #476