MlWoo / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

wavenet training fails due to undefined checkpoint and map.txt #1

Closed: indam23 closed this issue 4 years ago

indam23 commented 6 years ago

Quick description: I could not train the WaveNet model, either as part of Tacotron-2 training or separately, because tacotron-output/gta/map.txt was not found. This persists with GTA=False, and the various hacks I attempted resulted in other errors. I am listing all the errors in sequence, as I am not sure how interrelated the various issues are. I found a workaround and am posting it in case anyone else runs into this. I'm sure there is a more thorough fix/answer to the issue, and I doubt this fix addresses the core problem, but it got WaveNet training started at least.

Command: `python3 train.py --model="Tacotron-2"`
Params changed: steps = 120000

Result: Tacotron training finished all 120000 steps, then crashed on the GTA synthesis step with an error along the lines of "dimension size must be evenly divisible by 4 but is 1" for the input_lengths field. Unfortunately I do not have the full traceback, because it was not written to the log file and subsequent commands in the terminal pushed it beyond the scrollback. When reattempting training (same command, with the saved checkpoint), it goes straight to the GTA synthesis step and fails with an error that basically says variable 'checkpoint' is not defined (sorry, again no full traceback). This makes sense: `taco_state=True` after training, so the code under `if not taco_state` in train.py is never run, and `checkpoint` remains undefined for the subsequent `if not gta_state` section.
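To make the control-flow problem concrete, here is a tiny runnable paraphrase of the resume logic as I understand it (the taco_state/gta_state flags and the stub functions are my approximations, not the repo's actual code):

```python
import os

# Toy reproduction of the resume logic in train.py (approximate names, not the repo's exact code).

def tacotron_train_stub(log_dir):
    """Stand-in for the Tacotron training step; returns a checkpoint path."""
    return os.path.join(log_dir, 'taco_pretrained/')

def gta_synthesize_stub(checkpoint):
    """Stand-in for the GTA synthesis step, which needs a Tacotron checkpoint."""
    print('synthesizing GTA mels from', checkpoint)

log_dir = 'logs-Tacotron-2'
taco_state, gta_state = True, False  # state after Tacotron finished but GTA synthesis crashed

if not taco_state:
    checkpoint = tacotron_train_stub(log_dir)  # only assigned on a fresh run

if not gta_state:
    gta_synthesize_stub(checkpoint)  # NameError on a resumed run: 'checkpoint' was never assigned
```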

Fix 1: add this line to train.py (the main script) after line 63: `else: checkpoint = os.path.join(log_dir, 'taco_pretrained/')`

With that in place, running `train.py --model="Tacotron-2"` still fails at GTA synthesis with the same dimension-size error. Running `train.py --model="WaveNet"` results in an error: no input file tacotron-output/gta/map.txt.
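Applied to the toy sketch above, Fix 1 just gives the resumed run a fallback checkpoint path (again, approximate names):

```python
if not taco_state:
    checkpoint = tacotron_train_stub(log_dir)
else:
    # Fix 1: Tacotron training is already marked done, so point GTA synthesis at the
    # saved checkpoint directory instead of leaving 'checkpoint' undefined.
    checkpoint = os.path.join(log_dir, 'taco_pretrained/')

if not gta_state:
    gta_synthesize_stub(checkpoint)  # 'checkpoint' is now always defined
```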

Fix 2: set GTA=False. (This alone had no effect.) To write out a map.txt that maps the ground-truth mels to themselves (i.e. ensures that WaveNet training does not use GTA mels), insert the following lines (copied from lower in the script, except for the mel_output_filename line) at line 85 of tacotron/synthesize.py:

```python
with open(metadata_filename, encoding='utf-8') as f:
    metadata = [line.strip().split('|') for line in f]
    frame_shift_ms = hparams.hop_size / hparams.sample_rate
    hours = sum([int(x[4]) for x in metadata]) * frame_shift_ms / (3600)
    log('Loaded metadata for {} examples ({:.2f} hours)'.format(len(metadata), hours))

log('starting synthesis')
mel_dir = os.path.join(args.input_dir, 'mels')
wav_dir = os.path.join(args.input_dir, 'audio')
with open(os.path.join(synth_dir, 'map.txt'), 'w') as file:
    for i, meta in enumerate(tqdm(metadata)):
        text = meta[5]
        mel_filename = os.path.join(mel_dir, meta[1])
        wav_filename = os.path.join(wav_dir, meta[0])
        mel_output_filename = mel_filename  # i.e. output file = ground truth, not a prediction
        file.write('{}|{}|{}|{}\n'.format(wav_filename, mel_filename, mel_output_filename, text))
log('synthesized mel spectrograms at {}'.format(synth_dir))
return os.path.join(synth_dir, 'map.txt')
```
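To be explicit about what the loop above writes: each map.txt line has the form `wav_filename|mel_filename|mel_output_filename|text`, and pointing mel_output_filename at the ground-truth mel (rather than a Tacotron prediction) is what keeps the subsequent WaveNet training off GTA mels.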

After making this change, I had to manually set the second value in logs-Tacotron-2/state_log to 0 so that it would rewrite gta/map.txt.

Result: `train.py --model="WaveNet"` finds and uses map.txt, but fails with the following error: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

Fix 3: in tacotron/synthesizer.py, replace line 38 with these two lines: `config = tf.ConfigProto(allow_soft_placement=True)` and `self.session = tf.Session(config=config)`. This is the same option that is set elsewhere in the repo wherever a TensorFlow session is defined.

Result: `train.py --model="WaveNet"` now runs but fails with an OOM error:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[12802,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_6/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_6/Pad, model/optimizer/gradients/model/inference/residual_block_conv/transpose_3_grad/InvertPermutation)]]

This is probably a) not related to the above and b) hardware specific, but training started successfully with Fix 4 below.
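As a side note on Fix 3, here is a minimal standalone sketch of what the soft-placement setting does (TF 1.x API; the variable names are mine, not the exact lines in synthesizer.py):

```python
import tensorflow as tf

# allow_soft_placement lets TensorFlow fall back to a CPU kernel when an op that was
# pinned to '/device:GPU:0' has no GPU implementation, instead of raising
# "Could not satisfy explicit device specification".
config = tf.ConfigProto(allow_soft_placement=True)
session = tf.Session(config=config)
```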

Fix 4: set max_time_steps=5000 in the hparams file.

Result: WaveNet training starts and runs successfully (so far), at about 1.7 sec/step. It hasn't completed yet; I will update when it does.
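If it helps anyone, this is roughly where the change lives, assuming the fork keeps the upstream-style tf.contrib.training.HParams definition (illustrative excerpt only, not the full hparams file):

```python
import tensorflow as tf

# Illustrative excerpt of hparams.py for Fix 4. If I understand the parameter correctly,
# max_time_steps caps the number of audio samples per WaveNet training example,
# so lowering it shrinks the per-batch memory footprint (at the cost of shorter clips).
hparams = tf.contrib.training.HParams(
    # ... other Tacotron / WaveNet parameters unchanged ...
    max_time_steps=5000,  # lowered from the repo default to fit my GPU
)
```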

indam23 commented 6 years ago

WaveNet training completed 185,000 steps (as set), but the predicted wavs are all sub-second snippets; I'm guessing this is the result of max_time_steps=5000? Synthesis with `python3 synthesize.py --model="Tacotron-2" --mode="eval" --GTA="False"` fails with: ValueError: Dimension size must be evenly divisible by 4 but is 1. Number of ways to split should evenly divide the split dimension for 'model/split' (op: 'Split') with input shapes: [], [1] and with computed input tensors: input[0] = <0>.

indam23 commented 4 years ago

closing as stale