WaveNet training completed 185,000 steps (as set), but the predicted wavs consist of <1 sec snippets - I'm guessing this is the result of max_time_steps=5000? Synthesis with
python3 synthesize.py --model="Tacotron-2" --mode="eval" --GTA="False"
fails with
ValueError: Dimension size must be evenly divisible by 4 but is 1 Number of ways to split should evenly divide the split dimension for 'model/split' (op: 'Split') with input shapes: [], [1] and with computed input tensors: input[0] = <0>.
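For what it's worth, the error reads like the single eval utterance (a batch of 1) being split into 4 pieces, presumably across GPUs or some other fixed group count. A minimal sketch that raises the same ValueError under TF 1.x - the 4 here is just a stand-in for whatever the model actually splits by, not taken from the repo:

```python
import tensorflow as tf  # TF 1.x, as used by this repo

# Hypothetical repro, not the repo's code: splitting a length-1 tensor
# (e.g. input_lengths for a single eval sentence) into 4 parts fails at
# graph construction with the same "must be evenly divisible by 4" error.
input_lengths = tf.placeholder(tf.int32, shape=[1], name='input_lengths')
parts = tf.split(input_lengths, num_or_size_splits=4, axis=0)
```

If that reading is right, padding the eval batch or setting whichever num-GPUs hparam controls the split to 1 for synthesis might sidestep it, but I haven't tested that.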
closing as stale
Quick description: could not train WaveNet model, either as part of Tacotron-2 training or separately, because tacotron-output/gta/map.txt was not found. This persists with
GTA=False
, and various hacks attempted result in other errors. I am listing all the errors in sequence as I am not sure how interrelated the various issues are. I found a workaround and am posting in case anyone else runs into it. I'm sure there is a more thorough fix/answer to the issue, and I doubt this fix addresses the core issue, but it got WaveNet training started at least. Command: python3 train.py --model="Tacotron-2"
params changed: steps = 120000. Result: Finished training the Tacotron model through 120000 steps, then crashed on the GTA synthesis step with error dimension size must be evenly divisible by 4 but is 1
for the input_lengths field. I unfortunately do not have the full traceback because it was not output to the log file and subsequent commands in the terminal consigned it to beyond the realm of scrollback. When reattempting training (same command, with saved checkpoint), it goes straight to the GTA synthesis step and fails with an error that basically says, variable 'checkpoint' not defined
(Sorry, also no full traceback). This makes sense since "taco_state=True" after training, and thus the code under if not taco_state
in train.py will not be run and checkpoint remains undefined for the subsequent if not gta_state
section. Fix 1: add this line to train.py (main script) after line 63:

```python
else: checkpoint = os.path.join(log_dir, 'taco_pretrained/')
```
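Roughly, I believe the relevant flow in train.py looks like the sketch below (paraphrased from memory, not the exact source; the helper names and arguments are approximations); it shows why checkpoint is only ever bound inside the first branch and why the else-branch above unblocks the GTA step on a resumed run:

```python
import os

# Paraphrased sketch of the train.py logic with Fix 1 applied.
if not taco_state:
    # Fresh run: Tacotron training returns the checkpoint path it saved.
    checkpoint = tacotron_train(args, log_dir, hparams)
else:
    # Fix 1: on a resumed run Tacotron training is skipped, so point at the
    # saved checkpoints explicitly; otherwise `checkpoint` is never defined
    # before the GTA step below.
    checkpoint = os.path.join(log_dir, 'taco_pretrained/')

if not gta_state:
    # GTA synthesis needs a Tacotron checkpoint regardless of how we got here.
    input_path = tacotron_synthesize(args, hparams, checkpoint)
```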
Running train.py --model="Tacotron-2" again, GTA synthesis fails with the same error about the dimension size mismatch. Running train.py --model="WaveNet" results in error: no input file tacotron-output/gta/map.txt
Fix 2: set GTA=False. (This alone had no effect.) To write out a map.txt which maps ground truth mels to themselves (i.e. ensures that WaveNet training does not use GTA mels), insert the following lines (copied from lower in the script, except the mel_output_filename line) at line 85 of tacotron/synthesize.py:

```python
with open(metadata_filename, encoding='utf-8') as f:
    metadata = [line.strip().split('|') for line in f]
frame_shift_ms = hparams.hop_size / hparams.sample_rate
hours = sum([int(x[4]) for x in metadata]) * frame_shift_ms / (3600)
log('Loaded metadata for {} examples ({:.2f} hours)'.format(len(metadata), hours))
```
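Alternatively, the same effect can probably be had with a small standalone script that writes a map.txt pointing every entry at its ground-truth mel. This is only a sketch: the paths, the train.txt column indices, and the map.txt field order are my guesses and should be checked against what the WaveNet feeder actually parses before relying on it.

```python
import os

# Assumed locations -- adjust to your preprocessing output and log dirs.
metadata_filename = 'training_data/train.txt'
mel_dir = 'training_data/mels'
out_dir = 'tacotron-output/gta'
os.makedirs(out_dir, exist_ok=True)

with open(metadata_filename, encoding='utf-8') as f:
    metadata = [line.strip().split('|') for line in f]

with open(os.path.join(out_dir, 'map.txt'), 'w', encoding='utf-8') as f:
    for meta in metadata:
        # Assumed columns: audio filename first, mel filename second, text last.
        audio_file, mel_file, text = meta[0], meta[1], meta[-1]
        mel_path = os.path.join(mel_dir, mel_file)
        # Map the ground-truth mel to itself so WaveNet never sees GTA mels.
        f.write('|'.join([audio_file, mel_path, mel_path, text]) + '\n')
```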
I had to manually set the second value of logs-Tacotron-2/state_log to 0 to make it rewrite gta/map.txt after changing this.
Result: train.py --model="WaveNet" finds and uses map.txt, but fails with the following error:
Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Fix 3: in file tacotron/synthesizer.py, replace line 38 with these two lines:
```python
config = tf.ConfigProto(allow_soft_placement=True)  ###ML
self.session = tf.Session(config=config)
```
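For context, allow_soft_placement just lets TensorFlow fall back to CPU for ops that have no GPU kernel, which is exactly what the device-specification error above complains about. A self-contained version of the same idea; the allow_growth line is my own addition rather than part of the fix, and it only changes how GPU memory is reserved (it will not save a genuinely too-large tensor):

```python
import tensorflow as tf  # TF 1.x

# Fall back to CPU when an op has no GPU kernel instead of erroring out.
config = tf.ConfigProto(allow_soft_placement=True)
# Optional: reserve GPU memory incrementally rather than all at once.
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
```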
allow_soft_placement is the same option set elsewhere where a TensorFlow session is defined. Result: train.py --model="WaveNet" fails with an OOM error:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[12802,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[Node: model/inference/residual_block_conv_6/transpose = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model/inference/residual_block_conv_6/Pad, model/optimizer/gradients/model/inference/residual_block_conv/transpose_3_grad/InvertPermutation)]]
This is probably a) not related to the above and b) hardware specific, but training started successfully with Fix 4:
max_time_steps=5000
in the hparams file. Result: WaveNet training starts and runs (so far) successfully, about 1.7 sec/step. It hasn't completed yet; I will update when it does.
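A quick sanity check on the guess at the top that max_time_steps explains the <1 sec eval clips: if I read the hparams right, max_time_steps counts raw audio samples, so at a 22050 Hz sample rate (an assumption - substitute whatever your hparams use) 5000 samples is well under a second:

```python
# Rough arithmetic only; sample_rate is assumed, not read from hparams.
max_time_steps = 5000
sample_rate = 22050
print('clip length ~ {:.2f} s'.format(max_time_steps / sample_rate))  # ~0.23 s
```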