Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License
2.27k stars 905 forks source link

Run the synthesize.py script and report the error as follows #359

Open darkmarbo opened 5 years ago

darkmarbo commented 5 years ago

Thank you for your selfless dedication!! The first time I run your script, the data used is ‘LJSpeech-1.1’:

  1. python3 preprocess.py
  2. python3 train.py --model='Tacotron-2'
  3. python3 synthesize.py --model='Tacotron-2' and the following error is reported during the synthesis phase: ############################################## (base) [szm@bogon Tacotron-2]$ python3 synthesize.py Using TensorFlow backend. Running End-to-End TTS Evaluation. Model: Tacotron-2 Synthesizing mel-spectrograms from text.. loaded model at logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt-100000 Hyperparameters: GL_on_GPU: True NN_init: True NN_scaler: 0.3 allow_clipping_in_normalization: True attention_dim: 128 attention_filters: 32 attention_kernel: (31,) attention_win_size: 7 batch_norm_position: after cbhg_conv_channels: 128 cbhg_highway_units: 128 cbhg_highwaynet_layers: 4 cbhg_kernels: 8 cbhg_pool_size: 2 cbhg_projection: 256 cbhg_projection_kernel_size: 3 cbhg_rnn_units: 128 cdf_loss: False cin_channels: 80 cleaners: english_cleaners clip_for_wavenet: True clip_mels_length: True clip_outputs: True cross_entropy_pos_weight: 1 cumulative_weights: True decoder_layers: 2 decoder_lstm_units: 1024 embedding_dim: 512 enc_conv_channels: 512 enc_conv_kernel_size: (5,) enc_conv_num_layers: 3 encoder_lstm_units: 256 fmax: 7600 fmin: 55 frame_shift_ms: None freq_axis_kernel_size: 3 gate_channels: 256 gin_channels: -1 griffin_lim_iters: 60 hop_size: 275 input_type: raw kernel_size: 3 layers: 20 leaky_alpha: 0.4 legacy: True log_scale_min: -32.23619130191664 log_scale_min_gauss: -16.11809565095832 lower_bound_decay: 0.1 magnitude_power: 2.0 mask_decoder: False mask_encoder: True max_abs_value: 4.0 max_iters: 10000 max_mel_frames: 900 max_time_sec: None max_time_steps: 11000 min_level_db: -100 n_fft: 2048 n_speakers: 5 normalize_for_wavenet: True num_freq: 1025 num_mels: 80 out_channels: 2 outputs_per_step: 1 postnet_channels: 512 postnet_kernel_size: (5,) postnet_num_layers: 5 power: 1.5 predict_linear: True preemphasis: 0.97 preemphasize: True prenet_layers: [256, 256] quantize_channels: 65536 ref_level_db: 20 rescale: True rescaling_max: 0.999 residual_channels: 128 residual_legacy: True sample_rate: 22050 signal_normalization: True silence_threshold: 2 skip_out_channels: 128 smoothing: False speakers: ['speaker0', 'speaker1', 'speaker2', 'speaker3', 'speaker4'] speakers_path: None split_on_cpu: True stacks: 2 stop_at_any: True symmetric_mels: True synthesis_constraint: False synthesis_constraint_type: window tacotron_adam_beta1: 0.9 tacotron_adam_beta2: 0.999 tacotron_adam_epsilon: 1e-06 tacotron_batch_size: 32 tacotron_clip_gradients: True tacotron_data_random_state: 1234 tacotron_decay_learning_rate: True tacotron_decay_rate: 0.5 tacotron_decay_steps: 18000 tacotron_dropout_rate: 0.5 tacotron_final_learning_rate: 0.0001 tacotron_fine_tuning: False tacotron_initial_learning_rate: 0.001 tacotron_natural_eval: False tacotron_num_gpus: 1 tacotron_random_seed: 5339 tacotron_reg_weight: 1e-06 tacotron_scale_regularization: False tacotron_start_decay: 40000 tacotron_swap_with_cpu: False tacotron_synthesis_batch_size: 1 tacotron_teacher_forcing_decay_alpha: None tacotron_teacher_forcing_decay_steps: 40000 tacotron_teacher_forcing_final_ratio: 0.0 tacotron_teacher_forcing_init_ratio: 1.0 tacotron_teacher_forcing_mode: constant tacotron_teacher_forcing_ratio: 1.0 tacotron_teacher_forcing_start_decay: 10000 tacotron_test_batches: None tacotron_test_size: 0.05 tacotron_zoneout_rate: 0.1 train_with_GTA: True trim_fft_size: 2048 trim_hop_size: 512 trim_silence: True trim_top_db: 40 upsample_activation: Relu upsample_scales: [11, 25] upsample_type: SubPixel use_bias: True use_lws: False use_speaker_embedding: True wavenet_adam_beta1: 0.9 wavenet_adam_beta2: 0.999 wavenet_adam_epsilon: 1e-06 wavenet_batch_size: 8 wavenet_clip_gradients: True wavenet_data_random_state: 1234 wavenet_debug_mels: ['training_data/mels/mel-LJ001-0008.npy'] wavenet_debug_wavs: ['training_data/audio/audio-LJ001-0008.npy'] wavenet_decay_rate: 0.5 wavenet_decay_steps: 200000 wavenet_dropout: 0.05 wavenet_ema_decay: 0.9999 wavenet_gradient_max_norm: 100.0 wavenet_gradient_max_value: 5.0 wavenet_init_scale: 1.0 wavenet_learning_rate: 0.0001 wavenet_lr_schedule: exponential wavenet_natural_eval: False wavenet_num_gpus: 1 wavenet_pad_sides: 1 wavenet_random_seed: 5339 wavenet_swap_with_cpu: False wavenet_synth_debug: False wavenet_synthesis_batch_size: 20 wavenet_test_batches: 1 wavenet_test_size: None wavenet_warmup: 4000.0 wavenet_weight_normalization: False win_size: 1100 Constructing model: Tacotron initialisation done /gpu:0 Initialized Tacotron model. Dimensions (? = dynamic shape): Train mode: False Eval mode: False GTA mode: False Synthesis mode: True Input: (?, ?) device: 0 embedding: (?, ?, 512) enc conv out: (?, ?, 512) encoder out: (?, ?, 512) decoder out: (?, ?, 80) residual out: (?, ?, 512) projected residual out: (?, ?, 80) mel out: (?, ?, 80) linear out: (?, ?, 1025) out: (?, ?) Tacotron Parameters 29.016 Million. Loading checkpoint: logs-Tacotron-2/taco_pretrained/tacotron_model.ckpt-100000 Starting Synthesis 100%|████████████████████████████████████████████████████████| 16/16 [00:49<00:00, 2.25s/it] synthesized mel spectrograms at tacotron_output/eval Synthesizing audio from mel-spectrograms.. (This may take a while) loaded model at logs-Tacotron-2/wave_pretrained/wavenet_model.ckpt-67500 Hyperparameters: GL_on_GPU: True NN_init: True NN_scaler: 0.3 allow_clipping_in_normalization: True attention_dim: 128 attention_filters: 32 attention_kernel: (31,) attention_win_size: 7 batch_norm_position: after cbhg_conv_channels: 128 cbhg_highway_units: 128 cbhg_highwaynet_layers: 4 cbhg_kernels: 8 cbhg_pool_size: 2 cbhg_projection: 256 cbhg_projection_kernel_size: 3 cbhg_rnn_units: 128 cdf_loss: False cin_channels: 80 cleaners: english_cleaners clip_for_wavenet: True clip_mels_length: True clip_outputs: True cross_entropy_pos_weight: 1 cumulative_weights: True decoder_layers: 2 decoder_lstm_units: 1024 embedding_dim: 512 enc_conv_channels: 512 enc_conv_kernel_size: (5,) enc_conv_num_layers: 3 encoder_lstm_units: 256 fmax: 7600 fmin: 55 frame_shift_ms: None freq_axis_kernel_size: 3 gate_channels: 256 gin_channels: -1 griffin_lim_iters: 60 hop_size: 275 input_type: raw kernel_size: 3 layers: 20 leaky_alpha: 0.4 legacy: True log_scale_min: -32.23619130191664 log_scale_min_gauss: -16.11809565095832 lower_bound_decay: 0.1 magnitude_power: 2.0 mask_decoder: False mask_encoder: True max_abs_value: 4.0 max_iters: 10000 max_mel_frames: 900 max_time_sec: None max_time_steps: 11000 min_level_db: -100 n_fft: 2048 n_speakers: 5 normalize_for_wavenet: True num_freq: 1025 num_mels: 80 out_channels: 2 outputs_per_step: 1 postnet_channels: 512 postnet_kernel_size: (5,) postnet_num_layers: 5 power: 1.5 predict_linear: True preemphasis: 0.97 preemphasize: True prenet_layers: [256, 256] quantize_channels: 65536 ref_level_db: 20 rescale: True rescaling_max: 0.999 residual_channels: 128 residual_legacy: True sample_rate: 22050 signal_normalization: True silence_threshold: 2 skip_out_channels: 128 smoothing: False speakers: ['speaker0', 'speaker1', 'speaker2', 'speaker3', 'speaker4'] speakers_path: None split_on_cpu: True stacks: 2 stop_at_any: True symmetric_mels: True synthesis_constraint: False synthesis_constraint_type: window tacotron_adam_beta1: 0.9 tacotron_adam_beta2: 0.999 tacotron_adam_epsilon: 1e-06 tacotron_batch_size: 32 tacotron_clip_gradients: True tacotron_data_random_state: 1234 tacotron_decay_learning_rate: True tacotron_decay_rate: 0.5 tacotron_decay_steps: 18000 tacotron_dropout_rate: 0.5 tacotron_final_learning_rate: 0.0001 tacotron_fine_tuning: False tacotron_initial_learning_rate: 0.001 tacotron_natural_eval: False tacotron_num_gpus: 1 tacotron_random_seed: 5339 tacotron_reg_weight: 1e-06 tacotron_scale_regularization: False tacotron_start_decay: 40000 tacotron_swap_with_cpu: False tacotron_synthesis_batch_size: 1 tacotron_teacher_forcing_decay_alpha: None tacotron_teacher_forcing_decay_steps: 40000 tacotron_teacher_forcing_final_ratio: 0.0 tacotron_teacher_forcing_init_ratio: 1.0 tacotron_teacher_forcing_mode: constant tacotron_teacher_forcing_ratio: 1.0 tacotron_teacher_forcing_start_decay: 10000 tacotron_test_batches: None tacotron_test_size: 0.05 tacotron_zoneout_rate: 0.1 train_with_GTA: True trim_fft_size: 2048 trim_hop_size: 512 trim_silence: True trim_top_db: 40 upsample_activation: Relu upsample_scales: [11, 25] upsample_type: SubPixel use_bias: True use_lws: False use_speaker_embedding: True wavenet_adam_beta1: 0.9 wavenet_adam_beta2: 0.999 wavenet_adam_epsilon: 1e-06 wavenet_batch_size: 8 wavenet_clip_gradients: True wavenet_data_random_state: 1234 wavenet_debug_mels: ['training_data/mels/mel-LJ001-0008.npy'] wavenet_debug_wavs: ['training_data/audio/audio-LJ001-0008.npy'] wavenet_decay_rate: 0.5 wavenet_decay_steps: 200000 wavenet_dropout: 0.05 wavenet_ema_decay: 0.9999 wavenet_gradient_max_norm: 100.0 wavenet_gradient_max_value: 5.0 wavenet_init_scale: 1.0 wavenet_learning_rate: 0.0001 wavenet_lr_schedule: exponential wavenet_natural_eval: False wavenet_num_gpus: 1 wavenet_pad_sides: 1 wavenet_random_seed: 5339 wavenet_swap_with_cpu: False wavenet_synth_debug: False wavenet_synthesis_batch_size: 20 wavenet_test_batches: 1 wavenet_test_size: None wavenet_warmup: 4000.0 wavenet_weight_normalization: False win_size: 1100 Constructing model: WaveNet Initializing Wavenet model. Dimensions (? = dynamic shape): Train mode: False Eval mode: False Synthesis mode: True device: 0 local_condition: (?, 80, ?) outputs: (?, ?) Receptive Field: (4093 samples / 185.6 ms) WaveNet Parameters: 3.064 Million. Loading checkpoint: logs-Tacotron-2/wave_pretrained/wavenet_model.ckpt-67500 Traceback (most recent call last): File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call return fn(*args) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.NotFoundError: Key WaveNet_model/WaveNet_model/inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/bias/ExponentialMovingAverage not found in checkpoint [[Node: WaveNet_model/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_WaveNet_model/save/Const_0_0, WaveNet_model/save/RestoreV2/tensor_names, WaveNet_model/save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "synthesize.py", line 101, in main() File "synthesize.py", line 94, in main synthesize(args, hparams, taco_checkpoint, wave_checkpoint, sentences) File "synthesize.py", line 42, in synthesize wavenet_synthesize(args, hparams, wave_checkpoint) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize run_synthesis(args, checkpoint_path, output_dir, hparams) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/synthesize.py", line 19, in run_synthesis synth.load(checkpoint_path, hparams) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/synthesizer.py", line 44, in load load_averaged_model(self.session, sh_saver, checkpoint_path) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/train.py", line 86, in load_averaged_model sh_saver.restore(sess, checkpoint_path) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1802, in restore {self.saver_def.filename_tensor_name: save_path}) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run run_metadata_ptr) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run feed_dict_tensor, options, run_metadata) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run run_metadata) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.NotFoundError: Key WaveNet_model/WaveNet_model/inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/bias/ExponentialMovingAverage not found in checkpoint [[Node: WaveNet_model/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_WaveNet_model/save/Const_0_0, WaveNet_model/save/RestoreV2/tensor_names, WaveNet_model/save/RestoreV2/shape_and_slices)]]

Caused by op 'WaveNet_model/save/RestoreV2', defined at: File "synthesize.py", line 101, in main() File "synthesize.py", line 94, in main synthesize(args, hparams, taco_checkpoint, wave_checkpoint, sentences) File "synthesize.py", line 42, in synthesize wavenet_synthesize(args, hparams, wave_checkpoint) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize run_synthesis(args, checkpoint_path, output_dir, hparams) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/synthesize.py", line 19, in run_synthesis synth.load(checkpoint_path, hparams) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/synthesizer.py", line 33, in load sh_saver = create_shadow_saver(self.model) File "/data/szm/tacotron/tacotron2/Tacotron-2/wavenet_vocoder/train.py", line 83, in create_shadow_saver return tf.train.Saver(shadow_dict, max_to_keep=20) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in init self.build() File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build self._build(self._filename, build_save=True, build_restore=True) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build build_save=build_save, build_restore=build_restore) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal restore_sequentially, reshape) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps restore_sequentially) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore return io_ops.restore_v2(filename_tensor, names, slices, dtypes) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2 shape_and_slices=shape_and_slices, dtypes=dtypes, name=name) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper op_def=op_def) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op op_def=op_def) File "/usr/bin/python3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key WaveNet_model/WaveNet_model/inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/bias/ExponentialMovingAverage not found in checkpoint [[Node: WaveNet_model/save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_WaveNet_model/save/Const_0_0, WaveNet_model/save/RestoreV2/tensor_names, WaveNet_model/save/RestoreV2/shape_and_slices)]]

RoelVdP commented 5 years ago
NotFoundError (see above for traceback): Key WaveNet_model/WaveNet_model/inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/bias/ExponentialMovingAverage not found in checkpoint

Seems to be the source? ExponentialMovingAverage - lib issue maybe?