Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

Channel dimension of the inputs should be defined. Found 'None' #487

Closed ravi-0841 closed 4 years ago

ravi-0841 commented 4 years ago

I am trying to train only the WaveNet model on the CMU ARCTIC dataset. The model immediately throws an error saying the channel dimension of the inputs should be defined. I am not sure how to fix this issue.

Tensorflow version -> 1.14.0
Keras version -> 2.1.2

Here is the full log:

Loading training data from: ./audio_data/cmu-txt/map.txt
Using model: WaveNet
Hyperparameters:
GL_on_GPU: True NN_init: True NN_scaler: 0.3 allow_clipping_in_normalization: True attention_dim: 128 attention_filters: 32 attention_kernel: (31,) attention_win_size: 7 batch_norm_position: after cbhg_conv_channels: 128 cbhg_highway_units: 128 cbhg_highwaynet_layers: 4 cbhg_kernels: 8 cbhg_pool_size: 2 cbhg_projection: 256 cbhg_projection_kernel_size: 3 cbhg_rnn_units: 128 cdf_loss: False cin_channels: 80 cleaners: english_cleaners clip_for_wavenet: True clip_mels_length: True clip_outputs: True cross_entropy_pos_weight: 1 cumulative_weights: True decoder_layers: 2 decoder_lstm_units: 1024 embedding_dim: 512 enc_conv_channels: 512 enc_conv_kernel_size: (5,) enc_conv_num_layers: 3 encoder_lstm_units: 256 fmax: 7600 fmin: 55 frame_shift_ms: None freq_axis_kernel_size: 3 gate_channels: 256 gin_channels: 16 griffin_lim_iters: 60 hop_size: 80 input_type: raw kernel_size: 3 layers: 20 leaky_alpha: 0.4 legacy: True log_scale_min: -32.23619130191664 log_scale_min_gauss: -16.11809565095832 lower_bound_decay: 0.1 magnitude_power: 2.0 mask_decoder: False mask_encoder: True max_abs_value: 4.0 max_iters: 10000 max_mel_frames: 900 max_time_sec: None max_time_steps: 11000 min_level_db: -300 n_fft: 1024 n_speakers: 4 normalize_for_wavenet: True num_freq: 513 num_mels: 80 out_channels: 2 outputs_per_step: 1 postnet_channels: 512 postnet_kernel_size: (5,) postnet_num_layers: 5 power: 1.5 predict_linear: True preemphasis: 0.97 preemphasize: True prenet_layers: [256, 256] quantize_channels: 65536 ref_level_db: 20 rescale: True rescaling_max: 0.999 residual_channels: 128 residual_legacy: True sample_rate: 16000 signal_normalization: True silence_threshold: 2 skip_out_channels: 128 smoothing: False speakers: ['cmu_us_f1', 'cmu_us_f2', 'cmu_us_m1', 'cmu_us_m2'] speakers_path: None split_on_cpu: True stacks: 2 stop_at_any: True symmetric_mels: True synthesis_constraint: False synthesis_constraint_type: window tacotron_adam_beta1: 0.9 tacotron_adam_beta2: 0.999 tacotron_adam_epsilon: 1e-06 tacotron_batch_size: 32 tacotron_clip_gradients: True tacotron_data_random_state: 1234 tacotron_decay_learning_rate: True tacotron_decay_rate: 0.5 tacotron_decay_steps: 18000 tacotron_dropout_rate: 0.5 tacotron_final_learning_rate: 0.0001 tacotron_fine_tuning: False tacotron_initial_learning_rate: 0.001 tacotron_natural_eval: False tacotron_num_gpus: 1 tacotron_random_seed: 5339 tacotron_reg_weight: 1e-06 tacotron_scale_regularization: False tacotron_start_decay: 40000 tacotron_swap_with_cpu: False tacotron_synthesis_batch_size: 1 tacotron_teacher_forcing_decay_alpha: None tacotron_teacher_forcing_decay_steps: 40000 tacotron_teacher_forcing_final_ratio: 0.0 tacotron_teacher_forcing_init_ratio: 1.0 tacotron_teacher_forcing_mode: constant tacotron_teacher_forcing_ratio: 1.0 tacotron_teacher_forcing_start_decay: 10000 tacotron_test_batches: None tacotron_test_size: 0.05 tacotron_zoneout_rate: 0.1 train_with_GTA: False trim_fft_size: 2048 trim_hop_size: 512 trim_silence: True trim_top_db: 40 upsample_activation: Relu upsample_scales: [11, 25] upsample_type: SubPixel use_bias: True use_lws: False use_speaker_embedding: True wavenet_adam_beta1: 0.9 wavenet_adam_beta2: 0.999 wavenet_adam_epsilon: 1e-06 wavenet_batch_size: 8 wavenet_clip_gradients: True wavenet_data_random_state: 1234 wavenet_debug_mels: ['training_data/mels/mel-LJ001-0008.npy'] wavenet_debug_wavs: ['training_data/audio/audio-LJ001-0008.npy'] wavenet_decay_rate: 0.5
wavenet_decay_steps: 200000 wavenet_dropout: 0.05 wavenet_ema_decay: 0.9999 wavenet_gradient_max_norm: 100.0 wavenet_gradient_max_value: 5.0 wavenet_init_scale: 1.0 wavenet_learning_rate: 0.001 wavenet_lr_schedule: exponential wavenet_natural_eval: False wavenet_num_gpus: 1 wavenet_pad_sides: 1 wavenet_random_seed: 5339 wavenet_swap_with_cpu: False wavenet_synth_debug: False wavenet_synthesis_batch_size: 20 wavenet_test_batches: 1 wavenet_test_size: None wavenet_warmup: 4000.0 wavenet_weight_normalization: False win_size: 80

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/train.py:221: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/train.py:225: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/feeder.py:75: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/feeder.py:99: The name tf.FIFOQueue is deprecated. Please use tf.queue.FIFOQueue instead.

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/train.py:169: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/modules.py:206: The name tf.layers.Conv1D is deprecated. Please use tf.compat.v1.layers.Conv1D instead.

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/modules.py:553: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

Initializing Wavenet model. Dimensions (? = dynamic shape):
Train mode: True
Eval mode: False
Synthesis mode: False

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/wavenet.py:268: The name tf.train.replica_device_setter is deprecated. Please use tf.compat.v1.train.replica_device_setter instead.

device: 0
WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/modules.py:256: The name tf.layers.InputSpec is deprecated. Please use tf.keras.layers.InputSpec instead.

WARNING:tensorflow:From /home/ravi/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

WARNING:tensorflow:From /home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/modules.py:484: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.

Traceback (most recent call last):
  File "train.py", line 143, in <module>
    main()
  File "train.py", line 135, in main
    wavenet_train(args, log_dir, hparams, args.wavenet_input)
  File "/home/ravi/Desktop/tacotron-2/wavenet_vocoder/train.py", line 346, in wavenet_train
    return train(log_dir, args, hparams, input_path)
  File "/home/ravi/Desktop/tacotron-2/wavenet_vocoder/train.py", line 230, in train
    model, stats = model_train_mode(args, feeder, hparams, global_step)
  File "/home/ravi/Desktop/tacotron-2/wavenet_vocoder/train.py", line 176, in model_train_mode
    feeder.input_lengths, x=feeder.inputs)
  File "/home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/wavenet.py", line 277, in initialize
    y_hat_train = self.step(tower_x[i], tower_c[i], tower_g[i], softmax=False) #softmax is automatically computed inside softmax_cross_entropy if needed
  File "/home/ravi/Desktop/tacotron-2/wavenet_vocoder/models/wavenet.py", line 708, in step
    x, h = conv(x, c=c, g=g_bct)
  File "/home/ravi/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/ravi/anaconda3/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 149, in wrapper
    raise e.ag_error_metadata.to_exception(type(e))
ValueError: in converted code:
    relative to /home/ravi:

Desktop/tacotron-2/wavenet_vocoder/models/modules.py:465 call *
    x, s, _ = self.step(x, c=c, g=g, is_incremental=False)
Desktop/tacotron-2/wavenet_vocoder/models/modules.py:506 step
    g = _conv1x1_forward(self.conv1x1g, g, is_incremental)
Desktop/tacotron-2/wavenet_vocoder/models/modules.py:779 _conv1x1_forward
    return conv(x)
anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:591 __call__
    self._maybe_build(inputs)
anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1881 _maybe_build
    self.build(input_shapes)
Desktop/tacotron-2/wavenet_vocoder/models/modules.py:262 build
    self.layer.build(input_shape)
anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py:153 build
    raise ValueError('The channel dimension of the inputs '

ValueError: The channel dimension of the inputs should be defined. Found `None`.
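
For context: this ValueError is raised by tf.keras's Conv1D.build(), which refuses to build a layer whose input has no static channel (last-axis) size. In this traceback the offending input is the global-conditioning tensor g that reaches conv1x1g in modules.py. The snippet below is only a minimal sketch under the reported tensorflow 1.14, not the repository's code: it reproduces the message and illustrates one hypothetical direction, namely pinning a static channel size (gin_channels = 16 in the hparams above) on the conditioning tensor before it hits the 1x1 convolution.

```python
# Minimal sketch, assuming tensorflow==1.14 (graph mode). This is NOT the
# repository's code; it only reproduces the error and shows one hypothetical
# workaround.
import tensorflow as tf

# 1) Reproduce: a [batch, time, channels] input whose channel axis is unknown.
g_unknown = tf.compat.v1.placeholder(tf.float32, shape=[None, None, None])
conv = tf.keras.layers.Conv1D(filters=16, kernel_size=1)  # stands in for conv1x1g
try:
    conv(g_unknown)
except ValueError as e:
    print(e)  # The channel dimension of the inputs should be defined. Found `None`.

# 2) Hypothetical workaround: give the conditioning tensor a static channel
#    size (gin_channels = 16 in the hparams above) before the 1x1 conv.
g_known = tf.compat.v1.placeholder(tf.float32, shape=[None, None, None])
g_known.set_shape([None, None, 16])  # channel axis is now statically defined
out = tf.keras.layers.Conv1D(filters=16, kernel_size=1)(g_known)  # builds fine
print(out.shape)  # (?, ?, 16)
```

Whether the right place to pin the shape is the feeder's conditioning placeholder or g just before _conv1x1_forward is a guess on my part; the traceback only shows that whatever reaches conv1x1g has channels=None.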