as-ideas / TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer-based neural network for text to speech.
https://as-ideas.github.io/TransformerTTS/

Shapes in forward model do not match #68

Open mehdihosseinimoghadam opened 3 years ago

mehdihosseinimoghadam commented 3 years ago

Hi there, I'm trying to use the forward model for my own dataset, but in extract_durations.py I get the error below. Any idea why the shapes don't match? Everything works fine with the autoregressive model.

1 Physical GPUs, 1 Logical GPUs
DurationExtraction_weighted_binary_filled(next)_fix_jumps_layer-1
fatal: not a git repository (or any of the parent directories): .git
WARNING: could not retrieve git hash. Command '['git', 'describe', '--always']' returned non-zero exit status 128.

CONFIGURATION ljspeech.melgan.autoregressive
- decoder_model_dimension : 256
- encoder_model_dimension : 512
- decoder_num_heads : [4, 4, 4, 4]
- encoder_num_heads : [4, 4, 4, 4]
- encoder_feed_forward_dimension : 1024
- decoder_feed_forward_dimension : 1024
- decoder_prenet_dimension : 256
- encoder_prenet_dimension : 512
- encoder_attention_conv_filters : 512
- decoder_attention_conv_filters : 512
- encoder_attention_conv_kernel : 3
- decoder_attention_conv_kernel : 3
- encoder_max_position_encoding : 1000
- decoder_max_position_encoding : 10000
- postnet_conv_filters : 256
- postnet_conv_layers : 5
- postnet_kernel_size : 5
- encoder_dense_blocks : 4
- decoder_dense_blocks : 4
- stop_loss_scaling : 8
- dropout_rate : 0.1
- decoder_prenet_dropout_schedule : [[0, 0.0], [25000, 0.0], [35000, 0.5]]
- learning_rate_schedule : [[0, 0.0001]]
- head_drop_schedule : [[0, 0], [15000, 1]]
- reduction_factor_schedule : [[0, 10], [80000, 5], [150000, 3], [250000, 1]]
- max_steps : 900000
- bucket_boundaries : [200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200]
- bucket_batch_sizes : [64, 42, 32, 25, 21, 18, 16, 14, 12, 11, 1]
- debug : False
- validation_frequency : 1000
- prediction_frequency : 10000
- weights_save_frequency : 10000
- train_images_plotting_frequency : 1000
- keep_n_weights : 2
- keep_checkpoint_every_n_hours : 12
- n_steps_avg_losses : [100, 500, 1000, 5000]
- n_predictions : 2
- prediction_start_step : 20000
- audio_start_step : 40000
- audio_prediction_frequency : 10000
- data_directory : /content/content/data
- log_directory : /content/logdir
- metadata_filename : 12400_V2.csv
- train_metadata_filename : train_metafile.txt
- valid_metadata_filename : valid_metafile.txt
- session_name : melgan
- data_name : ljspeech
- n_samples : 100000
- n_test : 100
- mel_start_value : 0.5
- mel_end_value : -0.5
- max_mel_len : 1200
- min_mel_len : 80
- sampling_rate : 22050
- n_fft : 1024
- mel_channels : 80
- hop_length : 256
- win_length : 1024
- f_min : 0
- f_max : 8000
- normalizer : MelGAN
- phoneme_language : en-us
- with_stress : False
fatal: not a git repository (or any of the parent directories): .git
WARNING: could not check git hash. Command '['git', 'describe', '--always']' returned non-zero exit status 128.
WARNING: could not find weights file. Trying to load from 
 /content/logdir/ljspeech.melgan.autoregressive/weights.
Edit data_config.yaml to point at the right log directory.
restored weights from None at step 0
ERROR: model's reduction factor is greater than 1, check config. (r=10)
Extracting attention from layer Decoder_DenseBlock4_CrossAttention
Processing dataset: : 0it [00:00, ?it/s]2020-10-22 11:35:05.933335: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-22 11:35:07.492170: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
Traceback (most recent call last):
  File "/content/TransformerTTS/extract_durations.py", line 114, in <module>
    pred_mel = tf.expand_dims(1 - tf.squeeze(create_mel_padding_mask(mel_batch[:, 1:, :])), -1) * pred_mel
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 1125, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 1457, in _mul_dispatch
    return multiply(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 509, in multiply
    return gen_math_ops.mul(x, y, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 6166, in mul
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,397,1] vs. [32,400,80] [Op:Mul]
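
To make the mismatch concrete: with reduction factor r=10, the prediction length is rounded up to a multiple of r (400), while the padding mask keeps the target's true length (397). Below is a minimal sketch that reproduces the shapes from the traceback; the trim at the end is a hypothetical workaround, not the project's fix.

```python
import tensorflow as tf

# Shapes from the traceback: [32, 397, 1] vs [32, 400, 80].
batch, true_len, r, mel_channels = 32, 397, 10, 80
pred_len = -(-true_len // r) * r  # 400: target length rounded up to a multiple of r

mask = tf.ones([batch, true_len, 1])                 # stand-in for 1 - create_mel_padding_mask(mel_batch[:, 1:, :])
pred_mel = tf.zeros([batch, pred_len, mel_channels])

# mask * pred_mel raises InvalidArgumentError: Incompatible shapes [32,397,1] vs [32,400,80].
# Hypothetical workaround (an assumption, not the repo's official fix):
# trim the prediction back to the mask's length before multiplying.
masked = mask * pred_mel[:, :true_len, :]            # broadcasts to [32, 397, 80]
print(masked.shape)                                  # (32, 397, 80)
```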
cfrancesco commented 3 years ago

Hi, is it possible that you trained the autoregressive model up to a reduction factor greater than 1? (With your settings, that would be any checkpoint from before 250K steps.)
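
The reduction_factor_schedule in the config above is piecewise constant over training steps, so a checkpoint saved before step 250000 still carries r > 1. Note that the log also prints "restored weights from None at step 0", i.e. no checkpoint was found, which leaves the model at step 0 where r = 10. Here is a hypothetical helper (not part of the repo) for reading the schedule:

```python
# Hypothetical helper (not in the repo): find the reduction factor in effect
# at a given step, given the piecewise-constant reduction_factor_schedule.
def reduction_factor_at(step, schedule=((0, 10), (80000, 5), (150000, 3), (250000, 1))):
    r = schedule[0][1]
    for boundary, value in schedule:
        if step >= boundary:
            r = value
    return r

print(reduction_factor_at(0))       # 10 -- what the failing run reports
print(reduction_factor_at(200000))  # 3
print(reduction_factor_at(700000))  # 1 -- a fully trained checkpoint
```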

mehdihosseinimoghadam commented 3 years ago

Could you explain a bit more? I didn't quite follow. Here is what I have done so far in more detail. First of all, my dataset is a little different from LJSpeech (it is in Persian, and the text direction is RTL). Also, the format of train_metafile.txt for LJSpeech after running python create_training_data.py --config config/melgan is as follows (as an example):

but mine is a little different, like this (I changed the code to be able to work with my dataset):

Also, I trained the autoregressive model for about 700K steps and the result was great; the only reason I want to switch to the forward model (training from scratch) is the faster response time. In your notebooks, I measured that a single query with the autoregressive model takes about ~7s, while the same query with the forward model takes ~0.5s.

mehdihosseinimoghadam commented 3 years ago

And could you please tell me more about the reduction factor?
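
For background, here is the common (Tacotron-style) reading of the reduction factor, offered as an assumption about what TransformerTTS does rather than a statement of its internals: the decoder predicts r mel frames per step, so its raw output carries r * mel_channels features per step, and the total frame count is always padded to a multiple of r. Duration extraction needs frame-level cross-attention, hence the requirement that r = 1.

```python
import tensorflow as tf

# Sketch of the Tacotron-style reduction factor (an assumption, for
# illustration): the decoder emits r frames per step, and the output is
# reshaped back to frame resolution afterwards.
batch, steps, r, mel_channels = 32, 40, 10, 80
decoder_out = tf.zeros([batch, steps, r * mel_channels])  # one row per decoder step
mel = tf.reshape(decoder_out, [batch, steps * r, mel_channels])
print(mel.shape)  # (32, 400, 80) -- always a multiple of r, which explains the 400-vs-397 padding above
```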