TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

End-to-End Example aborts with ValueError: Shapes (199, 384) and (149, 384) are incompatible #315

Closed: ErfolgreichCharismatisch closed this issue 3 years ago

ErfolgreichCharismatisch commented 4 years ago

OS: Windows 10, Python: Anaconda

I chose your "End-to-End Example".

import numpy as np
import soundfile as sf
import yaml

import tensorflow as tf

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize fastspeech model.
fs_config = AutoConfig.from_pretrained('./examples/fastspeech/conf/fastspeech.v1.yaml')
fastspeech = TFAutoModel.from_pretrained(
    config=fs_config,
    pretrained_path="./examples/fastspeech/pretrained/model-195000.h5"
)

# initialize melgan model
melgan_config = AutoConfig.from_pretrained('./examples/melgan/conf/melgan.v1.yaml')
melgan = TFAutoModel.from_pretrained(
    config=melgan_config,
    pretrained_path="./examples/melgan/checkpoint/generator-1500000.h5"
)

# inference
processor = AutoProcessor.from_pretrained(pretrained_path="./test/files/ljspeech_mapper.json")

ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")
ids = tf.expand_dims(ids, 0)
# fastspeech inference

masked_mel_before, masked_mel_after, duration_outputs = fastspeech.inference(
    ids,
    speaker_ids=tf.zeros(shape=[tf.shape(ids)[0]], dtype=tf.int32),
    speed_ratios=tf.constant([1.0], dtype=tf.float32)
)

# melgan inference
audio_before = melgan.inference(masked_mel_before)[0, :, 0]
audio_after = melgan.inference(masked_mel_after)[0, :, 0]

# save to file
sf.write('./audio_before.wav', audio_before, 22050, "PCM_16")
sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")

I changed only

# initialize fastspeech model.
fs_config = AutoConfig.from_pretrained('E:/Python/tts/TensorflowTTS-Mode/examples/fastspeech/conf/fastspeech.v1.yaml')
fastspeech = TFAutoModel.from_pretrained(
    config=fs_config,
    pretrained_path="E:/Python/tts/TensorflowTTS-Mode/examples/fastspeech v1/checkpoints/model-195000.h5"
)

using the checkpoint from https://drive.google.com/open?id=1f69ujszFeGnIy7PMwc8AkUckhIaT2OD0

I got

Traceback (most recent call last):
  File "t.py", line 15, in <module>
    pretrained_path="E:/Python/tts/TensorflowTTS-Mode/examples/fastspeech v1/checkpoints/model-195000.h5"
  File "E:\Benutzerdaten\Python\TensorflowTTS-Mode\tensorflow_inference\auto_model.py", line 69, in from_pretrained
    model.load_weights(pretrained_path)
  File "E:\Anaconda\envs\myEnv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2211, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "E:\Anaconda\envs\myEnv\lib\site-packages\tensorflow\python\keras\saving\hdf5_format.py", line 708, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "E:\Anaconda\envs\myEnv\lib\site-packages\tensorflow\python\util\dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "E:\Anaconda\envs\myEnv\lib\site-packages\tensorflow\python\keras\backend.py", line 3576, in batch_set_value
    x.assign(np.asarray(value, dtype=dtype(x)))
  File "E:\Anaconda\envs\myEnv\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 858, in assign
    self._shape.assert_is_compatible_with(value_tensor.shape)
  File "E:\Anaconda\envs\myEnv\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 1134, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (199, 384) and (149, 384) are incompatible

How do I solve this?
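
One way to narrow this down is to list the tensor shapes stored in the .h5 checkpoint and compare them with the model just built from the YAML config; a minimal sketch (assuming h5py is installed, reusing the checkpoint path from the traceback):

import h5py

# Walk the checkpoint file and print every stored weight with its shape,
# so the (149, 384) tensor behind the mismatch can be matched to its layer name.
with h5py.File("E:/Python/tts/TensorflowTTS-Mode/examples/fastspeech v1/checkpoints/model-195000.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))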

dathudeptrai commented 4 years ago

@ErfolgreichCharismatisch can you try fastspeech2 rather than fastspeech1? (pretrained here: https://drive.google.com/drive/u/1/folders/1Q7QrTMksI-5F3_44-68ex2RkOO_789TW).

I just tested the above code in the master branch and everything is ok :D
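
For reference, switching the script to FastSpeech2 only changes the first block: point the config at the fastspeech2 YAML and the weights at the downloaded FastSpeech2 checkpoint; a minimal sketch (the checkpoint filename is a placeholder for whatever file the Drive folder provides):

# initialize fastspeech2 model.
fs2_config = AutoConfig.from_pretrained('./examples/fastspeech2/conf/fastspeech2.v1.yaml')
fastspeech2 = TFAutoModel.from_pretrained(
    config=fs2_config,
    pretrained_path="./examples/fastspeech2/checkpoints/model-150000.h5"  # placeholder filename
)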

ErfolgreichCharismatisch commented 4 years ago

Thanks.

I got that to work with the tutorials' files.

Now I get

2020-10-21 00:11:34.144199: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
Traceback (most recent call last):
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\eager\function.py", line 2688, in _convert_inputs_to_signature
    check_types=False)  # lists are convert to tuples for `tf.data`.
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\util\nest.py", line 952, in flatten_up_to
    expand_composites=expand_composites)
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\util\nest.py", line 854, in assert_shallow_structure
    input_length=len(input_tree), shallow_length=len(shallow_tree)))
ValueError: The two structures don't have the same sequence length. Input structure has length 3, while shallow structure has length 5.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "t.py", line 37, in <module>
    speed_ratios=tf.constant([1.0], dtype=tf.float32)
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\eager\def_function.py", line 844, in _call
    *args, **kwds)
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\eager\function.py", line 2622, in canonicalize_function_inputs
    self._flat_input_signature)
  File "E:\Anaconda\envs\myenv\lib\site-packages\tensorflow\python\eager\function.py", line 2692, in _convert_inputs_to_signature
    format_error_message(inputs, input_signature))
ValueError: Structure of Python function inputs does not match input_signature:
  inputs: (
    tf.Tensor(
[[ 55  42  40  42  51  57  11  55  42  56  42  38  55  40  45  11  38  57
   11  45  38  55  59  38  55  41  11  45  38  56  11  56  45  52  60  51
   11  50  42  41  46  57  38  57  46  51  44  11  43  52  55  11  38  56
   11  49  46  57  57  49  42  11  38  56  11  42  46  44  45  57  11  60
   42  42  48  56   6  11  40  38  51  11  38  40  57  58  38  49  49  62
   11  46  51  40  55  42  38  56  42  11  57  45  42  11  44  55  42  62
   11  50  38  57  57  42  55  11  46  51  11  57  45  42  11  53  38  55
   57  56  11  52  43  11  57  45  42  11  39  55  38  46  51  11  55  42
   56  53  52  51  56  46  39  49  42  11  43  52  55  11  42  50  52  57
   46  52  51  38  49  11  55  42  44  58  49  38  57  46  52  51   6  11
   38  51  41  11  49  42  38  55  51  46  51  44   7 148]], shape=(1, 194), dtype=int32),
    tf.Tensor([0], shape=(1,), dtype=int32),
    tf.Tensor([1.], shape=(1,), dtype=float32))
  input_signature: (
    TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids'),
    TensorSpec(shape=(None,), dtype=tf.int32, name='speaker_ids'),
    TensorSpec(shape=(None,), dtype=tf.float32, name='speed_ratios'),
    TensorSpec(shape=(None,), dtype=tf.float32, name='f0_ratios'),
    TensorSpec(shape=(None,), dtype=tf.float32, name='energy_ratios'))

Ideas?
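
The input_signature above lists five tensors, so the FastSpeech2 inference call also needs f0_ratios and energy_ratios (and it returns five outputs); a minimal sketch of a matching call, reusing the variables from the script above:

# fastspeech2 inference: pass all five inputs expected by the tf.function signature.
mel_before, mel_after, duration_outputs, _, _ = fastspeech.inference(
    input_ids=ids,
    speaker_ids=tf.zeros(shape=[tf.shape(ids)[0]], dtype=tf.int32),
    speed_ratios=tf.constant([1.0], dtype=tf.float32),
    f0_ratios=tf.constant([1.0], dtype=tf.float32),
    energy_ratios=tf.constant([1.0], dtype=tf.float32)
)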

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.