TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Error using official demo test #736

Closed Jzow closed 2 years ago

Jzow commented 2 years ago
1. Here is my code:

```python
import numpy as np
import soundfile as sf
import yaml

import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize fastspeech2 model
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

# initialize mb_melgan model
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")

# initialize processor for inference
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

input_ids = processor.text_to_sequence("A child's education has never been about learning information and basic skills only. "
                                       "It has always included teaching the next generation how to be good members of society. "
                                       "Therefore, this cannot be the responsibility of the parents alone. "
                                       "In order to be a good member of any society the individual must respect and obey the rules of their community and share their values. "
                                       "Educating children to understand the need to obey rules and respect others always begins in the home and is widely thought to be the responsibility of parents.")
print(input_ids)

# fastspeech2 inference
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# melgan inference
audio_before = mb_melgan.inference(mel_before)[0, :, 0]
audio_after = mb_melgan.inference(mel_after)[0, :, 0]

# save to file (LJSpeech models produce 22050 Hz audio)
sf.write('./test5.wav', audio_before, 22050, "PCM_16")
sf.write('./test6.wav', audio_after, 22050, "PCM_16")
```
2. This is the text I entered, an IELTS essay:
    
    A child's education has never been about learning information and basic skills only. 
    It has always included teaching the next generation how to be good members of society. Therefore, this cannot be the responsibility of the parents alone.
    In order to be a good member of any society the individual must respect and obey the rules of their community and share their values. 

Educating children to understand the need to obey rules and respect others always begins in the home and is widely thought to be the responsibility of parents. They will certainly be the first to help children learn what is important in life, how they are expected to behave and what role they will play in the world.

However, learning to understand and share the value system of a whole society cannot be achieved just in the home. Once a child goes to school, they are entering a wider community where teachers and peers will have just as much influence as their parents do at home.

At school, children will experience working and living with people from a whole variety of backgrounds from the wider society. This experience should teach them how to co-operate with each other and how to contribute to the life of their community.

But to be a valuable member of any community is not like learning a simple skill. It is something that an individual goes on learning throughout life, and it is the responsibility of every member of a society to help the younger generation become active and able members of that society.


3. The error log is as follows:
```python
2022-01-18 11:05:18.886174: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-01-18 11:05:18.886293: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-01-18 11:05:25.943961: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-01-18 11:05:25.944077: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-01-18 11:05:25.946044: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: James
2022-01-18 11:05:25.946221: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: James
2022-01-18 11:05:25.946457: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-18 11:05:32.304179: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
[38, 11, 40, 45, 46, 49, 41, 3, 56, 11, 42, 41, 58, 40, 38, 57, 46, 52, 51, 11, 45, 38, 56, 11, 51, 42, 59, 42, 55, 11, 39, 42, 42, 51, 11, 38, 39, 52, 58, 57, 11, 49, 42, 38, 55, 51, 46, 51, 44, 11, 46, 51, 43, 52, 55, 50, 38, 57, 46, 52, 51, 11, 38, 51, 41, 11, 39, 38, 56, 46, 40, 11, 56, 48, 46, 49, 49, 56, 11, 52, 51, 49, 62, 7, 11, 46, 57, 11, 45, 38, 56, 11, 38, 49, 60, 38, 62, 56, 11, 46, 51, 40, 49, 58, 41, 42, 41, 11, 57, 42, 38, 40, 45, 46, 51, 44, 11, 57, 45, 42, 11, 51, 42, 61, 57, 11, 44, 42, 51, 42, 55, 38, 57, 46, 52, 51, 11, 45, 52, 60, 11, 57, 52, 11, 39, 42, 11, 44, 52, 52, 41, 11, 50, 42, 50, 39, 42, 55, 56, 11, 52, 43, 11, 56, 52, 40, 46, 42, 57, 62, 7, 11, 57, 45, 42, 55, 42, 43, 52, 55, 42, 6, 11, 57, 45, 46, 56, 11, 40, 38, 51, 51, 52, 57, 11, 39, 42, 11, 57, 45, 42, 11, 55, 42, 56, 53, 52, 51, 56, 46, 39, 46, 49, 46, 57, 62, 11, 52, 43, 11, 57, 45, 42, 11, 53, 38, 55, 42, 51, 57, 56, 11, 38, 49, 52, 51, 42, 7, 11, 46, 51, 11, 52, 55, 41, 42, 55, 11, 57, 52, 11, 39, 42, 11, 38, 11, 44, 52, 52, 41, 11, 50, 42, 50, 39, 42, 55, 11, 52, 43, 11, 38, 51, 62, 11, 56, 52, 40, 46, 42, 57, 62, 11, 57, 45, 42, 11, 46, 51, 41, 46, 59, 46, 41, 58, 38, 49, 11, 50, 58, 56, 57, 11, 55, 42, 56, 53, 42, 40, 57, 11, 38, 51, 41, 11, 52, 39, 42, 62, 11, 57, 45, 42, 11, 55, 58, 49, 42, 56, 11, 52, 43, 11, 57, 45, 42, 46, 55, 11, 40, 52, 50, 50, 58, 51, 46, 57, 62, 11, 38, 51, 41, 11, 56, 45, 38, 55, 42, 11, 57, 45, 42, 46, 55, 11, 59, 38, 49, 58, 42, 56, 7, 11, 42, 41, 58, 40, 38, 57, 46, 51, 44, 11, 40, 45, 46, 49, 41, 55, 42, 51, 11, 57, 52, 11, 58, 51, 41, 42, 55, 56, 57, 38, 51, 41, 11, 57, 45, 42, 11, 51, 42, 42, 41, 11, 57, 52, 11, 52, 39, 42, 62, 11, 55, 58, 49, 42, 56, 11, 38, 51, 41, 11, 55, 42, 56, 53, 42, 40, 57, 11, 52, 57, 45, 42, 55, 56, 11, 38, 49, 60, 38, 62, 56, 11, 39, 42, 44, 46, 51, 56, 11, 46, 51, 11, 57, 45, 42, 11, 45, 52, 50, 42, 11, 38, 51, 41, 11, 46, 56, 11, 60, 46, 41, 42, 49, 62, 11, 57, 45, 52, 58, 44, 45, 57, 11, 57, 52, 11, 39, 42, 
11, 57, 45, 42, 11, 55, 42, 56, 53, 52, 51, 56, 46, 39, 46, 49, 46, 57, 62, 11, 52, 43, 11, 53, 38, 55, 42, 51, 57, 56, 7, 148]
Traceback (most recent call last):
  File "E:/iston_algorithm/util/speech/TextTest.py", line 29, in <module>
    mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 956, in _call
    return self._concrete_stateful_fn._call_flat(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
    outputs = execute.execute(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  indices[0,2139] = 2140 is not in [0, 2049)
     [[node decoder/position_embeddings/Gather (defined at C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow_tts\models\fastspeech.py:76) ]] [Op:__inference__inference_10438]

Errors may have originated from an input operation.
Input Source operations connected to node decoder/position_embeddings/Gather:
 mul_3 (defined at C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow_tts\models\fastspeech2.py:272)

Function call stack:
_inference

Process finished with exit code 1
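The `InvalidArgumentError` above is a length problem, not a configuration problem: the decoder's position-embedding table only covers indices below 2049, and the expanded sequence for this essay reaches index 2140. A common workaround is to split the text at sentence boundaries, synthesize each chunk separately, and concatenate the audio. Below is a minimal sketch of the splitting step (pure Python; `split_into_chunks` and the `max_chars` budget are illustrative names, not part of the TensorFlowTTS API — tune the budget so each chunk stays well under the model's limit):

```python
import re

def split_into_chunks(text, max_chars=300):
    """Split text at sentence boundaries, keeping each chunk under max_chars.

    Note: a single sentence longer than max_chars becomes its own
    (oversized) chunk; such sentences would need further splitting.
    """
    # split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Usage with the models above would then look roughly like:
#   audio_pieces = []
#   for chunk in split_into_chunks(long_text):
#       ids = processor.text_to_sequence(chunk)
#       _, mel_after, *_ = fastspeech2.inference(
#           input_ids=tf.expand_dims(tf.convert_to_tensor(ids, dtype=tf.int32), 0),
#           speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
#           speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
#           f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
#           energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
#       )
#       audio_pieces.append(mb_melgan.inference(mel_after)[0, :, 0])
#   audio = np.concatenate(audio_pieces)
```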
Jzow commented 2 years ago

Environment:

Windows 11, Python 3.8

Jzow commented 2 years ago

When I switched to tts-tacotron2-ljspeech-en, only about half of the text was synthesized and the audio cuts off before the end. How should I configure it to produce the complete result? Here is my code:

```python
import soundfile as sf
import numpy as np

import tensorflow as tf

from tensorflow_tts.inference import AutoProcessor
from tensorflow_tts.inference import TFAutoModel

processor = AutoProcessor.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")
tacotron2 = TFAutoModel.from_pretrained("tensorspeech/tts-tacotron2-ljspeech-en")
melgan = TFAutoModel.from_pretrained("tensorspeech/tts-melgan-ljspeech-en")

text = "A child's education has never been about learning information and basic skills only. It has always included teaching the next generation how to be good members of society. Therefore, this cannot be the responsibility of the parents alone. " \
       "In order to be a good member of any society the individual must respect and obey the rules of their community and share their values. " \
       "Educating children to understand the need to obey rules and respect others always begins in the home and is widely thought to be the responsibility of parents. " \
       "They will certainly be the first to help children learn what is important in life, how they are expected to behave and what role they will play in the world. " \
       "However, learning to understand and share the value system of a whole society cannot be achieved just in the home. " \
       "Once a child goes to school, they are entering a wider community where teachers and peers will have just as much influence as their parents do at home. " \
       "At school, children will experience working and living with people from a whole variety of backgrounds from the wider society. " \
       "This experience should teach them how to co-operate with each other and how to contribute to the life of their community. " \
       "But to be a valuable member of any community is not like learning a simple skill. " \
       "It is something that an individual goes on learning throughout life and it is the responsibility of every member of a society to take responsibility for helping the younger generation to become active and able members of that society."

input_ids = processor.text_to_sequence(text)

# tacotron2 inference (text-to-mel)
decoder_output, mel_outputs, stop_token_prediction, alignment_history = tacotron2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    input_lengths=tf.convert_to_tensor([len(input_ids)], tf.int32),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
)

# melgan inference (mel-to-wav)
audio = melgan.inference(mel_outputs)[0, :, 0]

# save to file
sf.write('./audio.wav', audio, 22050, "PCM_16")