TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

FastSpeech2 does not convert to ONNX #501

Closed · xDuck closed this issue 3 years ago

xDuck commented 3 years ago

I am trying to convert FastSpeech2 to ONNX with tf2onnx, and when I run the converted model I get an error from an Unsqueeze node. Does anyone have insight into this?

Convert FastSpeech2 (Keras) -> TensorFlow SavedModel

import numpy as np
import tensorflow as tf

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)
input_text = "hello world."
input_ids = processor.text_to_sequence(input_text)

config = AutoConfig.from_pretrained("examples/fastspeech2/conf/fastspeech2.v1.yaml")
fastspeech2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path=None,
    is_build=True,
    name="fastspeech2"
)
fastspeech2.load_weights("models/model-150000.h5")

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# Save the model as a SavedModel (pb) and do inference. Note that signatures should be a tf.function with an input_signature.
tf.saved_model.save(fastspeech2, "./fastspeech2_tf_model", signatures=fastspeech2.inference)
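
As a sanity check before converting, the exported signature's input names and shapes (the names later passed to tf2onnx) can be confirmed by reloading the SavedModel; a minimal sketch, assuming the default signature key "serving_default":

import tensorflow as tf

# Reload the SavedModel and print the concrete inference signature so the
# input names/shapes used in tf2onnx's --inputs flag can be verified.
loaded = tf.saved_model.load("./fastspeech2_tf_model")
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)
print(infer.structured_outputs)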

TensorFlow to ONNX (I tried explicitly setting the input shapes, which didn't seem to matter, and also varied the opset)

python -m tf2onnx.convert \
    --saved-model test_saved \
    --output test_saved/model.onnx \
    --opset 13 \
    --inputs speed_ratios:0[1],speaker_ids:0[1],input_ids:0[1,-1],f0_ratios:0[1],energy_ratios:0[1]
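
Independently of onnxruntime, the converted file can be checked structurally and its declared inputs listed with the onnx package; a minimal sketch, assuming the output path test_saved/model.onnx used above:

import onnx

# Load the converted graph, run the checker, and print each graph input with
# its dims (dynamic axes show up as "?").
model = onnx.load("test_saved/model.onnx")
onnx.checker.check_model(model)
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else "?" for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)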

Running in onnxruntime

import numpy
import onnxruntime as rt
import tensorflow as tf

from tensorflow_tts.inference import AutoProcessor

# Inputs
input_text = "Hello world."
speaker_ids = numpy.asarray([0])
speed_ratios = numpy.asarray([1.0])
f0_ratios = numpy.asarray([1.0])
energy_ratios = numpy.asarray([1.0])

# Pre-process inputs
processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)
input_ids = numpy.asarray(processor.text_to_sequence(input_text), dtype=numpy.int32)
input_ids = input_ids.reshape((1, len(input_ids)))

# Load model
sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL
sess = rt.InferenceSession("test_saved/model.onnx",
                           providers=rt.get_available_providers(),
                           sess_options=sess_options)

# Print model inputs
print(tf.expand_dims(tf.convert_to_tensor(processor.text_to_sequence(input_text), dtype=tf.int32), 0).shape)
print("\nNum inputs:", len(sess.get_inputs()))
for _input in sess.get_inputs():
    print("\t", _input.name, _input.type, _input.shape)
print("")

print(input_ids.shape)
print(speed_ratios.shape)

# Run model
pred_onx = sess.run(None, {
    "input_ids:0": input_ids,
    "speed_ratios:0": speed_ratios.astype(numpy.float32),
    "speaker_ids:0": speaker_ids.astype(numpy.int32),
    "energy_ratios:0": energy_ratios.astype(numpy.float32),
    "f0_ratios:0": f0_ratios.astype(numpy.float32),
})
print(pred_onx)

The error I get is:

2021-02-18 15:41:31.878003898 [E:onnxruntime:, sequential_executor.cc:333 Execute] Non-zero status code returned while running Unsqueeze node. Name:'Unsqueeze__892' Status Message: /onnxruntime_src/onnxruntime/core/providers/common.h:18 int64_t onnxruntime::HandleNegativeAxis(int64_t, int64_t) axis >= -tensor_rank && axis <= tensor_rank - 1 was false. axis 2 is not in valid range [-2,1]

Traceback (most recent call last):
  File "run_onnx.py", line 46, in <module>
    "f0_ratios:0": f0_ratios.astype(numpy.float32),
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 124, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Unsqueeze node. Name:'Unsqueeze__892' Status Message: /onnxruntime_src/onnxruntime/core/providers/common.h:18 int64_t onnxruntime::HandleNegativeAxis(int64_t, int64_t) axis >= -tensor_rank && axis <= tensor_rank - 1 was false. axis 2 is not in valid range [-2,1]

When inspecting the model in Netron, here is that Unsqueeze step, to show where it sits in the graph (red highlight, bottom right):

[Screenshot: Netron graph view with the failing Unsqueeze node highlighted (bottom right)]
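
As an alternative to locating the node in Netron, the failing Unsqueeze can be found programmatically and its inputs and attributes printed; a minimal sketch, assuming the same test_saved/model.onnx and the node name from the error message ('Unsqueeze__892'):

import onnx

# Find the Unsqueeze node named in the runtime error; at opset 13 its axes
# arrive as a second input tensor rather than as an attribute.
model = onnx.load("test_saved/model.onnx")
for node in model.graph.node:
    if node.op_type == "Unsqueeze" and node.name == "Unsqueeze__892":
        print(node.name, list(node.input), list(node.output))
        for attr in node.attribute:
            print(attr)
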
xDuck commented 3 years ago

I also tried keras2onnx, but couldn't even get the model into ONNX format.

import numpy as np
import keras2onnx
import onnxruntime
import tensorflow as tf

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)
input_text = "hello world."
input_ids = processor.text_to_sequence(input_text)

config = AutoConfig.from_pretrained("examples/fastspeech2/conf/fastspeech2.v1.yaml")
fastspeech2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path="models/model-150000.h5",
    is_build=True,
    name="fastspeech2"
)
# fastspeech2.load_weights("models/model-150000.h5")

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# convert to onnx model
onnx_model = keras2onnx.convert_keras(fastspeech2, fastspeech2.name, target_opset=11)
temp_model_file = 'keras_model.onnx'
keras2onnx.save_model(onnx_model, temp_model_file)

Errors:

...

2021-02-18 16:05:14.346891: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: Graph size after: 2662 nodes (181), 3168 edges (282), time = 29.482ms.
2021-02-18 16:05:14.346910: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 1.485ms.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/exit/_36 of type Exit
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/fastspeech2/length_regulator/zeros_1_switch/_26 of type Switch
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/merge/_16 of type Merge
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/LoopCond/_20 of type LoopCond
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/enter/_7 of type Enter
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/next_iteration/_46 of type NextIteration
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/body/_1/fastspeech2/length_regulator/while/Repeat/BroadcastTo of type BroadcastTo
      The generated ONNX model needs run with the custom op supports.
Traceback (most recent call last):
  File "fastspeech2_to_keras.py", line 38, in <module>
    onnx_model = keras2onnx.convert_keras(fastspeech2, fastspeech2.name, target_opset=11)
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/main.py", line 83, in convert_keras
    return convert_topology(topology, name, doc_string, target_opset, channel_first_inputs)
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/topology.py", line 322, in convert_topology
    cvt(scope, operator, container)
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/_builtin.py", line 690, in convert_tf_expand_dims
    rank = len(_cal_tensor_shape(node.inputs[0]))
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/_tf_utils.py", line 67, in cal_tensor_shape
    if len(tensor.shape) > 0 and hasattr(tensor.shape[0], 'value'):
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 846, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: Cannot take the length of shape with unknown rank.
xDuck commented 3 years ago

Building the model with the TFLite param enable_tflite_convertible=True seems to have done the trick; sorry for the confusion.
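
For reference, a minimal sketch of such a build, assuming TFAutoModel.from_pretrained forwards enable_tflite_convertible to the model constructor (the exact keyword handling and export signature may differ between TensorFlowTTS versions):

import tensorflow as tf

from tensorflow_tts.inference import AutoConfig, TFAutoModel

config = AutoConfig.from_pretrained("examples/fastspeech2/conf/fastspeech2.v1.yaml")

# Assumption: the flag is passed through to the model and selects the
# TFLite/ONNX-friendly (loop-free) length-regulator path.
fastspeech2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path="models/model-150000.h5",
    enable_tflite_convertible=True,
    name="fastspeech2",
)

# Export as before; the tf2onnx command is unchanged.
tf.saved_model.save(fastspeech2, "./fastspeech2_tf_model", signatures=fastspeech2.inference)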