TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

FastSpeech2 does not convert to ONNX #501

Closed · xDuck closed this issue 3 years ago

xDuck commented 3 years ago

I am trying to convert FastSpeech2 to ONNX with tf2onnx, and when I run the converted model I get an error from an Unsqueeze node. Does anyone have insight into this?

Convert FastSpeech2 (Keras) -> TensorFlow SavedModel

import numpy as np
import tensorflow as tf

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)
input_text = "hello world."
input_ids = processor.text_to_sequence(input_text)

config = AutoConfig.from_pretrained("examples/fastspeech2/conf/fastspeech2.v1.yaml")
fastspeech2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path=None,
    is_build=True,
    name="fastspeech2"
)
fastspeech2.load_weights("models/model-150000.h5")

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# Save the model as a SavedModel (pb) and do inference. Note that signatures should be a tf.function with an input_signature.
tf.saved_model.save(fastspeech2, "./fastspeech2_tf_model", signatures=fastspeech2.inference)
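
As a sanity check before converting, the exported signature's input names and shapes (the names later passed to tf2onnx) can be confirmed by reloading the SavedModel; a minimal sketch, assuming the default signature key "serving_default":

import tensorflow as tf

# Reload the SavedModel and print the concrete inference signature so the
# input names/shapes used in tf2onnx's --inputs flag can be verified.
loaded = tf.saved_model.load("./fastspeech2_tf_model")
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)
print(infer.structured_outputs)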

TensorFlow to ONNX (I tried explicitly setting the input shapes, which didn't seem to matter, and also varied the opset)

python -m tf2onnx.convert \
    --saved-model test_saved \
    --output test_saved/model.onnx \
    --opset 13 \
    --inputs speed_ratios:0[1],speaker_ids:0[1],input_ids:0[1,-1],f0_ratios:0[1],energy_ratios:0[1]
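
Independently of onnxruntime, the converted file can be checked structurally and its declared inputs listed with the onnx package; a minimal sketch, assuming the output path test_saved/model.onnx used above:

import onnx

# Load the converted graph, run the checker, and print each graph input with
# its dims (dynamic axes show up as "?").
model = onnx.load("test_saved/model.onnx")
onnx.checker.check_model(model)
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else "?" for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)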

Running in onnxruntime

import numpy
import onnxruntime as rt
import tensorflow as tf

from tensorflow_tts.inference import AutoProcessor

# Inputs
input_text = "Hello world."
speaker_ids = numpy.asarray([0])
speed_ratios = numpy.asarray([1.0])
f0_ratios = numpy.asarray([1.0])
energy_ratios = numpy.asarray([1.0])

# Pre-process inputs
processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)
input_ids = numpy.asarray(processor.text_to_sequence(input_text), dtype=numpy.int32)
input_ids = input_ids.reshape((1, len(input_ids)))

# Load model
sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL
sess = rt.InferenceSession("test_saved/model.onnx",
                           providers=rt.get_available_providers(),
                           sess_options=sess_options)

# Print model inputs
print(tf.expand_dims(tf.convert_to_tensor(processor.text_to_sequence(input_text), dtype=tf.int32), 0).shape)
print("\nNum inputs:", len(sess.get_inputs()))
for _input in sess.get_inputs():
    print("\t", _input.name, _input.type, _input.shape)
print("")

print(input_ids.shape)
print(speed_ratios.shape)

# Run model
pred_onx = sess.run(None, {
    "input_ids:0": input_ids,
    "speed_ratios:0": speed_ratios.astype(numpy.float32),
    "speaker_ids:0": speaker_ids.astype(numpy.int32),
    "energy_ratios:0": energy_ratios.astype(numpy.float32),
    "f0_ratios:0": f0_ratios.astype(numpy.float32),
})
print(pred_onx)

The error I get is:

2021-02-18 15:41:31.878003898 [E:onnxruntime:, sequential_executor.cc:333 Execute] Non-zero status code returned while running Unsqueeze node. Name:'Unsqueeze__892' Status Message: /onnxruntime_src/onnxruntime/core/providers/common.h:18 int64_t onnxruntime::HandleNegativeAxis(int64_t, int64_t) axis >= -tensor_rank && axis <= tensor_rank - 1 was false. axis 2 is not in valid range [-2,1]

Traceback (most recent call last):
  File "run_onnx.py", line 46, in <module>
    "f0_ratios:0": f0_ratios.astype(numpy.float32),
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 124, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Unsqueeze node. Name:'Unsqueeze__892' Status Message: /onnxruntime_src/onnxruntime/core/providers/common.h:18 int64_t onnxruntime::HandleNegativeAxis(int64_t, int64_t) axis >= -tensor_rank && axis <= tensor_rank - 1 was false. axis 2 is not in valid range [-2,1]

When inspecting the model in Netron, here is that Unsqueeze step, to show where it sits in the graph (red highlight, bottom right):

[Screenshot: Netron graph view with the failing Unsqueeze node highlighted (bottom right)]
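
As an alternative to locating the node in Netron, the failing Unsqueeze can be found programmatically and its inputs and attributes printed; a minimal sketch, assuming the same test_saved/model.onnx and the node name from the error message ('Unsqueeze__892'):

import onnx

# Find the Unsqueeze node named in the runtime error; at opset 13 its axes
# arrive as a second input tensor rather than as an attribute.
model = onnx.load("test_saved/model.onnx")
for node in model.graph.node:
    if node.op_type == "Unsqueeze" and node.name == "Unsqueeze__892":
        print(node.name, list(node.input), list(node.output))
        for attr in node.attribute:
            print(attr)
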
xDuck commented 3 years ago

I also tried keras2onnx, but couldn't even get the model into ONNX format.

import numpy as np
import keras2onnx
import onnxruntime
import tensorflow as tf

from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(
    pretrained_path="tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
)
input_text = "hello world."
input_ids = processor.text_to_sequence(input_text)

config = AutoConfig.from_pretrained("examples/fastspeech2/conf/fastspeech2.v1.yaml")
fastspeech2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path="models/model-150000.h5",
    is_build=True,
    name="fastspeech2"
)
# fastspeech2.load_weights("models/model-150000.h5")

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# convert to onnx model
onnx_model = keras2onnx.convert_keras(fastspeech2, fastspeech2.name, target_opset=11)
temp_model_file = 'keras_model.onnx'
keras2onnx.save_model(onnx_model, temp_model_file)

Errors:

...

2021-02-18 16:05:14.346891: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: Graph size after: 2662 nodes (181), 3168 edges (282), time = 29.482ms.
2021-02-18 16:05:14.346910: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 1.485ms.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/exit/_36 of type Exit
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/fastspeech2/length_regulator/zeros_1_switch/_26 of type Switch
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/merge/_16 of type Merge
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/LoopCond/_20 of type LoopCond
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/enter/_7 of type Enter
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/next_iteration/_46 of type NextIteration
      The generated ONNX model needs run with the custom op supports.
WARN: No corresponding ONNX op matches the tf.op node fastspeech2/length_regulator/while/body/_1/fastspeech2/length_regulator/while/Repeat/BroadcastTo of type BroadcastTo
      The generated ONNX model needs run with the custom op supports.
Traceback (most recent call last):
  File "fastspeech2_to_keras.py", line 38, in <module>
    onnx_model = keras2onnx.convert_keras(fastspeech2, fastspeech2.name, target_opset=11)
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/main.py", line 83, in convert_keras
    return convert_topology(topology, name, doc_string, target_opset, channel_first_inputs)
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/topology.py", line 322, in convert_topology
    cvt(scope, operator, container)
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/_builtin.py", line 690, in convert_tf_expand_dims
    rank = len(_cal_tensor_shape(node.inputs[0]))
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/keras2onnx/_tf_utils.py", line 67, in cal_tensor_shape
    if len(tensor.shape) > 0 and hasattr(tensor.shape[0], 'value'):
  File "/root/git/TensorFlowTTS/env/lib64/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 846, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: Cannot take the length of shape with unknown rank.
xDuck commented 3 years ago

Building the model with the TFLite param enable_tflite_convertible=True seems to have done the trick; sorry for the confusion.
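
For reference, a minimal sketch of such a build, assuming TFAutoModel.from_pretrained forwards enable_tflite_convertible to the model constructor (the exact keyword handling and export signature may differ between TensorFlowTTS versions):

import tensorflow as tf

from tensorflow_tts.inference import AutoConfig, TFAutoModel

config = AutoConfig.from_pretrained("examples/fastspeech2/conf/fastspeech2.v1.yaml")

# Assumption: the flag is passed through to the model and selects the
# TFLite/ONNX-friendly (loop-free) length-regulator path.
fastspeech2 = TFAutoModel.from_pretrained(
    config=config,
    pretrained_path="models/model-150000.h5",
    enable_tflite_convertible=True,
    name="fastspeech2",
)

# Export as before; the tf2onnx command is unchanged.
tf.saved_model.save(fastspeech2, "./fastspeech2_tf_model", signatures=fastspeech2.inference)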