coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] vctk-vits convert into ONNX error #3979

Open sofianhw opened 2 months ago

sofianhw commented 2 months ago

Describe the bug

I managed to fine-tune vctk-vits on the Indonesian language. I want to convert best_model.pth using vits.export_onnx.

This is my config.json:

But I get this error:

/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/vits/networks.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[0] == x_lengths.shape[0]
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/glow_tts/transformer.py:133: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert t_s == t_t, "Relative attention is only available for self-attention."
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/glow_tts/transformer.py:199: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  pad_length = max(length - (self.rel_attn_window_size + 1), 0)
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/glow_tts/transformer.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  slice_start_position = max((self.rel_attn_window_size + 1) - length, 0)
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/glow_tts/transformer.py:202: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_length > 0:
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/vits/transforms.py:111: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if torch.min(inputs) < left or torch.max(inputs) > right:
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/vits/transforms.py:116: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min_bin_width * num_bins > 1.0:
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/vits/transforms.py:118: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min_bin_height * num_bins > 1.0:
/usr/local/lib/python3.10/dist-packages/TTS/tts/layers/vits/transforms.py:168: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert (discriminant >= 0).all()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[43], line 10
      8 vits = Vits.init_from_config(config)
      9 vits.load_checkpoint(config, "/app/vits-vctk-ms-ml-ds/traineroutput/vits_vctk-August-13-2024_10+45PM-0000000/best_model.pth")
---> 10 vits.export_onnx(output_path='model.onnx', verbose=True)

File /usr/local/lib/python3.10/dist-packages/TTS/tts/models/vits.py:1864, in Vits.export_onnx(self, output_path, verbose)
   1861     input_names.append("langid")
   1863 # export to ONNX
-> 1864 torch.onnx.export(
   1865     model=self,
   1866     args=dummy_input,
   1867     opset_version=15,
   1868     f=output_path,
   1869     verbose=verbose,
   1870     input_names=input_names,
   1871     output_names=["output"],
   1872     dynamic_axes={
   1873         "input": {0: "batch_size", 1: "phonemes"},
   1874         "input_lengths": {0: "batch_size"},
   1875         "output": {0: "batch_size", 1: "time1", 2: "time2"},
   1876     },
   1877 )
   1879 # rollback
   1880 self.forward = _forward

File /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:516, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions, autograd_inlining)
    189 @_beartype.beartype
    190 def export(
    191     model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction],
   (...)
    208     autograd_inlining: Optional[bool] = True,
    209 ) -> None:
    210     r"""Exports a model into ONNX format.
    211 
    212     If ``model`` is not a :class:`torch.jit.ScriptModule` nor a
   (...)
    513             All errors are subclasses of :class:`errors.OnnxExporterError`.
    514     """
--> 516     _export(
    517         model,
    518         args,
    519         f,
    520         export_params,
    521         verbose,
    522         training,
    523         input_names,
    524         output_names,
    525         operator_export_type=operator_export_type,
    526         opset_version=opset_version,
    527         do_constant_folding=do_constant_folding,
    528         dynamic_axes=dynamic_axes,
    529         keep_initializers_as_inputs=keep_initializers_as_inputs,
    530         custom_opsets=custom_opsets,
    531         export_modules_as_functions=export_modules_as_functions,
    532         autograd_inlining=autograd_inlining,
    533     )

File /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1613, in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions, autograd_inlining)
   1610     dynamic_axes = {}
   1611 _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
-> 1613 graph, params_dict, torch_out = _model_to_graph(
   1614     model,
   1615     args,
   1616     verbose,
   1617     input_names,
   1618     output_names,
   1619     operator_export_type,
   1620     val_do_constant_folding,
   1621     fixed_batch_size=fixed_batch_size,
   1622     training=training,
   1623     dynamic_axes=dynamic_axes,
   1624 )
   1626 # TODO: Don't allocate a in-memory string for the protobuf
   1627 defer_weight_export = (
   1628     export_type is not _exporter_states.ExportTypes.PROTOBUF_FILE
   1629 )

File /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1135, in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
   1132     args = (args,)
   1134 model = _pre_trace_quant_model(model, args)
-> 1135 graph, params, torch_out, module = _create_jit_graph(model, args)
   1136 params_dict = _get_named_param_dict(graph, params)
   1138 try:

File /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1011, in _create_jit_graph(model, args)
   1006     graph = _C._propagate_and_assign_input_shapes(
   1007         graph, flattened_args, param_count_list, False, False
   1008     )
   1009     return graph, params, torch_out, None
-> 1011 graph, torch_out = _trace_and_get_graph_from_model(model, args)
   1012 _C._jit_pass_onnx_lint(graph)
   1013 state_dict = torch.jit._unique_state_dict(model)

File /usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:915, in _trace_and_get_graph_from_model(model, args)
    913 prev_autocast_cache_enabled = torch.is_autocast_cache_enabled()
    914 torch.set_autocast_cache_enabled(False)
--> 915 trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
    916     model,
    917     args,
    918     strict=False,
    919     _force_outplace=False,
    920     _return_inputs_states=True,
    921 )
    922 torch.set_autocast_cache_enabled(prev_autocast_cache_enabled)
    924 warn_on_static_input_change(inputs_states)

File /usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py:1296, in _get_trace_graph(f, args, kwargs, strict, _force_outplace, return_inputs, _return_inputs_states)
   1294 if not isinstance(args, tuple):
   1295     args = (args,)
-> 1296 outs = ONNXTracedModule(
   1297     f, strict, _force_outplace, return_inputs, _return_inputs_states
   1298 )(*args, **kwargs)
   1299 return outs

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File /usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py:138, in ONNXTracedModule.forward(self, *args)
    135     else:
    136         return tuple(out_vars)
--> 138 graph, out = torch._C._create_graph_by_tracing(
    139     wrapper,
    140     in_vars + module_state,
    141     _create_interpreter_name_lookup_fn(),
    142     self.strict,
    143     self._force_outplace,
    144 )
    146 if self._return_inputs:
    147     return graph, outs[0], ret_inputs[0]

File /usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py:129, in ONNXTracedModule.forward.<locals>.wrapper(*args)
    127 if self._return_inputs_states:
    128     inputs_states.append(_unflatten(in_args, in_desc))
--> 129 outs.append(self.inner(*trace_inputs))
    130 if self._return_inputs_states:
    131     inputs_states[0] = (inputs_states[0], trace_inputs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501, in Module._slow_forward(self, *input, **kwargs)
   1499         recording_scopes = False
   1500 try:
-> 1501     result = self.forward(*input, **kwargs)
   1502 finally:
   1503     if recording_scopes:

File /usr/local/lib/python3.10/dist-packages/TTS/tts/models/vits.py:1832, in Vits.export_onnx.<locals>.onnx_inference(text, text_lengths, scales, sid, langid)
   1830 self.length_scale = length_scale
   1831 self.noise_scale_dp = noise_scale_dp
-> 1832 return self.inference(
   1833     text,
   1834     aux_input={
   1835         "x_lengths": text_lengths,
   1836         "d_vectors": None,
   1837         "speaker_ids": sid,
   1838         "language_ids": langid,
   1839         "durations": None,
   1840     },
   1841 )["model_outputs"]

File /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/TTS/tts/models/vits.py:1161, in Vits.inference(self, x, aux_input)
   1158 # upsampling if needed
   1159 z, _, _, y_mask = self.upsampling_z(z, y_lengths=y_lengths, y_mask=y_mask)
-> 1161 o = self.waveform_decoder((z * y_mask)[:, :, : self.max_inference_len], g=g)
   1163 outputs = {
   1164     "model_outputs": o,
   1165     "alignments": attn.squeeze(1),
   (...)
   1171     "y_mask": y_mask,
   1172 }
   1173 return outputs

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501, in Module._slow_forward(self, *input, **kwargs)
   1499         recording_scopes = False
   1500 try:
-> 1501     result = self.forward(*input, **kwargs)
   1502 finally:
   1503     if recording_scopes:

File /usr/local/lib/python3.10/dist-packages/TTS/vocoder/models/hifigan_generator.py:251, in HifiganGenerator.forward(self, x, g)
    249 o = self.conv_pre(x)
    250 if hasattr(self, "cond_layer"):
--> 251     o = o + self.cond_layer(g)
    252 for i in range(self.num_upsamples):
    253     o = F.leaky_relu(o, LRELU_SLOPE)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501, in Module._slow_forward(self, *input, **kwargs)
   1499         recording_scopes = False
   1500 try:
-> 1501     result = self.forward(*input, **kwargs)
   1502 finally:
   1503     if recording_scopes:

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:310, in Conv1d.forward(self, input)
    309 def forward(self, input: Tensor) -> Tensor:
--> 310     return self._conv_forward(input, self.weight, self.bias)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:306, in Conv1d._conv_forward(self, input, weight, bias)
    302 if self.padding_mode != 'zeros':
    303     return F.conv1d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    304                     weight, bias, self.stride,
    305                     _single(0), self.dilation, self.groups)
--> 306 return F.conv1d(input, weight, bias, self.stride,
    307                 self.padding, self.dilation, self.groups)

TypeError: conv1d() received an invalid combination of arguments - got (NoneType, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (!NoneType!, !Parameter!, !Parameter!, !tuple of (int,)!, !tuple of (int,)!, !tuple of (int,)!, int)
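
For what it's worth, reading the bottom of the traceback (this is an inference from the trace, not a confirmed diagnosis): the !NoneType! is the input argument to F.conv1d, i.e. the conditioning tensor g that HifiganGenerator.forward feeds into self.cond_layer. The vocoder has a cond_layer, so the checkpoint was trained with speaker conditioning, but during export g arrives as None, meaning inference() never computed a speaker embedding. A minimal standalone sketch of the same failure (the channel sizes are placeholders, not the real model's):

import torch.nn as nn

# stand-in for HifiganGenerator.cond_layer, which is only created
# when the generator is built with conditioning channels
cond_layer = nn.Conv1d(256, 512, 1)

g = None  # what reaches the vocoder when no speaker embedding was computed
cond_layer(g)  # raises the same TypeError: conv1d() got (NoneType, ...)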

To Reproduce

from TTS.tts.models.vits import Vits
from TTS.tts.configs.vits_config import VitsConfig

config = VitsConfig()
config.load_json("config.json")

# Initialize VITS model and load its checkpoint
vits = Vits.init_from_config(config)
vits.load_checkpoint(config, "best_model.pth")
vits.export_onnx(output_path='model.onnx')
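
One thing that might be worth checking before the export_onnx call (the attribute names below are taken from TTS 0.22.0's vits.py and hifigan_generator.py; treat this as a guess, not a confirmed fix): export_onnx only adds a speaker id to its dummy inputs when self.num_speakers > 0, so if the loaded config does not restore the multi-speaker setup used in training, inference() ends up passing g=None into a vocoder that still has a cond_layer. A quick sanity check:

# after load_checkpoint, before export_onnx
print("num_speakers:", vits.num_speakers)
print("vocoder expects conditioning:", hasattr(vits.waveform_decoder, "cond_layer"))
# if num_speakers is 0 while the vocoder still has a cond_layer, the config
# loaded here is probably missing the speaker setup (e.g. the speakers file)
# from the training run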

Expected behavior

model.onnx is created.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.2.2+cu121",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.12",
        "version": "#24~22.04.1-Ubuntu SMP Thu Jul 18 10:43:12 UTC 2024"
    }
}

Additional context

No response

stale[bot] commented 6 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check our discussion channels.