MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Apache License 2.0

onnx exported problem #8

Open unparalleled-ysj opened 1 year ago

unparalleled-ysj commented 1 year ago

May I ask how you dealt with the "RuntimeError: Unknown number type: complex" problem caused by torch.istft when exporting the ONNX model?

Jackiexiao commented 1 year ago

Converting torch.istft to ONNX is currently not supported and is still in development, see: https://github.com/pytorch/pytorch/issues/81075

MasayaKawamura commented 1 year ago

Hi @unparalleled-ysj, sorry for the late reply. torch.istft is currently not supported by ONNX, so please exclude only the istft part when exporting to ONNX. @Jackiexiao, thank you for your comment!

MasayaKawamura commented 1 year ago

Maybe this URL will also be useful for torch onnx.

Jackiexiao commented 1 year ago

I'm confused. We have to run the istft function to get the wav, so if we exclude the istft part, we can't get a result. @MasayaKawamura

MasayaKawamura commented 1 year ago

@Jackiexiao I think it is possible to export everything except the istft step to ONNX. During inference, I think the wav can then be obtained by combining the ONNX model with the torch istft code.
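
A minimal sketch of the istft half of that split, assuming the exported graph ends at the magnitude and phase tensors. The function name `spec_to_wav` and the STFT settings (n_fft=16, hop 4) are illustrative assumptions, not values confirmed in this thread, and the onnxruntime call is shown only as a comment with hypothetical names. For the multi-band model, any band-merging step that follows istft would also have to run outside the exported graph; this sketch covers the istft step only.

```python
import torch


def spec_to_wav(magnitude, phase, n_fft=16, hop_length=4, win_length=16):
    """Run torch.istft on magnitude/phase produced by the exported part.

    With onnxruntime, the inputs would come from something like
        magnitude, phase = session.run(None, feeds)   # hypothetical outputs
    where `session` wraps the graph exported without the istft step.
    """
    mag = torch.as_tensor(magnitude, dtype=torch.float32)
    ph = torch.as_tensor(phase, dtype=torch.float32)
    # The complex spectrogram is built here, outside the ONNX graph,
    # so the exporter never sees a complex tensor.
    spec = torch.polar(mag, ph)
    window = torch.hann_window(win_length)
    return torch.istft(
        spec, n_fft, hop_length=hop_length, win_length=win_length, window=window
    )


# Round trip on random audio to check the wiring.
torch.manual_seed(0)
audio = torch.randn(1, 256)
stft = torch.stft(
    audio, 16, hop_length=4, win_length=16,
    window=torch.hann_window(16), return_complex=True,
)
wav = spec_to_wav(stft.abs(), stft.angle())
```

The STFT settings passed here have to match the ones the model was trained with, otherwise the reconstruction will be wrong even though the call succeeds.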

Jackiexiao commented 1 year ago

OK, I get it. Looking forward to istft support in torch nightly, so that we only need ONNX during inference.

unparalleled-ysj commented 1 year ago

You can use https://github.com/MasayaKawamura/MB-iSTFT-VITS/blob/df2f8d3063f83c22e04d2c0066fa2129d26da9a1/stft.py#L144 instead of https://github.com/MasayaKawamura/MB-iSTFT-VITS/blob/df2f8d3063f83c22e04d2c0066fa2129d26da9a1/stft.py#L197 when exporting. With that change the model exports to ONNX successfully, but the resulting speech has a rattling noise. My ablation comparison shows that the problem comes from the exported istft, because the original model does not have it.
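
For context, the reason such a swap can avoid the complex-number error is that an inverse STFT can be written with real arithmetic only: cos/sin bases per frame, followed by overlap-add and window normalisation. The sketch below illustrates that idea in plain Python. It is not the code from stft.py, and the function names, the periodic Hann window and the tiny sizes are all assumptions chosen for readability.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def real_stft(x, n_fft, hop):
    """Forward STFT returning separate real and imaginary parts (no padding)."""
    w = hann(n_fft)
    real, imag = [], []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = [x[start + n] * w[n] for n in range(n_fft)]
        re_row, im_row = [], []
        for k in range(n_fft // 2 + 1):
            angles = [2 * math.pi * k * n / n_fft for n in range(n_fft)]
            re_row.append(sum(f * math.cos(a) for f, a in zip(frame, angles)))
            im_row.append(-sum(f * math.sin(a) for f, a in zip(frame, angles)))
        real.append(re_row)
        imag.append(im_row)
    return real, imag


def real_istft(real, imag, n_fft, hop):
    """Inverse STFT from real/imaginary parts: cos/sin bases plus overlap-add."""
    w = hann(n_fft)
    half = n_fft // 2
    length = n_fft + hop * (len(real) - 1)
    out = [0.0] * length
    win_sum = [0.0] * length
    for t, (re_row, im_row) in enumerate(zip(real, imag)):
        for n in range(n_fft):
            # Inverse real DFT of one frame: DC + Nyquist + doubled inner bins.
            acc = re_row[0] + re_row[half] * math.cos(math.pi * n)
            for k in range(1, half):
                ang = 2 * math.pi * k * n / n_fft
                acc += 2 * (re_row[k] * math.cos(ang) - im_row[k] * math.sin(ang))
            out[t * hop + n] += w[n] * acc / n_fft
            win_sum[t * hop + n] += w[n] ** 2
    # Normalise by the summed squared window; samples with no window
    # energy (here only the very first one) are left at 0.
    return [o / s if s > 1e-8 else 0.0 for o, s in zip(out, win_sum)]


signal = [math.sin(0.3 * i) for i in range(32)]
re_part, im_part = real_stft(signal, n_fft=8, hop=2)
rebuilt = real_istft(re_part, im_part, n_fft=8, hop=2)
```

A decoder that predicts magnitude and phase would feed this with `real = magnitude * cos(phase)` and `imag = magnitude * sin(phase)`. The window-sum normalisation depends on the number of frames, which is one place where a fixed-length export could plausibly go wrong, although this thread does not confirm that as the cause of the noise.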

unparalleled-ysj commented 1 year ago

@Jackiexiao @MasayaKawamura refer to https://github.com/pytorch/pytorch/issues/31317#issue-538490027

Jackiexiao commented 1 year ago

thx

FanhuaandLuomu commented 1 year ago

Hi @MasayaKawamura, can you share your code for saving the ONNX model? I ran into some problems when converting to ONNX.

Jackiexiao commented 1 year ago

FYI, see: https://github.com/wenet-e2e/wetts/blob/main/wetts/vits/export_onnx.py. Note that you can't export the istft vocoder to ONNX with it. @FanhuaandLuomu

abylouw commented 1 year ago

> FYI, see: https://github.com/wenet-e2e/wetts/blob/main/wetts/vits/export_onnx.py. Note that you can't export the istft vocoder to ONNX with it. @FanhuaandLuomu

hi @Jackiexiao,

I have tried the above script to export but I have had no success. Would you mind sharing your export code?

abylouw commented 1 year ago

> Hi @MasayaKawamura, can you share your code for saving the ONNX model? I ran into some problems when converting to ONNX.

Hi @FanhuaandLuomu, have you succeeded in exporting the model?

Jackiexiao commented 1 year ago

@abylouw it only works for the original VITS, not for MB-iSTFT-VITS, but the two work the same way except for the vocoder part, and the wetts repo has all the code you need.

JohnHerry commented 1 year ago

> You can use https://github.com/MasayaKawamura/MB-iSTFT-VITS/blob/df2f8d3063f83c22e04d2c0066fa2129d26da9a1/stft.py#L144 instead of https://github.com/MasayaKawamura/MB-iSTFT-VITS/blob/df2f8d3063f83c22e04d2c0066fa2129d26da9a1/stft.py#L197 when exporting. With that change the model exports to ONNX successfully, but the resulting speech has a rattling noise. My ablation comparison shows that the problem comes from the exported istft, because the original model does not have it.

Do we need to use the class STFT instead of TorchSTFT during the training in this case?

JohnHerry commented 1 year ago

> @Jackiexiao I think it is possible to export everything except the istft step to ONNX. During inference, I think the wav can then be obtained by combining the ONNX model with the torch istft code.

I tried splitting MB-iSTFT-VITS into these two parts and converting the former to ONNX, and it succeeded. For MS-iSTFT-VITS, however, I have to split the model into three parts, of which the first and the third should be converted to ONNX models. The third part is the multi-band filter, and its self.multistream_conv_post layer is wrapped in weight_norm. Should I keep the weight_norm layer there? I noticed that the remove_weight_norm function in the class does not remove this one. If the weight_norm can be removed when converting to ONNX, should I just copy the "dec.multistream_conv_post.weight_v" value from the checkpoint into my self-defined third model part?
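
One point relevant to that last question: with torch.nn.utils.weight_norm, the effective weight is `g * v / ||v||`, so a plain layer needs both `weight_g` and `weight_v` from the checkpoint, not `weight_v` alone. Below is a small plain-Python sketch of the folding, assuming the default `dim=0` (one norm per slice along the first axis) and a 3-D convolution weight stored as nested lists. The function name is hypothetical.

```python
import math


def fold_weight_norm(weight_v, weight_g):
    """Combine weight_v and weight_g into the plain weight g * v / ||v||.

    weight_v: nested lists indexed [first_axis][second_axis][kernel].
    weight_g: one gain per slice along the first axis.
    """
    folded = []
    for v_slice, g in zip(weight_v, weight_g):
        # L2 norm over everything except the first axis.
        norm = math.sqrt(sum(x * x for row in v_slice for x in row))
        folded.append([[g * x / norm for x in row] for row in v_slice])
    return folded


v = [[[3.0, 4.0]], [[0.0, 2.0]]]
g = [10.0, 1.0]
plain_weight = fold_weight_norm(v, g)
```

In a checkpoint, `weight_g` is typically stored with singleton trailing dimensions, so it would need to be flattened to one value per slice before being used this way.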

nshmyrev commented 12 months ago

> Do we need to use the class STFT instead of TorchSTFT during the training in this case?

You do not have to use STFT during training, only during export. See here

https://github.com/alphacep/MB-iSTFT-VITS2/commit/29c91d478bc16e653cd5de7c9163e2ba45ed7c6c

see also

https://github.com/FENRlR/MB-iSTFT-VITS2/issues/3

Insensiblee commented 10 months ago

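> Do we need to use the class STFT instead of TorchSTFT during the training in this case?
>
> You do not have to use STFT during training, only during export. See here
>
> alphacep/MB-iSTFT-VITS2@29c91d4
>
> see also
>
> FENRlR/MB-iSTFT-VITS2#3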

I used this code to convert the provided pre-trained model to ONNX. Why do I get this error: AttributeError: 'ResidualCouplingLayer' object has no attribute 'remove_weight_norm'
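
The message itself says that the export script calls `remove_weight_norm` on a layer type that does not define that method in this code base. One possible workaround, sketched here with stand-in classes rather than the real modules, is to call the method only where it exists. All class and function names below are hypothetical.

```python
class DecoderStub:
    """Stand-in for a module that defines remove_weight_norm."""

    def __init__(self):
        self.weight_norm_removed = False

    def remove_weight_norm(self):
        self.weight_norm_removed = True


class CouplingLayerStub:
    """Stand-in for a module without a remove_weight_norm method."""


def remove_weight_norm_if_present(modules):
    """Call remove_weight_norm on each module that defines it; skip the rest."""
    handled = []
    for module in modules:
        method = getattr(module, "remove_weight_norm", None)
        if callable(method):
            method()
            handled.append(type(module).__name__)
    return handled


decoder = DecoderStub()
handled = remove_weight_norm_if_present([decoder, CouplingLayerStub()])
```

Skipping a layer only avoids the crash. It does not fold that layer's weight normalisation, so whether the exported model is still correct has to be checked separately.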

JohnHerry commented 10 months ago

> > Do we need to use the class STFT instead of TorchSTFT during the training in this case?
>
> > You do not have to use STFT during training, only during export. See here alphacep/MB-iSTFT-VITS2@29c91d4, see also FENRlR/MB-iSTFT-VITS2#3
>
> I used this code to convert the provided pre-trained model to ONNX. Why do I get this error: AttributeError: 'ResidualCouplingLayer' object has no attribute 'remove_weight_norm'

No, the STFT from this project, which is the same as the one from the iSTFTNet project, is not good for ONNX model export. It can help generate an ONNX model for inference, but this model will fail for some inputs, and even when it succeeds, the generated waveform will contain some noise.