LlmKira / VitsServer

🌻 VITS ONNX TTS server designed for fast inference 🔥

ONNX conversion issues #10

Open · NikitaKononov opened this issue 1 year ago

NikitaKononov commented 1 year ago

Hello, I ran into these errors while converting to ONNX:

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert (discriminant >= 0).all()

Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.

WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

What might be wrong? Thanks.

lss233 commented 1 year ago

They're just some normal warnings; it's OK to ignore them.

NikitaKononov commented 1 year ago

They're just some normal warnings

Thank you for your answer. So they don't affect inference quality?

lss233 commented 1 year ago

Yes, none of these warnings has any effect on quality.

The code flagged by the TracerWarning is a sanity check on variable values, not part of the inference path, so it can be safely ignored.

Constant folding is an optimization ONNX applies to the Slice operation. That optimization simply isn't applicable at the location where the warning occurred, and skipping constant folding there is the expected behaviour.

As for the last warning, I think it is caused by a PyTorch quirk that prevents the ONNX exporter from recognizing the type. However, it won't have any effect if your model infers correctly.
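For context, here is a minimal sketch of the kind of export call that produces these warnings. It is not the repository's actual export script: the helper name, dummy input shapes, tensor names, and dynamic axes are illustrative assumptions; only `torch.onnx.export` and its parameters are standard PyTorch API.

```python
import torch

def export_to_onnx(model: torch.nn.Module, onnx_path: str = "vits.onnx") -> None:
    """Hypothetical export helper for a VITS-style synthesizer."""
    model.eval()
    dummy_text = torch.randint(0, 100, (1, 50), dtype=torch.long)  # fake phoneme IDs
    dummy_text_len = torch.tensor([50], dtype=torch.long)

    torch.onnx.export(
        model,
        (dummy_text, dummy_text_len),
        onnx_path,
        opset_version=15,
        do_constant_folding=True,   # source of the "Constant folding ... Slice" notice
        input_names=["text", "text_len"],
        output_names=["audio"],
        dynamic_axes={"text": {1: "seq_len"}, "audio": {2: "audio_len"}},
    )
    # The TracerWarning appears when Python-level checks such as
    # `assert (discriminant >= 0).all()` are evaluated once during tracing
    # and baked into the exported graph as constants.
```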

NikitaKononov commented 1 year ago

won't have any effect if your model infers correctly

Hello!

The model converted to ONNX with your scripts performs very poorly in NVIDIA Triton Inference Server: inference time is 2-3x slower than PyTorch inference. I've tried all the available options in the Triton configuration, but I can't achieve a good inference speed.

Have you faced such a problem? Thanks.

sudoskys commented 1 year ago

Very sorry! When writing the ONNX runtime code, I specified the CPU inference. Please wait.

https://github.com/LlmKira/VitsServer/blob/c11475105127609ce7a1b8cc62e42ce70982ba9e/event.py#L157
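For reference, a minimal sketch of how the execution provider can be selected when creating the ONNX Runtime session; the helper name is illustrative, and only `InferenceSession`, `get_available_providers`, and the provider names are standard onnxruntime API.

```python
import onnxruntime as ort

def make_session(onnx_path: str, prefer_gpu: bool = True) -> ort.InferenceSession:
    """Create an ONNX Runtime session, falling back to CPU if CUDA is unavailable."""
    providers = ["CPUExecutionProvider"]
    if prefer_gpu and "CUDAExecutionProvider" in ort.get_available_providers():
        # Put CUDA first so it takes priority; CPU stays available as a fallback.
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ort.InferenceSession(onnx_path, providers=providers)
```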

sudoskys commented 1 year ago

https://github.com/LlmKira/VitsServer/pull/11/commits/954cebaf9a00f0458bc1fcb0ebc21191d48d6798

NikitaKononov commented 1 year ago

I specified the CPU inference

Thanks, I'll give it a try. But as far as I can see in the code, RunONNX doesn't affect how the converted model is saved?

I use the converted model in NVIDIA Triton Inference Server. It utilizes the GPU, but performance is poor for some reason.
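In case it helps, here is a rough sketch of how such a Triton-hosted ONNX model could be queried from Python with the tritonclient package. The model name ("vits_onnx"), tensor names, shapes, and datatypes are hypothetical and must match the actual export and config.pbtxt, which are not shown in this thread.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical inputs: fake phoneme IDs and their length.
text_ids = np.random.randint(0, 100, size=(1, 50)).astype(np.int64)
text_len = np.array([50], dtype=np.int64)

inputs = [
    httpclient.InferInput("text", list(text_ids.shape), "INT64"),
    httpclient.InferInput("text_len", list(text_len.shape), "INT64"),
]
inputs[0].set_data_from_numpy(text_ids)
inputs[1].set_data_from_numpy(text_len)

result = client.infer(
    model_name="vits_onnx",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("audio")],
)
audio = result.as_numpy("audio")  # generated waveform, shape depends on the export
```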

I'll test pure PyTorch inference, pure ONNX inference, and Triton inference with both the PyTorch and ONNX models, and I'll provide the test results.

sudoskys commented 1 year ago

ok

sudoskys commented 1 year ago

Please wait a while for the svc branch.

NikitaKononov commented 1 year ago

ok

I've done 50 test inferences for each model with the same input text:

- PyTorch: avg ~2.5 s
- ONNX: avg ~2.7 s
- Triton ONNX: avg ~4.1 s

For some reason ONNX Runtime inside Triton makes execution slower; I'm trying to find the bottleneck.
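A sketch of the kind of timing loop behind averages like these, with warm-up runs excluded from the mean; the `run_once` callables in the usage comments are placeholders for whatever the converted model actually expects as input.

```python
import time
import numpy as np

def average_latency(run_once, n_runs: int = 50, n_warmup: int = 3) -> float:
    """Call `run_once()` repeatedly and return the mean latency in seconds."""
    for _ in range(n_warmup):
        run_once()                    # warm-up runs are not counted
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return float(np.mean(timings))

# Usage (placeholders for the actual session/model and inputs):
# avg_ort = average_latency(lambda: sess.run(None, {"text": text_ids, "text_len": text_len}))
# avg_torch = average_latency(lambda: model.infer(text_tensor, text_len_tensor))
```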

sudoskys commented 1 year ago

ok

I've done 50 test inferences for each model with the same input text: PyTorch avg ~2.5 s, ONNX avg ~2.7 s, Triton ONNX avg ~4.1 s.

For some reason ONNX Runtime inside Triton makes execution slower; I'm trying to find the bottleneck.

The server converts the pth model to ONNX before loading, instead of using an existing ONNX file. The problem may be caused by the model structure or by other configuration errors.

sudoskys commented 1 year ago

The ONNX model performs some initialization work during the first inference after the session is loaded; this factor should also be taken into account.
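To illustrate the point, a small sketch that times the first run of a freshly created session separately from a subsequent run; the `feed` dictionary is a placeholder for whatever tensors the exported model expects, and only the onnxruntime calls shown are standard API.

```python
import time
import onnxruntime as ort

def first_vs_steady_state(onnx_path: str, feed: dict) -> tuple[float, float]:
    """Return (first-run latency, second-run latency) for a fresh session."""
    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

    t0 = time.perf_counter()
    sess.run(None, feed)              # first run: includes lazy initialization
    first = time.perf_counter() - t0

    t0 = time.perf_counter()
    sess.run(None, feed)              # later runs reflect steady-state latency
    second = time.perf_counter() - t0
    return first, second
```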