NikitaKononov opened 1 year ago
They're just some normal warnings, it's OK to ignore them.
Thank you for your answer. So it doesn't affect inference quality?
Yes, none of these warnings have effects on quality.
The code flagged by the TracerWarning only checks whether variables meet the requirements; it is not part of the inference path, so the warning can be ignored.
Constant folding is an optimization that ONNX applies to the Slice operation. That optimization is not applicable at the location where the warning occurred, and skipping constant folding there matches our expectations.
As for the last warning, I think it is some kind of PyTorch bug that prevents ONNX from recognizing the type. It won't have any effect as long as your model infers correctly.
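The TracerWarning point above can be illustrated with a minimal pure-Python sketch (an illustration of the tracing concept only, not the real torch.jit tracer): tensor operations are recorded, but a Python boolean check is evaluated once with the example input and then frozen into the trace.

```python
# A minimal pure-Python sketch of what "tracing" does (illustration only,
# not the real torch.jit tracer): ops are recorded, but a Python boolean
# check is evaluated once with the example input and then frozen.

def model(x, trace):
    trace.append(("mul", 2))      # op on the value: recorded in the trace
    y = x * 2
    if y >= 0:                    # Python bool: decided once, NOT recorded
        trace.append(("add", 1))  # only the branch that was taken is recorded
        y = y + 1
    return y

def replay(trace, x):
    # Re-run the recorded ops; the branch decision is baked in as a constant.
    for op, arg in trace:
        x = x * arg if op == "mul" else x + arg
    return x

trace = []
model(3, trace)           # "trace" the model with a positive example input
print(replay(trace, 3))   # 7: matches model(3)
print(replay(trace, -5))  # -9: model(-5) would return -10, the trace diverges
```

This is exactly why the trace "might not generalize to other inputs": a check like `assert (discriminant >= 0).all()` passes or fails once at export time and carries no data flow into the exported graph.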
Hello!
The model converted to ONNX with your scripts has very poor performance in NVIDIA Triton Inference Server. Inference time is 2-3x slower than PyTorch inference. I've tried all available options in the Triton configuration, but I can't achieve good inference speed.
Have you faced this problem? Thanks.
Very sorry! When writing the ONNX runtime code, I specified CPU inference. Please wait:
https://github.com/LlmKira/VitsServer/blob/c11475105127609ce7a1b8cc62e42ce70982ba9e/event.py#L157
Thanks, I'll give it a try. But as far as I can see in the code, RunONNX doesn't affect how the converted model is saved?
I use the converted model in NVIDIA Triton Inference Server. It utilizes the GPU, but has poor performance for some reason.
I'll test pure PyTorch inference, pure ONNX inference, and Triton inference with both the PyTorch and ONNX models, and provide the test results.
ok
pls wait a while for svc branch
ok
I've done 50 test inferences for each model with the same input text:
pytorch avg ~2.5s
onnx avg ~2.7s
triton onnx avg ~4.1s
For some reason onnxruntime in Triton makes execution slower; I'm trying to find the bottleneck.
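A sketch of the measurement loop behind averages like these (assumption: `infer` is a dummy stand-in for one model call, so the printed time reflects the dummy sleep, not a real model):

```python
# Sketch of a simple latency benchmark: warm up first, then average many runs.
import statistics
import time

def infer():
    time.sleep(0.002)  # dummy stand-in for one model inference

def benchmark(fn, runs=50, warmup=3):
    for _ in range(warmup):        # discard first calls (session/kernel init)
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.mean(samples)

print(f"avg over 50 runs: {benchmark(infer) * 1000:.2f} ms")
```

Averaging only after a few warm-up calls matters here; otherwise one-time initialization cost gets mixed into the reported average.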
ok
The server converts the .pth model to ONNX before loading, rather than using an existing ONNX file. The problem may be caused by the model structure or some other configuration error.
Also note that the ONNX model performs some initialization during the first inference after the session is loaded, and this factor should be considered as well.
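The first-inference cost can be shown with a pure-Python sketch (a stand-in with simulated sleeps, not a real ONNX Runtime session): the first `run` pays a one-time initialization cost that later calls do not.

```python
# Pure-Python sketch (no real ONNX Runtime session) of why the first call
# after loading a session is slower: one-time lazy initialization.
import time

class FakeSession:
    """Stand-in for an inference session with a one-time init cost."""
    def __init__(self):
        self._initialized = False

    def run(self):
        if not self._initialized:
            time.sleep(0.05)   # simulated allocator/kernel setup on first run
            self._initialized = True
        time.sleep(0.001)      # simulated steady-state inference

session = FakeSession()

t0 = time.perf_counter()
session.run()
cold = time.perf_counter() - t0

t0 = time.perf_counter()
session.run()
warm = time.perf_counter() - t0

print(f"cold: {cold * 1000:.1f} ms, warm: {warm * 1000:.1f} ms")
```

This is why benchmarks should exclude the first call after loading the session.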
Hello, I faced these errors while converting to onnx
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert (discriminant >= 0).all()
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
What might be wrong? Thanks.