Custom SegFormer model with pretrained weights ("nvidia/segformer-b0-finetuned-ade-512-512") exported to ONNX takes longer for inference compared to PyTorch! #149
Installed packages:
I have trained a SegFormer model initialized from the pretrained weights mentioned above, with a custom classifier layer:
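Roughly like this (a minimal sketch; the actual number of classes and the training loop are omitted, and `num_labels` below is a placeholder, not my real value):

```python
from transformers import SegformerForSemanticSegmentation

# Sketch: load the pretrained backbone and swap in a classifier head for my own classes.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512",
    num_labels=2,                  # placeholder value
    ignore_mismatched_sizes=True,  # classifier shape differs from the ADE20K checkpoint
)
model.to("cuda")
```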
After training, I exported it to ONNX with:
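The export call looked roughly like this (a sketch; the opset version, input/output names, and dynamic axes here are assumptions rather than my exact flags):

```python
import torch

model.eval()
dummy_input = torch.randn(1, 3, 384, 384, device="cuda")

# Sketch of the torch.onnx.export call used after training.
torch.onnx.export(
    model,
    dummy_input,
    "segformer.onnx",
    export_params=True,
    opset_version=13,                 # assumed opset
    input_names=["pixel_values"],
    output_names=["logits"],
    dynamic_axes={"pixel_values": {0: "batch"}, "logits": {0: "batch"}},
)
```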
Inference time output:
second inference time with PyTorch on GPU RTX4090 --> torch.Size([155, 3, 384, 384]): 0.02 seconds
second inference time with ONNX on GPU RTX4090 --> torch.Size([155, 3, 384, 384]): 5.06 seconds
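For context, the ONNX number above was measured with an onnxruntime session along these lines (a sketch; the provider list and session options are assumptions):

```python
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "segformer.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

batch = np.random.randn(155, 3, 384, 384).astype(np.float32)

# Warm-up run first, then time the second run (the number reported above).
session.run(None, {"pixel_values": batch})
start = time.time()
session.run(None, {"pixel_values": batch})
print(f"second inference time with ONNX: {time.time() - start:.2f} seconds")
```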
I also tried https://huggingface.co/docs/transformers/v4.19.0/serialization, thinking it might shorten the inference time, but couldn't get it to work because of a tokenizer issue.
Error:
Is there any way to reduce the inference time for ONNX, or is there a flag I'm missing in the ONNX export call? Any help would be appreciated. Thank you :hugs: