NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter

No speed up! Why?? (FP16 and INT8 alike show no speedup) #577

Open dedoogong opened 3 years ago

dedoogong commented 3 years ago

I have converted the R2D2 model from https://github.com/naver/r2d2/blob/master/extract.py using fp16_mode=True on a T4 (which supports Tensor Cores), but it shows almost no speedup (just 10-14%). The model is quite simple (maybe even simpler than ResNet-18, or similar to VGG-18).

The conversion itself was simple: I just imported the torch2trt module and called it on the network. Can you please help me figure out how to get a speedup?
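For reference, the conversion path described above looks roughly like the sketch below. The torchvision ResNet-18 is a stand-in for the actual R2D2 extractor, and the input shape is assumed:

```python
import torch
from torchvision.models import resnet18
from torch2trt import torch2trt

# Stand-in network: the thread is about the R2D2 extractor, but any
# eval-mode CUDA module goes through torch2trt the same way
model = resnet18(pretrained=True).cuda().eval()

# Example input; its shape fixes the shape the TensorRT engine will expect
x = torch.randn(1, 3, 224, 224).cuda()

# fp16_mode=True allows TensorRT to pick FP16 (Tensor Core) kernels
model_trt = torch2trt(model, [x], fp16_mode=True)

# The converted module is called like the original one
y = model(x)
y_trt = model_trt(x)
print(torch.max(torch.abs(y - y_trt)))  # sanity-check numerical agreement
```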

jaybdub commented 3 years ago

Hi @dedoogong ,

Thanks for reaching out!

Do you mind sharing the code you used to convert and benchmark the model? This will help me reproduce what you see.

Best, John
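For anyone landing here with the same question: a benchmark of the kind being asked for generally needs warm-up iterations and explicit synchronization, since CUDA kernel launches are asynchronous. A minimal sketch, reusing model, x, and model_trt from the conversion snippet above:

```python
import time
import torch

def benchmark(model, x, warmup=10, iters=100):
    # Warm-up so kernel selection and caching don't skew the measurement
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        torch.cuda.synchronize()  # wait for warm-up kernels before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()  # wait for the timed kernels to finish
    return (time.perf_counter() - start) / iters

t_pt = benchmark(model, x)       # baseline PyTorch module
t_trt = benchmark(model_trt, x)  # torch2trt-converted module
print(f"PyTorch: {t_pt * 1e3:.2f} ms/iter, TensorRT: {t_trt * 1e3:.2f} ms/iter")
```

Without the synchronize() calls, the loop only measures kernel launch overhead, which can make FP16 and FP32 look nearly identical.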

samhithaaaa commented 2 years ago

I have converted a semantic segmentation model from FP32 to FP16 using .half(), and I see only a 10% speedup; inference time did not drop by half.

Can someone help here?
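A modest gain from .half() alone is common: many layers are memory-bandwidth-bound rather than compute-bound, and Tensor Cores are only engaged when shapes align (e.g., channel counts that are multiples of 8), so halving the arithmetic precision does not halve the latency. Two things worth checking are that the inputs are also cast to FP16 and that the timing is synchronized. A minimal sketch with a stand-in torchvision segmentation model (the actual model in the comment above is not specified):

```python
import torch
from torchvision.models.segmentation import fcn_resnet50  # stand-in model

# Cast the weights to FP16; the inputs must be cast as well,
# otherwise the forward pass fails or silently falls back to FP32 ops
model = fcn_resnet50(pretrained=True).cuda().eval().half()
x = torch.randn(1, 3, 512, 512).cuda().half()

with torch.no_grad():
    out = model(x)["out"]  # torchvision segmentation models return a dict
    torch.cuda.synchronize()  # ensure the kernels actually ran to completion
```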