NVIDIA-AI-IOT / torch2trt

An easy-to-use PyTorch to TensorRT converter

On Jetson Nano, inference time of FP16 and FP32 is almost the same? #470

Open xxy90 opened 3 years ago

jaybdub commented 3 years ago

Hi xxy90,

Thanks for reaching out!

Do you mind sharing which model architecture you're referring to? The relative performance of FP32 vs. FP16 may depend on the model architecture. The scaling also might not be linear with bit depth, because of the various overheads involved in using reduced precision.

Best, John
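(For anyone checking whether FP16 actually changes latency on their device, a minimal timing sketch is below. The fp16_mode flag is torch2trt's standard conversion option; the model, input shape, and iteration counts are placeholders and should be swapped for the model under test, e.g. NanoDet or YOLOX-Nano.)

```python
import time
import torch
from torch2trt import torch2trt
from torchvision.models import resnet18

# Placeholder model; load your own network the same way (eval mode, on CUDA).
model = resnet18(pretrained=True).eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()

# Build one TensorRT engine per precision mode.
model_fp32 = torch2trt(model, [x], fp16_mode=False)
model_fp16 = torch2trt(model, [x], fp16_mode=True)

def benchmark(trt_model, x, iters=100):
    # Warm up, then time synchronized GPU execution.
    for _ in range(10):
        trt_model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        trt_model(x)
    torch.cuda.synchronize()
    return (time.time() - start) / iters * 1000.0  # ms per inference

print('FP32: %.2f ms' % benchmark(model_fp32, x))
print('FP16: %.2f ms' % benchmark(model_fp16, x))
```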

zhangchenwei115 commented 3 years ago

Hi John, I had the same problem on Jetson Nano: whether I convert with fp16=True or False, the speed is the same. I used the lightweight OpenPose model; here is the link: https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch

xxy90 commented 3 years ago

The model architecture is NanoDet, a new anchor-free object detector whose backbone is ShuffleNetV2.

shadowuyl commented 3 years ago

I also ran into the same problem while using YOLOX-Nano (ref: https://github.com/Megvii-BaseDetection/YOLOX) on Jetson Nano.