NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter

Performance degradation between commits 230~280 #104

Closed jtlee90 closed 2 years ago

jtlee90 commented 5 years ago

There seems to be a performance degradation, and I haven't figured out where it happens.

But inference speed definitely drops dramatically after pulling the recent commits.

Do you have any thoughts on this?

Also, you may have already noticed that memory usage increased (it's negligible) after a few commits.

jaybdub commented 5 years ago

Hi jtlee90,

I hadn’t noticed a performance drop in our tests. Are you able to share the model that you’re using?

Best, John

jaybdub commented 5 years ago

Hi Jtlee90,

I haven't yet been able to find a commit to torch2trt that causes a performance degradation. But I did notice that after updating torchvision from v0.3.0 to v0.4.0, the performance of densenet drops significantly.

For your model, did the implementation change at all, or was updating torch2trt the only change?

Best, John

jtlee90 commented 5 years ago

Hi @jaybdub,

I was working with a detection model that I implemented myself. It's basically similar to the SSD architecture, and torchvision doesn't have this model.

I updated only torch2trt and nothing else changed. The main degradation was inference speed: with TensorRT, my model runs at almost 120 FPS, but after updating torch2trt it dropped to about 50 FPS, while accuracy stayed the same.

Yesterday I was on vacation, so I didn't have much time to check where it happens.
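
A minimal timing sketch along these lines could reproduce the FPS comparison (the model below is just a placeholder, since the custom SSD-like detector isn't shared in this thread):

```python
import time
import torch
from torch2trt import torch2trt

# Placeholder model: any torch.nn.Module in eval mode would slot in here
# in place of the custom SSD-like detector.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
).cuda().eval()
x = torch.ones((1, 3, 300, 300)).cuda()

# Same conversion call before and after updating torch2trt.
model_trt = torch2trt(model, [x])

# Warm up, then time a fixed number of iterations to estimate FPS.
for _ in range(10):
    model_trt(x)
torch.cuda.synchronize()

n = 100
t0 = time.time()
for _ in range(n):
    model_trt(x)
torch.cuda.synchronize()
print('FPS: %.1f' % (n / (time.time() - t0)))
```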

jtlee90 commented 5 years ago

Hi again @jaybdub, I have some good news for you.

I found where it happens. This exact commit, 599524b625f1ae75139dc81017bf7f88b6bbbf83, makes the model slow, especially when batch broadcasting is involved.

https://github.com/NVIDIA-AI-IOT/torch2trt/commit/599524b625f1ae75139dc81017bf7f88b6bbbf83#diff-49b77ee88cfa5a65319341980068d292L112-R115

https://github.com/NVIDIA-AI-IOT/torch2trt/commit/599524b625f1ae75139dc81017bf7f88b6bbbf83#diff-49b77ee88cfa5a65319341980068d292L129-R133

For inference I used batch size 1, so accuracy is not affected by this. Still, it shouldn't slow anything down, and I can't find any logic in the change that would make the speed drop.
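
If broadcasting support in a converter works roughly the way I imagine (just a sketch of the general idea, not the actual torch2trt diff), each binary op would pick up extra shuffle/reshape layers to align the input ranks before the elementwise layer:

```python
import tensorrt as trt

def broadcast_to_rank(network, t, rank):
    # Prepend size-1 dims with a shuffle layer so the elementwise op can
    # broadcast; each call like this adds an extra IShuffleLayer to the engine.
    if len(t.shape) < rank:
        layer = network.add_shuffle(t)
        layer.reshape_dims = (1,) * (rank - len(t.shape)) + tuple(t.shape)
        t = layer.get_output(0)
    return t

def add_broadcast_sum(network, a, b):
    # Align the ranks of both inputs, then add the elementwise SUM layer.
    rank = max(len(a.shape), len(b.shape))
    a = broadcast_to_rank(network, a, rank)
    b = broadcast_to_rank(network, b, rank)
    return network.add_elementwise(a, b, trt.ElementWiseOperation.SUM).get_output(0)
```

If the builder can't fuse those extra shuffle layers, that alone might explain a slowdown even at batch size 1, but that's only a guess.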

Do you have any other ideas?