ceccocats / tkDNN

Deep neural network library and toolkit to do high performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0
718 stars 208 forks

Large GPU memory consumption on Tensorrt8 branch #279

Open zhou-git opened 2 years ago

zhou-git commented 2 years ago

I have been using tkDNN with TensorRT 6 / CUDA 10.0 for a while and everything worked fine. I recently upgraded the GPU from a 2070 to an A4000, which required upgrading all the related drivers. The new environment is CUDA 11.5 / TensorRT 8.2.2.1 / OpenCV 4.5.5. With this environment and the same trained model (YOLOv4, network size: 530x320), GPU memory usage increased from roughly 1GB to 2.5GB with FP32, and from 600MB to 1.9GB with FP16.
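To put the increase in perspective, here is a quick arithmetic sketch using only the rough figures above. It assumes 1GB = 1000MB for simplicity, and the names `usage_mb` and `summarize` are just illustrative, not part of tkDNN.

```python
# Rough figures reported above, in MB (assumption: 1GB treated as 1000MB).
usage_mb = {
    "fp32": {"old": 1000, "new": 2500},
    "fp16": {"old": 600, "new": 1900},
}


def summarize(old_mb, new_mb):
    """Return (absolute increase in MB, growth factor rounded to 2 decimals)."""
    return new_mb - old_mb, round(new_mb / old_mb, 2)


# Compute the increase for each precision mode.
results = {mode: summarize(v["old"], v["new"]) for mode, v in usage_mb.items()}

for mode, (extra_mb, factor) in results.items():
    print(f"{mode}: +{extra_mb} MB ({factor}x)")
# prints:
# fp32: +1500 MB (2.5x)
# fp16: +1300 MB (3.17x)
```

The absolute increase is similar in both modes (about 1.3GB to 1.5GB), though I don't know whether that is significant.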

Any ideas why this is happening? Thanks a lot for the great work by the way!