NVIDIA-AI-IOT / Lidar_AI_Solution

A project demonstrating Lidar-related AI solutions, including three GPU-accelerated Lidar/camera DL networks (PointPillars, CenterPoint, BEVFusion) and the related libs (cuPCL, 3D SparseConvolution, YUV2RGB, cuOSD).

Replication of BEVFusion PTQ Error #149

Open ihaohe opened 1 year ago

ihaohe commented 1 year ago

Thanks for your nice work! I've reproduced the BEVFusion PTQ performance with the model (ResNet50) you provided and the script tools/test-mAP-for-cuda.py, following https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/blob/master/CUDA-BEVFusion/qat/README.md to regenerate the ResNet50-PTQ model.

But when I train my own BEVFusion ResNet50 model (which reaches 68.10 mAP and 71.13 NDS, like yours) and run PTQ on it, the PTQ process completes successfully; however, the nuScenes evaluation terminates like this: c75550bf1f5942485a20139dfc86dd3a

I printed the box info and found that some boxes are "nan"! image
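A check like the one described above can be sketched in plain Python. This is only an illustration: the helper name `split_finite_boxes` and the 7-value box layout (x, y, z, w, l, h, yaw) are assumptions and are not taken from the repository's code.

```python
import math


def split_finite_boxes(boxes):
    """Separate boxes whose fields are all finite from those with NaN/Inf."""
    good, bad = [], []
    for idx, box in enumerate(boxes):
        if all(math.isfinite(v) for v in box):
            good.append(box)
        else:
            # Keep the index so the offending sample can be traced back.
            bad.append((idx, box))
    return good, bad


# Hypothetical boxes: (x, y, z, w, l, h, yaw); the second one contains NaN.
boxes = [
    [1.0, 2.0, 0.5, 4.2, 1.9, 1.6, 0.1],
    [float("nan"), 2.0, 0.5, 4.2, 1.9, 1.6, 0.1],
]
good, bad = split_finite_boxes(boxes)
print(len(good), [idx for idx, _ in bad])  # prints: 1 [1]
```

Filtering the non-finite boxes before evaluation only hides the symptom; the indices in `bad` are mainly useful for finding which samples trigger the NaN values.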

I suspect something is wrong in the PTQ process, but I have no idea how to debug it. Can you give me some suggestions?
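One generic way to narrow down where NaN values first appear is to attach forward hooks to the PyTorch model and report the first leaf module whose output is not finite. The sketch below is not the repository's method: `find_first_nonfinite` and the toy `SqrtOfNegative` module are hypothetical, and it only inspects plain tensor outputs, so modules that return other types (for example sparse convolution tensors) would need extra handling. A real BEVFusion forward call also takes more complex inputs than the single tensor used here.

```python
import torch
import torch.nn as nn


def find_first_nonfinite(model: nn.Module, *inputs):
    """Run one forward pass and return the name of the first leaf module
    whose tensor output contains NaN or Inf, or None if none does."""
    bad = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # Only plain tensor outputs are inspected; keep the first offender.
            if not bad and isinstance(output, torch.Tensor):
                if not torch.isfinite(output).all():
                    bad.append(name)
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return bad[0] if bad else None


class SqrtOfNegative(nn.Module):
    """Toy module (hypothetical) that deliberately produces NaN."""

    def forward(self, x):
        return torch.sqrt(x - 1e6)


toy = nn.Sequential(nn.Linear(4, 4), SqrtOfNegative(), nn.ReLU())
print(find_first_nonfinite(toy, torch.ones(1, 4)))  # prints: 1
```

Running the same check on the FP32 model and on the calibrated model with identical input would show whether the non-finite values are introduced by quantization or are already present beforehand.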

liuanqi-libra7 commented 1 year ago

@ihaohe Could you please provide the version information for your CUDA and TensorRT?

ihaohe commented 1 year ago

@liuanqi-libra7

CUDA 11.3, TensorRT-8.5.1.7, pytorch-quantization 2.1.2, torch 1.10.1+cu113, mmcv 1.4.0, mmdet 2.20.0

My BEVFusion model is here: model. You can use it to reproduce the error I've encountered. Thanks for your help.

ihaohe commented 1 year ago

Here is the INT8 result of my BEVFusion model: onnx_int8. I hope it helps you locate the problem.

ihaohe commented 1 year ago

@hopef @liuanqi-libra7 I'm sorry to bother you. Is there any progress on this issue?