ceccocats / tkDNN

Deep neural network library and toolkit to do high performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0

Significant errors in confidence scores after TensorRT conversion #306

Open vtyw opened 7 months ago

vtyw commented 7 months ago

TL;DR: After TensorRT model conversion, confidence scores drop in accuracy much more than bounding box values. What property of TensorRT or the YOLO architecture causes this?

I use tkDNN for converting yolov4 and yolov4-csp models. The detections using the converted TensorRT engines look very similar visually to the original darknet detections, but on closer inspection, there are some noticeable differences.

Bounding box values before and after conversion are very similar: only a few outliers (< 0.1% of values) shift by more than 0.02. In contrast, confidence scores before and after conversion can differ significantly, by as much as 0.5! Of course, the converted network won't perform exactly the same as before, but it seems surprising that bounding box values are highly consistent post-conversion while confidence values are occasionally off by a large margin.

Here's an example of before and after detections to illustrate the behavior I mean. Note that some detection inputs and networks perform much worse than this, e.g. with the yolov4 pretrained weights, as many as one third of confidence scores can shift by more than 0.05.

[Image: tkdnn-fp32-accuracy]
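For anyone who wants to reproduce the kind of statistic quoted above (the share of values that shift by more than some threshold), here is a minimal sketch. It assumes the darknet and TensorRT detections have already been matched one-to-one and flattened into two equal-length lists. The function name `fraction_shifted` and the sample numbers are made up for illustration; they are not part of tkDNN and are not measurements from this issue.

```python
def fraction_shifted(before, after, threshold):
    """Return the fraction of paired values whose absolute difference exceeds threshold.

    Assumes before[i] and after[i] refer to the same detection (hypothetical
    pre-matched inputs; matching detections is not handled here).
    """
    if len(before) != len(after):
        raise ValueError("before and after must be matched one-to-one")
    if not before:
        return 0.0
    shifted = sum(1 for b, a in zip(before, after) if abs(b - a) > threshold)
    return shifted / len(before)


# Illustrative values only, not real tkDNN or darknet output.
darknet_conf = [0.91, 0.80, 0.55, 0.30]
tensorrt_conf = [0.90, 0.62, 0.56, 0.31]

# Share of confidence scores that moved by more than 0.05 after conversion.
print(fraction_shifted(darknet_conf, tensorrt_conf, 0.05))
```

The same function can be applied to the flattened bounding box coordinates with a threshold of 0.02 to compare the two kinds of drift on equal terms.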

Possible explanations that I've ruled out: