Mismatch in number of Detections with TFRT onnx inference VS pytorch, pth file

Allamrahul commented 1 year ago

Dataset: I am using a custom dataset with npy files and annotations. I followed all the steps required for custom dataset preparation, and I get great results with PyTorch: 90% mAP on my eval set.

However, once I convert the .pth file to ONNX format using exporter.py, I see fewer detections for every point cloud in my eval dataset with TensorRT (TFRT) inference through the C++ script than I get with PyTorch and the .pth file.

The export process uses exporter.py and simplifier_onnx.py. However, both scripts are hardcoded for the 3 classes of the KITTI dataset, and I have just one class to detect. I therefore followed this commit to make the ONNX export work: https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/pull/77/commits. After this, I was able to export, but I then hit the following issue: https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/issues/82. I resolved it by tinkering with the export script, as described in this comment: https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars/pull/77#issuecomment-1424700901. After this, my TFRT ONNX detections were at least a subset of what I was seeing with PyTorch, but not the whole set. There is a clear delta between TFRT with the .onnx file and PyTorch with the .pth file for the majority of my eval set, as the following table shows:

Bounding box delta comparison: PyTorch .pth vs. TensorRT .onnx

File | PyTorch (.pth file) | TensorRT C++ (.onnx file) | Delta
-- | -- | -- | --
000000.npy | tensor([[ 9.6498, 1.1609, 1.9397, 0.2856, 0.4898, 2.8947, 6.2814], [ 24.8358, 1.3459, 2.5912, 0.2332, 0.4984, 3.0438, 6.2827], [ 24.9936, -10.4810, 3.2429, 0.2568, 0.4702, 3.1647, 6.2816], [ 9.8542, -10.6894, 2.1888, 0.4316, 0.4553, 2.7412, 6.2486]], device='cuda:0') | 24.8358 1.34592 2.59117 0.23324 0.498444 3.04383 6.28266 0 0.46325 ; 24.9936 -10.481 3.24294 0.256755 0.47017 3.16474 6.28156 0 0.445165 ; 9.8573 -10.6925 2.17166 0.433223 0.452724 2.7258 6.24912 0 0.445157 | 1
000001.npy | tensor([[ 9.6501, 1.1778, 1.8507, 0.2533, 0.4935, 2.7208, 6.2741], [ 24.9947, -10.4883, 3.0557, 0.2706, 0.4838, 3.0915, 6.2594], [ 24.8404, 1.3479, 2.6033, 0.2287, 0.4947, 3.0391, 6.2825], [ 9.8570, -10.6883, 2.1663, 0.4322, 0.4521, 2.7124, 6.2346]], device='cuda:0') | 9.65337 1.1817 1.80798 0.248034 0.493837 2.66008 6.27361 0 ; 24.9947 -10.4883 3.05572 0.270619 0.483843 3.09145 6.25942 0 0.670895 ; 24.8404 1.34787 2.60326 0.228719 0.494724 3.03909 6.28252 0 0.459299 ; 9.8545 -10.6925 2.1472 0.438129 0.448904 2.7132 6.23376 0 0.424986 | 0
000002.npy | tensor([[ 9.6042, 1.1503, 2.0593, 0.2839, 0.4955, 2.9902, 6.3128], [ 24.7882, 1.3638, 2.6522, 0.2538, 0.5039, 3.1623, 6.2903], [ 9.7436, -10.6760, 2.1350, 0.3712, 0.4578, 2.6609, 6.2507], [ 24.9494, -10.5134, 3.2150, 0.2888, 0.4944, 3.3462, 6.2143]], device='cuda:0') | 9.74478 -10.6817 2.1041 0.374984 0.453993 2.63108 6.25019 0 0.532783 ; 24.9494 -10.5134 3.21504 0.288844 0.494413 3.34624 6.21432 0 0.515557 ; 0.309276 -10.6853 2.08503 0.458935 0.413923 3.13058 6.09365 0 0.412784 | 1
000003.npy | tensor([[ 9.5610, -10.4589, 2.1206, 0.4139, 0.4505, 2.7193, 6.2802], [ 24.3758, 1.7272, 2.6000, 0.2396, 0.4966, 3.0571, 6.1985], [ 24.7097, -10.1406, 3.0566, 0.2619, 0.4718, 3.0835, 6.2728], [ 9.2311, 1.3354, 1.8251, 0.2543, 0.4891, 2.7015, 6.2441], [ 8.9262, 7.8720, 2.1033, 0.3872, 0.4424, 2.7067, 6.3819]], device='cuda:0') | 9.56115 -10.4598 2.09798 0.418282 0.448642 2.68469 6.27597 0 0.735731 ; 24.3758 1.72724 2.59998 0.239596 0.496643 3.05714 6.19854 0 0.629267 ; 24.7097 -10.1406 3.0566 0.26186 0.471776 3.08349 6.27275 0 0.585723 ; 9.21606 1.33047 1.82858 0.254299 0.490583 2.66956 6.23728 0 0.471899 | 1
000004.npy | tensor([[ 6.4732, 2.6481, 1.7006, 0.2879, 0.4678, 2.6444, 6.3118], [21.4290, 4.8774, 2.5937, 0.2325, 0.5022, 3.1258, 6.4040], [23.1383, -6.8599, 2.7714, 0.2839, 0.4960, 3.0160, 6.3080], [ 8.1175, -8.9831, 2.2486, 0.3856, 0.4450, 2.7676, 6.3550]], device='cuda:0') | 23.1383 -6.85986 2.77142 0.283893 0.495966 3.01596 6.30801 0 0.580739 ; 8.11463 -8.9818 2.12152 0.396575 0.436063 2.65015 6.35895 0 0.429396 | 2
000005.npy | tensor([[ 5.5251, 2.7731, 1.6679, 0.3284, 0.4662, 2.6940, 6.2788], [20.4834, 5.0487, 2.5489, 0.2769, 0.5241, 3.1817, 6.4027], [ 7.3220, -8.8810, 2.1011, 0.4506, 0.4281, 2.6641, 6.3688], [22.2850, -6.6383, 2.6867, 0.2744, 0.4986, 3.0367, 6.3119]], device='cuda:0') | 7.32207 -8.88152 2.0861 0.445914 0.430497 2.6552 6.36896 0 0.696223 | 3
000006.npy | tensor([[18.0280, 4.9469, 2.4509, 0.3035, 0.5205, 3.1520, 6.3221], [19.8413, -6.7181, 2.7475, 0.3097, 0.5246, 3.2910, 6.3001], [ 3.1871, 2.6373, 1.7287, 0.4621, 0.4224, 2.9021, 6.3156], [ 4.8621, -8.9172, 1.8402, 0.4540, 0.3952, 2.5332, 6.3420], [32.0742, 7.1384, 3.3039, 0.2361, 0.4806, 3.3647, 6.4108], [21.2824, 12.1162, 3.6256, 0.2676, 0.4659, 3.5638, 6.5643], [ 0.6082, 4.4304, 1.8762, 0.4470, 0.4348, 3.4172, 6.2065]], device='cuda:0') | 4.85492 -8.92965 1.819 0.460386 0.396642 2.5153 6.34298 0 0.494817 | 6
000007.npy | tensor([[18.2038, -6.8837, 2.5308, 0.3099, 0.5277, 3.1208, 6.3168], [16.5025, 4.7925, 2.3577, 0.3065, 0.5248, 3.0787, 6.3005], [ 1.5735, 2.6487, 1.6249, 0.5034, 0.4109, 2.6605, 6.3160], [ 2.2250, 2.7058, 1.8312, 0.4703, 0.4060, 3.0384, 6.3380], [ 3.2350, -8.9478, 1.8462, 0.4438, 0.4085, 2.5771, 6.3109], [19.7396, 11.9755, 3.2925, 0.2890, 0.5000, 3.6453, 6.5671], [ 3.5311, 2.8095, 2.3147, 0.4571, 0.4455, 4.2559, 6.3274], [30.5054, 6.8140, 3.3753, 0.2804, 0.5016, 3.6093, 6.2777]], device='cuda:0') | 18.2057 -6.88499 2.4907 0.307031 0.527094 3.07328 6.31815 0 0.636754 ; 16.502 4.79033 2.33373 0.299566 0.523598 3.0561 6.3044 0 0.532995 ; 1.56738 2.64373 1.68283 0.506594 0.412098 2.66617 6.31967 0 0.51762 ; 3.22002 -8.95614 1.8366 0.449459 0.409571 2.56386 6.3068 0 0.431358 ; 2.2279 2.70934 1.85016 0.464891 0.40516 3.07841 6.33425 0 0.391239 ; 19.7397 11.9755 3.29258 0.28902 0.499917 3.64496 6.56848 0 0.381675 | 2
000008.npy | tensor([[ 8.7021, -7.9169, 2.6375, 0.3647, 0.4888, 3.5404, 6.2655], [ 7.7196, 3.7774, 2.3025, 0.4060, 0.4704, 3.2993, 6.2707], [22.8483, -6.6640, 3.5341, 0.3350, 0.5277, 4.1040, 6.3141], [21.7832, 5.1120, 2.8534, 0.2781, 0.5178, 3.2145, 6.1912], [ 3.2359, -8.4495, 2.0291, 0.4187, 0.4105, 3.2451, 6.2915]], device='cuda:0') | 8.70127 -7.92042 2.62612 0.36539 0.486129 3.51703 6.26476 0 0.864963 ; 7.6994 3.79393 2.24546 0.40736 0.469539 3.21603 6.25044 0 0.73586 ; 22.8483 -6.66398 3.53411 0.335008 0.527745 4.10398 6.31413 0 0.605781 ; 21.7832 5.11193 2.85462 0.278421 0.517415 3.21271 6.21329 0 0.508611 ; | 1
000009.npy | tensor([[19.5711, 4.7877, 2.6956, 0.3077, 0.5412, 3.3734, 6.2451], [ 6.3672, -8.0972, 2.7778, 0.4181, 0.4778, 4.1039, 6.2421], [ 5.4901, 3.6080, 2.3323, 0.4340, 0.4502, 3.7175, 6.2740], [20.3728, -7.0433, 3.3803, 0.3514, 0.5351, 4.1972, 6.3070], [26.6330, 11.8861, 3.9950, 0.3089, 0.5019, 4.1503, 6.6127]], device='cuda:0') | 5.47306 3.61103 2.394 0.432978 0.453338 3.80027 6.32163 0 0.714706 ; 19.5717 4.78751 2.71062 0.308163 0.539413 3.36241 6.27686 0 0.621834 ; 6.35329 -8.10289 2.76789 0.422266 0.47866 4.13415 6.24032 0 0.606208 | 2
000010.npy | tensor([[18.3196, 4.6323, 3.2815, 0.3700, 0.5370, 4.5950, 6.3164], [ 5.0913, -8.1561, 2.6470, 0.4329, 0.4667, 4.0704, 6.2747], [19.1831, -7.1906, 3.3499, 0.3578, 0.5279, 4.2080, 6.3127], [ 2.5482, 4.3696, 1.6065, 0.4281, 0.3918, 2.8003, 6.2634]], device='cuda:0') | 5.08485 -8.16716 2.64149 0.431825 0.466464 4.03816 6.27571 0 0.731938 ; 19.1846 -7.19002 3.2872 0.352221 0.529464 4.08496 6.31286 0 0.591408 | 2
000011.npy | tensor([[15.3577, -7.3005, 3.0413, 0.3812, 0.5104, 4.2909, 6.3159], [ 0.6093, 3.4074, 1.9033, 0.5056, 0.4306, 3.3583, 6.1790], [14.5397, 4.4909, 3.0513, 0.3723, 0.5222, 4.3821, 6.2383], [30.4700, -6.2796, 4.0225, 0.2914, 0.4843, 3.8403, 6.3179], [29.6795, 5.5980, 4.0535, 0.2816, 0.4877, 3.9741, 6.2869]], device='cuda:0') | 0.594493 3.41456 2.11992 0.502219 0.441799 3.74912 6.17387 0 0.828488 ; 15.3587 -7.29961 2.99875 0.375657 0.512654 4.18005 6.31556 0 0.798267 ; 30.47 -6.27963 4.02255 0.29143 0.484331 3.84032 6.31788 0 0.434042 | 2
000012.npy | tensor([[ 11.2944, 4.3980, 3.0133, 0.3911, 0.5198, 4.6365, 6.2670], [ 26.4576, 5.3648, 3.6263, 0.3002, 0.5062, 3.8833, 6.3176], [ 12.0963, -7.3715, 3.0630, 0.3846, 0.5122, 4.3017, 6.2922], [ 8.1463, -12.5014, 2.9129, 0.3691, 0.4980, 3.9686, 6.1562], [ 27.1433, -6.4810, 3.9175, 0.3048, 0.5110, 3.9699, 6.3372], [ 18.4373, 11.4960, 3.7129, 0.3159, 0.4918, 4.2750, 6.4670]], device='cuda:0') | 8.14566 -12.506 2.84502 0.364298 0.498799 3.84938 6.15557 0 0.378752 ; 12.0904 -7.37816 2.90519 0.378811 0.516209 4.00902 6.29017 0 0.376648 | 4
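
To see which PyTorch boxes are missing from the TFRT output, and not only how many, the two columns of the table can be matched by box center. The following is a minimal sketch, not part of either code base. It assumes each `;`-separated TFRT entry is laid out as `x y z dx dy dz yaw class_id score` (inferred from the values in the table), and the function names and the 0.5 m center-distance threshold are hypothetical choices for illustration. The data is the 000000.npy row from the table.

```python
import math


def parse_trt_line(text):
    """Split a ';'-separated TFRT output string into a list of float lists."""
    boxes = []
    for chunk in text.split(";"):
        values = [float(v) for v in chunk.split()]
        if values:  # ignore empty chunks such as a trailing ';'
            boxes.append(values)
    return boxes


def unmatched_boxes(ref_boxes, test_boxes, max_center_dist=0.5):
    """Return indices of ref_boxes that have no unused test box whose
    (x, y, z) center lies within max_center_dist (greedy matching)."""
    used = set()
    missing = []
    for i, ref in enumerate(ref_boxes):
        best_j, best_d = None, max_center_dist
        for j, test in enumerate(test_boxes):
            if j in used:
                continue
            d = math.dist(ref[:3], test[:3])
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is None:
            missing.append(i)
        else:
            used.add(best_j)
    return missing


# 000000.npy, copied from the table (PyTorch boxes: x, y, z, dx, dy, dz, yaw)
pytorch_boxes = [
    [9.6498, 1.1609, 1.9397, 0.2856, 0.4898, 2.8947, 6.2814],
    [24.8358, 1.3459, 2.5912, 0.2332, 0.4984, 3.0438, 6.2827],
    [24.9936, -10.4810, 3.2429, 0.2568, 0.4702, 3.1647, 6.2816],
    [9.8542, -10.6894, 2.1888, 0.4316, 0.4553, 2.7412, 6.2486],
]
trt_text = (
    "24.8358 1.34592 2.59117 0.23324 0.498444 3.04383 6.28266 0 0.46325 ; "
    "24.9936 -10.481 3.24294 0.256755 0.47017 3.16474 6.28156 0 0.445165 ; "
    "9.8573 -10.6925 2.17166 0.433223 0.452724 2.7258 6.24912 0 0.445157"
)

missing = unmatched_boxes(pytorch_boxes, parse_trt_line(trt_text))
print(missing)  # [0]: the box centered near (9.65, 1.16) has no TFRT match
```

Greedy nearest-center matching is enough here because the boxes in each file are several meters apart; a stricter comparison would use rotated 3D IoU.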

Please let me know if you know something that could help me.

Allamrahul commented 1 year ago

I see the same behavior with the KITTI dataset as well. Can anyone confirm whether this is expected behavior, or is it not supposed to happen?

KwangjinChoi commented 1 year ago

Hello, can you tell me how much the 3D detection performance drops?

Allamrahul commented 1 year ago

Hi, as shown in my initial comment, the delta between PyTorch with the .pth file and TFRT inference is as large as 6 boxes (000006.npy). I have about 30 evaluation point clouds, and I see this drop in 90% of them. Is there anything I can do to avoid this?
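
For reference, the size of the drop can be summarized from the 13 files listed in the table of the first comment. The sketch below only does the arithmetic on box counts read off that table (PyTorch count, TFRT count per file). It is not an mAP evaluation, and it covers only the listed files, not the full set of about 30.

```python
# (PyTorch box count, TFRT box count) per file, counted from the table
counts = {
    "000000.npy": (4, 3),
    "000001.npy": (4, 4),
    "000002.npy": (4, 3),
    "000003.npy": (5, 4),
    "000004.npy": (4, 2),
    "000005.npy": (4, 1),
    "000006.npy": (7, 1),
    "000007.npy": (8, 6),
    "000008.npy": (5, 4),
    "000009.npy": (5, 3),
    "000010.npy": (4, 2),
    "000011.npy": (5, 3),
    "000012.npy": (6, 2),
}

# Per-file delta: how many PyTorch boxes have no TFRT counterpart by count
deltas = {name: pt - trt for name, (pt, trt) in counts.items()}

files_with_drop = sum(1 for d in deltas.values() if d > 0)
total_pt = sum(pt for pt, _ in counts.values())
total_trt = sum(trt for _, trt in counts.values())
worst = max(deltas, key=deltas.get)

print(f"{files_with_drop} of {len(counts)} files lose detections")   # 12 of 13
print(f"{total_trt} of {total_pt} boxes kept "
      f"({total_trt / total_pt:.0%})")                                # 38 of 65 (58%)
print(f"largest delta: {deltas[worst]} in {worst}")                   # 6 in 000006.npy
```

On the listed files, TFRT therefore returns a little under 60% of the boxes that PyTorch returns.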

wangxj2014 commented 1 year ago

I also encountered the same problem. Is there any way to solve this problem?

Dreamdreams8 commented 4 months ago

The same problem. Has anyone solved it?

NVIDIA-AI-IOT / CUDA-PointPillars

Mismatch in number of Detections with TFRT onnx inference VS pytorch, pth file #83