Tencent / TNN

TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Apps from Tencent, such as Mobile QQ, Weishi, Pitu, etc. Contributions are welcome: work in collaboration with us and make TNN a better framework.

Model output is all zeros when running CUDA inference on Windows #1314

Closed XZLancer closed 3 years ago

XZLancer commented 3 years ago

1. Environment

2021-09-15 10:45:25.177 ( 2.734s) [ 3D17E185] inference_base.cc:128 INFO| input size: 43200
2021-09-15 10:45:25.178 ( 2.734s) [ 3D17E185] inference_base.cc:131 INFO| front:
2021-09-15 10:45:25.178 ( 2.734s) [ 3D17E185] inference_base.cc:133 INFO| -0.183594
2021-09-15 10:45:25.178 ( 2.735s) [ 3D17E185] inference_base.cc:133 INFO| -0.160156
2021-09-15 10:45:25.179 ( 2.735s) [ 3D17E185] inference_base.cc:133 INFO| -0.089844
2021-09-15 10:45:25.179 ( 2.736s) [ 3D17E185] inference_base.cc:133 INFO| -0.191406
2021-09-15 10:45:25.180 ( 2.736s) [ 3D17E185] inference_base.cc:133 INFO| -0.136719
2021-09-15 10:45:25.180 ( 2.736s) [ 3D17E185] inference_base.cc:133 INFO| -0.175781
2021-09-15 10:45:25.180 ( 2.737s) [ 3D17E185] inference_base.cc:133 INFO| -0.191406
2021-09-15 10:45:25.181 ( 2.737s) [ 3D17E185] inference_base.cc:133 INFO| -0.191406
2021-09-15 10:45:25.181 ( 2.737s) [ 3D17E185] inference_base.cc:133 INFO| -0.183594
2021-09-15 10:45:25.181 ( 2.738s) [ 3D17E185] inference_base.cc:133 INFO| -0.175781
2021-09-15 10:45:25.181 ( 2.738s) [ 3D17E185] inference_base.cc:133 INFO| -0.167969
2021-09-15 10:45:25.182 ( 2.738s) [ 3D17E185] inference_base.cc:133 INFO| -0.175781
2021-09-15 10:45:25.182 ( 2.738s) [ 3D17E185] inference_base.cc:133 INFO| -0.183594
2021-09-15 10:45:25.182 ( 2.738s) [ 3D17E185] inference_base.cc:133 INFO| -0.144531
2021-09-15 10:45:25.182 ( 2.738s) [ 3D17E185] inference_base.cc:133 INFO| -0.214844
2021-09-15 10:45:25.182 ( 2.739s) [ 3D17E185] inference_base.cc:133 INFO| -0.191406
2021-09-15 10:45:25.182 ( 2.739s) [ 3D17E185] inference_base.cc:133 INFO| -0.183594
2021-09-15 10:45:25.183 ( 2.739s) [ 3D17E185] inference_base.cc:133 INFO| -0.167969
2021-09-15 10:45:25.183 ( 2.739s) [ 3D17E185] inference_base.cc:133 INFO| -0.175781
2021-09-15 10:45:25.183 ( 2.739s) [ 3D17E185] inference_base.cc:133 INFO| -0.183594

2021-09-15 10:45:25.472 ( 3.028s) [ 3D17E185]inference_engine_tnn.cc:95 INFO| status: 1
2021-09-15 10:45:25.472 ( 3.029s) [ 3D17E185]inference_engine_tnn.cc:100 INFO| output is not null
2021-09-15 10:45:25.475 ( 3.031s) [ 3D17E185]inference_engine_tnn.cc:107 INFO| inference output total num: 62
2021-09-15 10:45:25.478 ( 3.034s) [ 3D17E185] inference_base.cc:144 INFO| output name: 255, data type: 0

2021-09-15 10:45:25.478 ( 3.034s) [ 3D17E185] inference_base.cc:147 INFO| output size: 62
2021-09-15 10:45:25.478 ( 3.034s) [ 3D17E185] inference_base.cc:150 INFO| front:
2021-09-15 10:45:25.478 ( 3.034s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.034s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.034s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.478 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.035s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.036s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.036s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.036s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000
2021-09-15 10:45:25.479 ( 3.036s) [ 3D17E185] inference_base.cc:152 INFO| 0.000000

- Output when the device is set to DEVICE_NAIVE:
```txt
2021-09-15 10:59:27.862 (   1.321s) [        ABFF05B8]   face_module_utils.cc:78    INFO| Success create inference map
2021-09-15 10:59:27.862 (   1.321s) [        ABFF05B8]   face_landmark_imp.cc:31    INFO| Landmark success to load engines
2021-09-15 10:59:27.863 (   1.322s) [        ABFF05B8]   face_landmark_imp.cc:36    INFO| Landmark start to detect face kps
2021-09-15 10:59:27.863 (   1.322s) [        ABFF05B8]    face_landmark_v2.cc:20    INFO| FaceLandmark preprocess
2021-09-15 10:59:27.864 (   1.323s) [        ABFF05B8]    face_landmark_v2.cc:62    INFO| roi box is :87.00 157.97 261.00 331.97
2021-09-15 10:59:27.865 (   1.324s) [        ABFF05B8]    face_landmark_v2.cc:117   INFO| FaceLandmark preprocess
2021-09-15 10:59:27.865 (   1.324s) [        ABFF05B8]      inference_base.cc:125   INFO| input name: input.1, data type: 0

2021-09-15 10:59:27.866 (   1.325s) [        ABFF05B8]      inference_base.cc:128   INFO| input size: 43200
2021-09-15 10:59:27.867 (   1.326s) [        ABFF05B8]      inference_base.cc:131   INFO| front:
2021-09-15 10:59:27.867 (   1.326s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.183594
2021-09-15 10:59:27.867 (   1.327s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.160156
2021-09-15 10:59:27.868 (   1.327s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.089844
2021-09-15 10:59:27.868 (   1.327s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.191406
2021-09-15 10:59:27.868 (   1.327s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.136719
2021-09-15 10:59:27.868 (   1.327s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.175781
2021-09-15 10:59:27.869 (   1.328s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.191406
2021-09-15 10:59:27.869 (   1.328s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.191406
2021-09-15 10:59:27.869 (   1.328s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.183594
2021-09-15 10:59:27.869 (   1.328s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.175781
2021-09-15 10:59:27.870 (   1.329s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.167969
2021-09-15 10:59:27.870 (   1.329s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.175781
2021-09-15 10:59:27.870 (   1.329s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.183594
2021-09-15 10:59:27.870 (   1.329s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.144531
2021-09-15 10:59:27.871 (   1.330s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.214844
2021-09-15 10:59:27.871 (   1.330s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.191406
2021-09-15 10:59:27.871 (   1.330s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.183594
2021-09-15 10:59:27.880 (   1.339s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.167969
2021-09-15 10:59:27.880 (   1.339s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.175781
2021-09-15 10:59:27.881 (   1.340s) [        ABFF05B8]      inference_base.cc:133   INFO| -0.183594

2021-09-15 10:59:27.974 (   1.433s) [        ABFF05B8]inference_engine_tnn.cc:95    INFO| status: 1
2021-09-15 10:59:27.975 (   1.434s) [        ABFF05B8]inference_engine_tnn.cc:100   INFO| output is not null
2021-09-15 10:59:27.979 (   1.438s) [        ABFF05B8]inference_engine_tnn.cc:107   INFO| inference output total num: 62
2021-09-15 10:59:27.982 (   1.441s) [        ABFF05B8]      inference_base.cc:144   INFO| output name: 255, data type: 0

2021-09-15 10:59:27.982 (   1.441s) [        ABFF05B8]      inference_base.cc:147   INFO| output size: 62
2021-09-15 10:59:27.982 (   1.441s) [        ABFF05B8]      inference_base.cc:150   INFO| front:
2021-09-15 10:59:27.982 (   1.441s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.005090
2021-09-15 10:59:27.982 (   1.441s) [        ABFF05B8]      inference_base.cc:152   INFO| -1.703724
2021-09-15 10:59:27.982 (   1.441s) [        ABFF05B8]      inference_base.cc:152   INFO| -0.654975
2021-09-15 10:59:27.982 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.754547
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| 1.381107
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| -3.742593
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| 1.460973
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| -1.736007
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.508987
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| -1.790652
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.190176
2021-09-15 10:59:27.983 (   1.442s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.000012
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.936669
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.001836
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| -0.310146
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.875878
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.412285
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.160706
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| -0.041408
2021-09-15 10:59:27.984 (   1.443s) [        ABFF05B8]      inference_base.cc:152   INFO| 0.286085

```

Thanks in advance for your reply.

Maosquerade commented 3 years ago

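Are you running inference with TNNTest or with an SDK you wrote yourself? When the run succeeds but produces no usable output, a likely cause is that the data was never copied from the GPU back to the CPU at the end. Please check whether output_map goes through a data conversion.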

XZLancer commented 3 years ago

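> Are you running inference with TNNTest or with an SDK you wrote yourself? When the run succeeds but produces no usable output, a likely cause is that the data was never copied from the GPU back to the CPU at the end. Please check whether output_map goes through a data conversion.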

@Maosquerade I'm using my own SDK. To fetch the result I call GetOutputMat with DeviceType set to DEVICE_NAIVE, but the data in mat->GetData() is all zeros. Is that what you mean by the output_map data conversion?
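For anyone reproducing this, the "all zeros" symptom can be checked in code rather than by reading log lines. The sketch below is a minimal, standalone helper and is not part of TNN: `CountNonZero` is a hypothetical name, and it assumes the output buffer holds `float` values (in an SDK the pointer would be whatever `mat->GetData()` returns, cast accordingly).

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical diagnostic helper (not part of TNN).
// Counts the elements of a float buffer whose magnitude exceeds `eps`.
// A result of 0 means the buffer is effectively all zeros.
// Note: NaN values compare false and are therefore not counted.
std::size_t CountNonZero(const float* data, std::size_t count, float eps = 0.0f) {
    if (data == nullptr) return 0;  // nothing to inspect
    std::size_t non_zero = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (std::fabs(data[i]) > eps) ++non_zero;
    }
    return non_zero;
}
```

Calling it on the 62-element output right after inference separates "the buffer really is zero" from "the logging code is reading the wrong memory".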

Maosquerade commented 3 years ago

Yes. The data returned by GetOutputMat should already be on the CPU, so an all-zero result is odd. Could you post your SDK code?

Maosquerade commented 3 years ago

Then it may be a problem with how the model runs. Could you send us a copy of the model so we can investigate?

XZLancer commented 3 years ago

Sure, the ONNX and TNN models are at the link below. https://pan.baidu.com/s/1m7U7Cxz9F63Fhoy_rfu09g Extraction code: 3e5s

Maosquerade commented 3 years ago

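Hi, in our tests this model also runs correctly on CUDA and its results match. Could you post the complete SDK code, including the input and output handling? Please also tell us the commit id you built TNN from, so we can test on that exact commit.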
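One way to make "results match" concrete is to compare the DEVICE_NAIVE output against the CUDA output element by element. This is a rough standalone sketch, not TNN code: `MaxAbsDiff`, `OutputsMatch`, and the default tolerance of `1e-3` are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TNN): largest absolute element-wise
// difference between two outputs. Returns -1 if the sizes differ.
double MaxAbsDiff(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) return -1.0;
    double max_diff = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = std::fabs(static_cast<double>(a[i]) - b[i]);
        max_diff = std::max(max_diff, diff);
    }
    return max_diff;
}

// True when both outputs have the same size and every element agrees
// within `tolerance` (an assumed threshold, tune it per model).
bool OutputsMatch(const std::vector<float>& a, const std::vector<float>& b,
                  double tolerance = 1e-3) {
    const double diff = MaxAbsDiff(a, b);
    return diff >= 0.0 && diff <= tolerance;
}
```

With the first values from the logs in this issue (0.005090, -1.703724, ... on DEVICE_NAIVE versus 0.000000 on CUDA), the maximum difference is about 1.7, so the two runs clearly do not match.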

Maosquerade commented 3 years ago

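Hi, you could try Forward instead of ForwardAsync. The code you posted looks fine. Try running TNNTest to see whether the CUDA build produces any output. If it does, recheck your SDK code; if it does not, check whether the GPU itself has a problem.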

XZLancer commented 3 years ago

The TNNTest output is shown below.

F:\tnn\TNN\scripts\cuda_msvc_release\bin> .\TNNTest.exe -mp .\face_landmark.tnnproto -dt CUDA -wc 10
I/tnn: tnn::test::Timer::Print [File F:\tnn\TNN\test\timer.cc][Line 60] .\face_landmark.tnnproto - CUDA               
TNN Benchmark time cost: min =  0.804   ms  |  max =  0.804   ms  |  avg =  0.804   ms

I tried Forward and the output is still 0. Which versions of CUDA, cuDNN, and TensorRT are you using?

Maosquerade commented 3 years ago

With TNNTest you can use -op to inspect the output; check whether it is also 0 there. Our test environment is the same as yours: CUDA 11.0, cuDNN 8.0.5, and TensorRT 7.1.3.4.

XZLancer commented 3 years ago

It is indeed all zeros. Is this dimension computed incorrectly? There should only be 62 output values.

1
255 mat_type: 32 dims: 2 1 62
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000

Maosquerade commented 3 years ago

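The dimensions here are correct: the first number after dims is the dimension count (dimSize), so "2 1 62" means a 2-D output of shape 1 x 62. I would suggest checking the GPU driver, or watching GPU utilization and memory usage while the program runs.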
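To illustrate how that header reads, here is a small standalone parser for a line such as `255 mat_type: 32 dims: 2 1 62`. It is only a sketch under the layout described above (dimension count first, then the extents); `ParseDims` and `ElementCount` are hypothetical names, not TNN functions.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of TNN): reads the numbers after "dims:".
// The first number is the dimension count, the following ones are extents.
// Returns an empty vector if the line does not follow that layout.
std::vector<int> ParseDims(const std::string& header) {
    const std::string key = "dims:";
    const std::size_t pos = header.find(key);
    if (pos == std::string::npos) return {};

    std::istringstream in(header.substr(pos + key.size()));
    int dim_count = 0;
    if (!(in >> dim_count) || dim_count < 0) return {};

    std::vector<int> dims;
    for (int i = 0; i < dim_count; ++i) {
        int extent = 0;
        if (!(in >> extent)) return {};  // fewer extents than announced
        dims.push_back(extent);
    }
    return dims;
}

// Total number of elements for the given extents (0 for an empty shape).
long long ElementCount(const std::vector<int>& dims) {
    if (dims.empty()) return 0;
    long long count = 1;
    for (int d : dims) count *= d;
    return count;
}
```

For the header in this thread the parsed extents are 1 and 62, which gives 62 elements and agrees with the 62 values printed.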

XZLancer commented 3 years ago

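GPU memory usage does rise while the program runs, and the process is listed in nvidia-smi. The GPU driver is the latest version, and the problem persists after reinstalling CUDA.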