dongfangduoshou123 / YoloV3-TensorRT

Run YoloV3 with the newest TensorRT 6.0 at 37 fps on an NVIDIA 1060.
MIT License
86 stars 30 forks source link

Several errors encountered during compilation #9

Closed runrunrun1994 closed 4 years ago

runrunrun1994 commented 4 years ago

1. Several uppercase ASSERT calls fail to compile; I couldn't find the macro definition, so I changed them to lowercase assert. Is a header file missing? https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L177 https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L182 https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L186

2. A few includes are missing. DIR needs #include <dirent.h>; the error is here: https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L322 struct stat needs #include <sys/types.h> and #include <sys/stat.h>; the error is here: https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L327 access needs #include <unistd.h>; the error is here: https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L469

dongfangduoshou123 commented 4 years ago

Yeah, this was split out of my own project and uploaded; the uppercase ASSERT comes from the TRT library, and I didn't notice it when extracting the code. Did it compile after you added the missing headers? You can open a pull request and I'll merge it.


runrunrun1994 commented 4 years ago

Thank you very much for the quick reply. Two problems are still keeping me from running it end to end. 1. BN layer shape mismatch: ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape. The error comes from this line: auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(1U)); If I change it to createNetworkV2(0U) it gets past this point, though I don't yet understand why. Once it does, problem 2 appears.

2. The YOLO layer outputs can't be retrieved: 255 13 13 0 0 vs 430953, 255 26 26 0 0 vs 1723803, 255 52 52 0, Segmentation fault (core dumped)

Regarding the two problems above, I'd like to confirm with you whether my steps for generating yolov3.onnx are correct: a. use the code in TensorRT-ROOT/samples/python/yolov3_onnx; b. change the input size in yolov3.cfg from 608 to 416; c. adjust the dimensions of output_tensor_dims; d. generate yolov3.onnx.

Finally, happy New Year and all the best!

dongfangduoshou123 commented 4 years ago

Are you using the right version of onnx2trt? See https://github.com/dongfangduoshou123/YoloV3-TensorRT/issues/1 for reference; hope it helps. You should use auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(1U)); which specifies the batch dimension explicitly.

Open a pull request with your ASSERT-to-assert change and the added include files, and I'll merge it.

runrunrun1994 commented 4 years ago

Thank you! onnx==1.5.0, TensorRT==6.0.1.5. As soon as I specify the batch dimension explicitly, parser->parseFromFile fails with ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape.

I also tried the seralizeEngineFromPythonAPI.py script, and it fails at the same place. I haven't found the root cause yet.

dongfangduoshou123 commented 4 years ago

yolov3.onnx itself won't even parse? You haven't reached the network-editing step yet... His Python example also has a parse step; check whether the model parses successfully under Python to confirm the onnx file itself is fine.


runrunrun1994 commented 4 years ago

I found why yolov3.onnx fails to parse: the parser bundled with TensorRT 6.0.1.5 is too old, which produces the following warning:

WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3). I tried TensorRT-7.0.0.11: the warning is gone and yolov3.onnx parses, but I hit a new problem: doInference returns empty results, that is, the hostBuffers of all three output layers are empty. This is really hard! One more question: did you get TensorRT 6.0.1.5 as the tar package directly from the NVIDIA website?

dongfangduoshou123 commented 4 years ago

It's just a warning; as long as it doesn't crash you're fine. I had the warning too back then. The open-source components were cloned from GitHub, and the binaries were downloaded step by step following the README of NVIDIA's TensorRT repo on GitHub.

dongfangduoshou123 commented 4 years ago

Got it working yet, man...?

runrunrun1994 commented 4 years ago

Man, it runs now, but there are no detection results. I'm still debugging; those uppercase ASSERTs turn out to be crucial.

runrunrun1994 commented 4 years ago

By the way, after generating the trt engine, loading it gives the error deserializeCudaEngine::30, condition: (blob) != nullptr.

dongfangduoshou123 commented 4 years ago

https://github.com/NVIDIA/TensorRT/issues/178 A plugin must be created with its corresponding creator; if you new it yourself, deserialization crashes. But as the issue I linked above explains, sampleYoloV3.cpp already creates it with the creator, and deserialization works there. So you've hit yet another oddity...?

runrunrun1994 commented 4 years ago

I haven't looked into that problem yet. My main issue now is that although the whole demo runs and displays the result image, there are no detection boxes. The cause is that the network returns nothing, so the final decoding step uses the zero-initialized default values in hostBuffer. Back to square one. This is really hard!

runrunrun1994 commented 4 years ago

Man, I finally got it running. The code has quite a few typos!

dongfangduoshou123 commented 4 years ago

Congratulations. I'll close this issue then; you must have learned a lot...

dongfangduoshou123 commented 4 years ago

@Learner0918 I've pushed a new update (a unified interface wrapper around TensorRT networks). I suggest you try it; if you fix any bugs, pull requests are welcome.

runrunrun1994 commented 4 years ago

Thanks, I'll give it a try.

wavesCHJ commented 4 years ago

@Learner0918 I've run into this problem too: WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3). [01/05/2020-21:39:38] [E] [TRT] (Unnamed Layer* 6) [Convolution]: at least 4 dimensions are required for input While parsing node number 7 [BatchNormalization]: ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape. I'm using the sampleOnnxMNIST.cpp code, and the error occurs on this line: auto parsed = parser->parseFromFile(locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(), static_cast<int>(gLogger.getReportableSeverity())); My TensorRT version is also 6.0. How did you solve this? By switching to TensorRT 7.0?

runrunrun1994 commented 4 years ago

@wavesCHJ Using TensorRT 7.0 does fix this problem; I think one place in CMakeList.txt also needs to be changed.

wavesCHJ commented 4 years ago

@Learner0918 Hi, some of my environments don't support 7.0. If I have to stay on TensorRT 6, how should I work around this problem?

runrunrun1994 commented 4 years ago

@wavesCHJ See #1: don't set the batch size yourself via createNetwork.

uCedar commented 4 years ago

@Learner0918 Hi, I've run into the same problem you described ("the result image displays, but there are no detection boxes; the cause is that the network returns nothing"): all the network outputs are 0. How did you solve it?

Scheaven commented 4 years ago


With TensorRT 7 I get an error: libnvonnxparser_runtime.so cannot be found. I commented that file out in the CMake file, but then another error followed. Have you run into the Cuda failure: out of memory problem?


Input filename: ../model_dump/yolov3.onnx ONNX IR version: 0.0.3 Opset version: 7 Producer name: NVIDIA TensorRT sample Producer version: Domain:
Model version: 0 Doc string:

[03/15/2020-15:32:02] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output. [03/15/2020-15:33:15] [I] [TRT] Detected 1 inputs and 3 output network tensors. jinle buffer:0x3ea265f4 buffer:0x3ea265f8 buffer:0x3ea265fc buffer:0x3ea26600 jinle buffer:0x3ea265f4 buffer:0x3ea265f8 buffer:0x3ea265fc buffer:0x3ea26600 jinle buffer:0x3ea265f4 buffer:0x3ea265f8 buffer:0x3ea265fc buffer:0x3ea26600

[03/15/2020-15:33:32] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles [03/15/2020-15:33:32] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles [03/15/2020-15:33:32] [E] [TRT] ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory) [03/15/2020-15:33:32] [E] [TRT] FAILED_ALLOCATION: std::exception Segmentation fault (core dumped)

runrunrun1994 commented 4 years ago

Try setting the batch size smaller.

wangwenbin1991 commented 4 years ago

@Scheaven I've run into the same problem; have you solved it?

wangwenbin1991 commented 4 years ago

One possible cause of the cuda Error in allocate: 2 (out of memory) is that when generating yolov3.onnx from yolov3.weights, the batch value in the yolov3.cfg file is still the training-time value (e.g. 64); reducing it resolves the error.
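That change can be made in the cfg file before running the weights-to-onnx conversion. A sketch of the relevant lines (values are illustrative; the rest of yolov3.cfg stays unchanged):

```ini
# yolov3.cfg, [net] section: inference needs only a small batch;
# training values such as batch=64 can exhaust GPU memory in TensorRT.
[net]
batch=1
subdivisions=1
```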