dongfangduoshou123 / YoloV3-TensorRT

Run YoloV3 with the newest TensorRT 6.0 at 37 fps on an NVIDIA 1060.
MIT License
86 stars 30 forks source link

Several errors encountered during compilation #9

Closed runrunrun1994 closed 4 years ago

runrunrun1994 commented 4 years ago

1. Several uppercase ASSERT calls fail to compile; I couldn't find the macro definition, so I changed them to lowercase assert. Is a header file missing? https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L177 https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L182 https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L186

2. A few includes are missing. DIR needs #include <dirent.h>; the error is here: https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L322 struct stat needs #include <sys/types.h> and #include <sys/stat.h>; the error is here: https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L327 access needs #include <unistd.h>; the error is here: https://github.com/dongfangduoshou123/YoloV3-TensorRT/blob/master/sampleYoloV3.cpp#L469

dongfangduoshou123 commented 4 years ago

Yeah, this was split out of my own project and uploaded; the uppercase ASSERT comes from the TRT library, and I didn't notice it when extracting the code. Did it compile after you added the missing headers? You can open a pull request and I'll merge it.


runrunrun1994 commented 4 years ago

Thank you very much for the quick reply. Two problems are still keeping me from running it end to end. 1. BN layer shape mismatch: ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape. The error comes from this line: auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(1U)); If I change it to createNetworkV2(0U) it gets past this point, though I don't yet understand why. Once it does, problem 2 appears.

2. The YOLO layer outputs can't be retrieved: 255 13 13 0 0 vs 430953, 255 26 26 0 0 vs 1723803, 255 52 52 0, Segmentation fault (core dumped)

Regarding the two problems above, I'd like to confirm with you whether my steps for generating yolov3.onnx are correct: a. use the code in TensorRT-ROOT/samples/python/yolov3_onnx; b. change the input size in yolov3.cfg from 608 to 416; c. adjust the dimensions of output_tensor_dims; d. generate yolov3.onnx.

Finally, happy New Year and all the best!

dongfangduoshou123 commented 4 years ago

Are you using the right version of onnx2trt? See https://github.com/dongfangduoshou123/YoloV3-TensorRT/issues/1 for reference; hope it helps. You should use auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(1U)); which specifies the batch dimension explicitly.

Open a pull request with your ASSERT-to-assert change and the added include files, and I'll merge it.

runrunrun1994 commented 4 years ago

Thank you! onnx==1.5.0, TensorRT==6.0.1.5. As soon as I specify the batch dimension explicitly, parser->parseFromFile fails with ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape.

I also tried the seralizeEngineFromPythonAPI.py script, and it fails at the same place. I haven't found the root cause yet.

dongfangduoshou123 commented 4 years ago

yolov3.onnx itself won't even parse? You haven't reached the network-editing step yet... His Python example also has a parse step; check whether the model parses successfully under Python to confirm the onnx file itself is fine.


runrunrun1994 commented 4 years ago

I found why yolov3.onnx fails to parse: the parser bundled with TensorRT 6.0.1.5 is too old, which produces the following warning:

WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3). I tried TensorRT-7.0.0.11: the warning is gone and yolov3.onnx parses, but I hit a new problem: doInference returns empty results, that is, the hostBuffers of all three output layers are empty. This is really hard! One more question: did you get TensorRT 6.0.1.5 as the tar package directly from the NVIDIA website?

dongfangduoshou123 commented 4 years ago

It's just a warning; as long as it doesn't crash you're fine. I had the warning too back then. The open-source components were cloned from GitHub, and the binaries were downloaded step by step following the README of NVIDIA's TensorRT repo on GitHub.

dongfangduoshou123 commented 4 years ago

Got it working yet, man...?

runrunrun1994 commented 4 years ago

Man, it runs now, but there are no detection results. I'm still debugging; those uppercase ASSERTs turn out to be crucial.

runrunrun1994 commented 4 years ago

By the way, after generating the trt engine, loading it gives the error deserializeCudaEngine::30, condition: (blob) != nullptr.

dongfangduoshou123 commented 4 years ago

https://github.com/NVIDIA/TensorRT/issues/178 A plugin must be created with its corresponding creator; if you new it yourself, deserialization crashes. But as the issue I linked above explains, sampleYoloV3.cpp already creates it with the creator, and deserialization works there. So you've hit yet another oddity...?

runrunrun1994 commented 4 years ago

I haven't looked into that problem yet. My main issue now is that although the whole demo runs and displays the result image, there are no detection boxes. The cause is that the network returns nothing, so the final decoding step uses the zero-initialized default values in hostBuffer. Back to square one. This is really hard!

runrunrun1994 commented 4 years ago

Man, I finally got it running. The code has quite a few typos!

dongfangduoshou123 commented 4 years ago

Congratulations. I'll close this issue then; you must have learned a lot...

dongfangduoshou123 commented 4 years ago

@Learner0918 I've pushed a new update (a unified interface wrapper around TensorRT networks). I suggest you try it; if you fix any bugs, pull requests are welcome.

runrunrun1994 commented 4 years ago

Thanks, I'll give it a try.

wavesCHJ commented 4 years ago

@Learner0918 I've run into this problem too: WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3). [01/05/2020-21:39:38] [E] [TRT] (Unnamed Layer* 6) [Convolution]: at least 4 dimensions are required for input While parsing node number 7 [BatchNormalization]: ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape. I'm using the sampleOnnxMNIST.cpp code, and the error occurs on this line: auto parsed = parser->parseFromFile(locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(), static_cast<int>(gLogger.getReportableSeverity())); My TensorRT version is also 6.0. How did you solve this? By switching to TensorRT 7.0?

runrunrun1994 commented 4 years ago

@wavesCHJ Using TensorRT 7.0 does fix this problem; I think one place in CMakeList.txt also needs to be changed.

wavesCHJ commented 4 years ago

@Learner0918 Hi, some of my environments don't support 7.0. If I have to stay on TensorRT 6, how should I work around this problem?

runrunrun1994 commented 4 years ago

@wavesCHJ See #1: don't set the batch size yourself via createNetwork.

uCedar commented 4 years ago

@Learner0918 Hi, I've run into the same problem you described ("the result image displays, but there are no detection boxes; the cause is that the network returns nothing"): all the network outputs are 0. How did you solve it?

Scheaven commented 4 years ago


With TensorRT 7 I get an error: libnvonnxparser_runtime.so cannot be found. I commented that file out in the CMake file, but then another error followed. Have you run into the Cuda failure: out of memory problem?


Input filename: ../model_dump/yolov3.onnx ONNX IR version: 0.0.3 Opset version: 7 Producer name: NVIDIA TensorRT sample Producer version: Domain:
Model version: 0 Doc string:

[03/15/2020-15:32:02] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output. [03/15/2020-15:33:15] [I] [TRT] Detected 1 inputs and 3 output network tensors. jinle buffer:0x3ea265f4 buffer:0x3ea265f8 buffer:0x3ea265fc buffer:0x3ea26600 jinle buffer:0x3ea265f4 buffer:0x3ea265f8 buffer:0x3ea265fc buffer:0x3ea26600 jinle buffer:0x3ea265f4 buffer:0x3ea265f8 buffer:0x3ea265fc buffer:0x3ea26600

[03/15/2020-15:33:32] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles [03/15/2020-15:33:32] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles [03/15/2020-15:33:32] [E] [TRT] ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory) [03/15/2020-15:33:32] [E] [TRT] FAILED_ALLOCATION: std::exception Segmentation fault (core dumped)

runrunrun1994 commented 4 years ago

Try setting the batch size smaller.

wangwenbin1991 commented 4 years ago

@Scheaven I've run into the same problem; have you solved it?

wangwenbin1991 commented 4 years ago

One possible cause of the cuda Error in allocate: 2 (out of memory) is that when generating yolov3.onnx from yolov3.weights, the batch value in the yolov3.cfg file is still the training-time value (e.g. 64); reducing it resolves the error.
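That change can be made in the cfg file before running the weights-to-onnx conversion. A sketch of the relevant lines (values are illustrative; the rest of yolov3.cfg stays unchanged):

```ini
# yolov3.cfg, [net] section: inference needs only a small batch;
# training values such as batch=64 can exhaust GPU memory in TensorRT.
[net]
batch=1
subdivisions=1
```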