Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Model converted with onnx2ncnn crashes with Segmentation fault (core dumped) when run with benchncnn #3123

Closed Z-Xiong closed 3 years ago

Z-Xiong commented 3 years ago

error log

$ ./benchncnn 1 4 0 -1 0
loop_count = 1
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   18.57  max =   18.57  avg =   18.57
     squeezenet_int8  min =   41.16  max =   41.16  avg =   41.16
           mobilenet  min =   12.24  max =   12.24  avg =   12.24
      mobilenet_int8  min =   43.14  max =   43.14  avg =   43.14
        mobilenet_v2  min =   11.51  max =   11.51  avg =   11.51
        mobilenet_v3  min =   10.23  max =   10.23  avg =   10.23
          shufflenet  min =   14.14  max =   14.14  avg =   14.14
       shufflenet_v2  min =   11.68  max =   11.68  avg =   11.68
             mnasnet  min =   16.43  max =   16.43  avg =   16.43
     proxylessnasnet  min =   12.81  max =   12.81  avg =   12.81
     efficientnet_b0  min =   15.11  max =   15.11  avg =   15.11
   efficientnetv2_b0  min =   19.95  max =   19.95  avg =   19.95
        regnety_400m  min =   17.01  max =   17.01  avg =   17.01
           blazeface  min =    4.39  max =    4.39  avg =    4.39
           googlenet  min =   34.47  max =   34.47  avg =   34.47
      googlenet_int8  min =  108.80  max =  108.80  avg =  108.80
            resnet18  min =   45.26  max =   45.26  avg =   45.26
       resnet18_int8  min =  118.54  max =  118.54  avg =  118.54
             alexnet  min =   27.15  max =   27.15  avg =   27.15
               vgg16  min =  116.41  max =  116.41  avg =  116.41
          vgg16_int8  min =  784.78  max =  784.78  avg =  784.78
            resnet50  min =   74.98  max =   74.98  avg =   74.98
       resnet50_int8  min =  213.97  max =  213.97  avg =  213.97
      squeezenet_ssd  min =   61.52  max =   61.52  avg =   61.52
 squeezenet_ssd_int8  min =   95.39  max =   95.39  avg =   95.39
       mobilenet_ssd  min =   24.16  max =   24.16  avg =   24.16
  mobilenet_ssd_int8  min =   83.76  max =   83.76  avg =   83.76
      mobilenet_yolo  min =   60.65  max =   60.65  avg =   60.65
  mobilenetv2_yolov3  min =   38.21  max =   38.21  avg =   38.21
         yolov4-tiny  min =   59.13  max =   59.13  avg =   59.13
           nanodet_m  min =   33.79  max =   33.79  avg =   33.79
Segmentation fault (core dumped)

Note: the entry after nanodet_m is my own param file that I added; the Segmentation fault (core dumped) occurs when running it.

(With the NCNN_BENCHMARK CMake option enabled, the error log from the same run is as follows)

$ ./benchncnn 1 4 0 -1 0
loop_count = 1
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
Convolution              Conv_0                            44.17ms    |          [960, 540 *1] -> [480, 270,  64 *1]         kernel: 3 x 3     stride: 2 x 2
ReLU                     Relu_2                             2.01ms    |     [480, 270,  64 *1] -> [480, 270,  64 *1]    
Convolution              Conv_3                            78.31ms    |     [480, 270,  64 *1] -> [480, 270,   8 *8]         kernel: 1 x 1     stride: 1 x 1
ReLU                     Relu_5                            87.98ms    |     [480, 270,   8 *8] -> [480, 270,  64 *1]    
Split                    splitncnn_0                        0.00ms    |
Convolution              Conv_6                            30.25ms    |     [480, 270,  64 *1] -> [240, 135,   8 *8]         kernel: 3 x 3     stride: 2 x 2
ReLU                     Relu_8                            12.08ms    |     [240, 135,   8 *8] -> [240, 135,  64 *1]    
Convolution              Conv_9                            17.30ms    |     [240, 135,  64 *1] -> [240, 135,   8 *8]         kernel: 3 x 3     stride: 1 x 1
Convolution              Conv_11                           16.68ms    |     [480, 270,  64 *1] -> [240, 135,   8 *8]         kernel: 1 x 1     stride: 2 x 2
BinaryOp                 Add_13                            20.06ms    |
ReLU                     Relu_14                            0.41ms    |     [240, 135,  64 *1] -> [240, 135,  64 *1]    
Split                    splitncnn_1                        0.01ms    |
Convolution              Conv_15                           14.97ms    |     [240, 135,  64 *1] -> [240, 135,   8 *8]         kernel: 3 x 3     stride: 1 x 1
ReLU                     Relu_17                            6.43ms    |     [240, 135,   8 *8] -> [240, 135,  64 *1]    
Convolution              Conv_18                           14.97ms    |     [240, 135,  64 *1] -> [240, 135,   8 *8]         kernel: 3 x 3     stride: 1 x 1
BinaryOp                 Add_20                            11.07ms    |
ReLU                     Relu_21                            0.42ms    |     [240, 135,  64 *1] -> [240, 135,  64 *1]    
Split                    splitncnn_2                        0.00ms    |
Convolution              Conv_22                           14.87ms    |     [240, 135,  64 *1] -> [240, 135,   8 *8]         kernel: 3 x 3     stride: 1 x 1
ReLU                     Relu_24                            7.63ms    |     [240, 135,   8 *8] -> [240, 135,  64 *1]    
Convolution              Conv_25                           15.23ms    |     [240, 135,  64 *1] -> [240, 135,   8 *8]         kernel: 3 x 3     stride: 1 x 1
BinaryOp                 Add_27                            11.25ms    |
ReLU                     Relu_28                            0.46ms    |     [240, 135,  64 *1] -> [240, 135,  64 *1]    
Split                    splitncnn_3                        0.00ms    |
Convolution              Conv_72                           23.51ms    |     [240, 135,  64 *1] -> [240, 135,  16 *8]         kernel: 1 x 1     stride: 1 x 1
ReLU                     Relu_74                           42.03ms    |     [240, 135,  16 *8] -> [240, 135, 128 *1]    
Convolution              Conv_87                           55.30ms    |     [240, 135, 128 *1] -> [240, 135,  16 *8]         kernel: 1 x 1     stride: 1 x 1
GroupNorm                Add_96                            18.31ms    |     [240, 135,  16 *8] -> [240, 135, 128 *1]    
ReLU                     Relu_97                            0.91ms    |     [240, 135, 128 *1] -> [240, 135, 128 *1]    
Convolution              Conv_98                           53.97ms    |     [240, 135, 128 *1] -> [240, 135,  16 *8]         kernel: 1 x 1     stride: 1 x 1
GroupNorm                Add_107                           17.63ms    |     [240, 135,  16 *8] -> [240, 135, 128 *1]    
ReLU                     Relu_108                           0.88ms    |     [240, 135, 128 *1] -> [240, 135, 128 *1]    
Split                    splitncnn_8                        0.00ms    |
Segmentation fault (core dumped)

model

  1. original model https://gitee.com/Shawn-Xiong/model/blob/master/lfd.onnx
  2. onnx-sim model https://gitee.com/Shawn-Xiong/model/blob/master/lfd-sim.onnx
  3. ncnn param model https://gitee.com/Shawn-Xiong/model/blob/master/lfd-sim.param
  4. ncnn bin model https://gitee.com/Shawn-Xiong/model/blob/master/lfd-sim.bin

how to reproduce

  1. Export the ONNX model

    import torch

    # model: the LFD network, already constructed and in eval() mode
    dummy_input = torch.randn(1, 3, 540, 960)
    input_names = ["input"]
    output_names = ["output.%d" % i for i in range(1, 11)]
    # fix shape (static input size)
    torch.onnx.export(model, dummy_input, "./lfd.onnx", verbose=True, input_names=input_names,
                      output_names=output_names)
  2. Simplify the ONNX model

    import onnx
    from onnxsim import simplify

    model_simp, check = simplify('./lfd.onnx')
    assert check, "Simplified ONNX model could not be validated"
    onnx.save_model(model_simp, './lfd-sim.onnx')
  3. Convert to ncnn with onnx2ncnn: ./onnx2ncnn lfd-sim.onnx lfd-sim.param lfd-sim.bin

  4. Add the following line to benchncnn and put the param file in the folder containing the benchncnn binary (see the sketch after this list): benchmark("lfd-sim", ncnn::Mat(960, 540), opt);
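
For context, a minimal sketch of where such a line sits in benchmark/benchncnn.cpp; the surrounding stock entries and their shapes are assumptions from the benchmark list of that era, not quoted from the issue:

    // tail of the stock benchmark list in benchncnn.cpp (shapes illustrative)
    benchmark("yolov4-tiny", ncnn::Mat(416, 416, 3), opt);
    benchmark("nanodet_m", ncnn::Mat(320, 320, 3), opt);
    // the addition from step 4: w=960, h=540 -- note there is no channel
    // dimension here, which turns out to be the cause of the crash (see the fix below)
    benchmark("lfd-sim", ncnn::Mat(960, 540), opt);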

Z-Xiong commented 3 years ago

Solved. The cause: I had written benchmark("lfd-sim", ncnn::Mat(416, 416), opt); instead of benchmark("lfd-sim", ncnn::Mat(416, 416, 3), opt); the missing dimension caused the crash.
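
For reference, a minimal sketch contrasting the two ncnn::Mat constructors; the crash mechanism described in the comments is a reading of the per-layer log above (the model takes a 3-channel input), not something confirmed in the thread:

    #include "mat.h"  // ncnn

    int main()
    {
        // Wrong: Mat(w, h) creates a 2D blob with c == 1, but the first
        // Convolution expects 3 input channels, so the layer reads past
        // the end of the allocated data and segfaults.
        ncnn::Mat wrong(960, 540);

        // Right: Mat(w, h, c) carries the explicit channel dimension,
        // matching the model's 1x3x540x960 input (w=960, h=540, c=3).
        ncnn::Mat right(960, 540, 3);

        return 0;
    }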

nihui commented 3 months ago

In view of the various problems with ONNX model conversion, it is recommended to use the latest pnnx tool to convert your model to ncnn:

pip install pnnx
pnnx model.onnx inputshape=[1,3,224,224]

Detailed reference documentation: https://github.com/pnnx/pnnx and https://github.com/Tencent/ncnn/wiki/use-ncnn-with-pytorch-or-onnx#how-to-use-pnnx
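
Once converted, loading the result with the ncnn C++ API looks roughly like the sketch below. The file and blob names ("model.ncnn.param", "model.ncnn.bin", "in0", "out0") assume pnnx's default output for the command above; check the generated .param file for the actual names:

    #include "net.h"  // ncnn

    int main()
    {
        ncnn::Net net;
        // load_param/load_model return 0 on success
        if (net.load_param("model.ncnn.param") || net.load_model("model.ncnn.bin"))
            return -1;

        ncnn::Mat in(224, 224, 3);  // w, h, c -- matches inputshape=[1,3,224,224]
        in.fill(0.5f);

        ncnn::Extractor ex = net.create_extractor();
        ex.input("in0", in);

        ncnn::Mat out;
        ex.extract("out0", out);

        return 0;
    }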