Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

FATAL ERROR! unlocked pool allocator get wild #2454

Open Amadeus-AI opened 3 years ago

Amadeus-AI commented 3 years ago

I ran into a really weird problem. Here is a minimal test:

ncnn::UnlockedPoolAllocator ncnn_blob_pool_allocator_;
ncnn::PoolAllocator ncnn_workspace_pool_allocator_;
ncnn::Net ncnn_detector_;
ncnn::Option opt;
opt.blob_allocator = &ncnn_blob_pool_allocator_;
opt.workspace_allocator = &ncnn_workspace_pool_allocator_;
ncnn_detector_.opt = opt;
ncnn_detector_.load_param("model.param");
ncnn_detector_.load_model("model.bin");

cv::Mat ncnn_cv_mat = cv::imread("./example.bmp", cv::IMREAD_COLOR);
ncnn::Mat in = ncnn::Mat::from_pixels_resize(ncnn_cv_mat.data, ncnn::Mat::PIXEL_BGR, ncnn_cv_mat.cols, ncnn_cv_mat.rows, 128, 128);
const float mean_vals[3] = {104.f, 117.f, 123.f};
in.substract_mean_normalize(mean_vals, 0);
auto ncnn_extractor = ncnn_detector_.create_extractor();
ncnn_extractor.input("main_input", in);
ncnn::Mat out;
ncnn_extractor.extract("143", out);

And here's the model's param:

7767517
40 45
Input                    main_input               0 1 main_input
Convolution              Conv_0                   1 1 main_input 143 0=32 1=3 4=1 5=1 6=864 9=1
Split                    splitncnn_0              1 2 143 143_splitncnn_0 143_splitncnn_1
ConvolutionDepthWise     Conv_3                   1 1 143_splitncnn_1 146 0=32 1=3 3=2 4=1 5=1 6=288 7=32 9=3 -23310=2,0.000000e+00,6.000000e+00
Convolution              Conv_6                   1 1 146 148 0=32 1=1 5=1 6=1024
ConvolutionDepthWise     Conv_8                   1 1 148 151 0=32 1=3 4=1 5=1 6=288 7=32 9=3 -23310=2,0.000000e+00,6.000000e+00
Convolution              Conv_11                  1 1 151 153 0=64 1=1 5=1 6=2048
ConvolutionDepthWise     Conv_13                  1 1 153 156 0=64 1=3 4=1 5=1 6=576 7=64 9=3 -23310=2,0.000000e+00,6.000000e+00
Convolution              Conv_16                  1 1 156 158 0=64 1=1 5=1 6=4096
Pooling                  MaxPool_18               1 1 143_splitncnn_0 159 1=3 2=2 3=1 5=1
Concat                   Concat_19                2 1 158 159 160
Split                    splitncnn_1              1 2 160 160_splitncnn_0 160_splitncnn_1
ConvolutionDepthWise     Conv_20                  1 1 160_splitncnn_1 163 0=96 1=3 3=2 4=1 5=1 6=864 7=96 9=3 -23310=2,0.000000e+00,6.000000e+00
Convolution              Conv_23                  1 1 163 165 0=96 1=1 5=1 6=9216
ConvolutionDepthWise     Conv_25                  1 1 165 168 0=96 1=3 4=1 5=1 6=864 7=96 9=3 -23310=2,0.000000e+00,6.000000e+00
Convolution              Conv_28                  1 1 168 170 0=128 1=1 5=1 6=12288
ConvolutionDepthWise     Conv_30                  1 1 170 173 0=128 1=3 4=1 5=1 6=1152 7=128 9=3 -23310=2,0.000000e+00,6.000000e+00
Convolution              Conv_33                  1 1 173 175 0=128 1=1 5=1 6=16384
Pooling                  MaxPool_35               1 1 160_splitncnn_0 176 1=3 2=2 3=1 5=1
Concat                   Concat_36                2 1 175 176 177
Split                    splitncnn_2              1 2 177 177_splitncnn_0 177_splitncnn_1
ConvolutionDepthWise     Conv_37                  1 1 177_splitncnn_1 180 0=224 1=3 4=1 5=1 6=2016 7=224 9=3 -23310=2,0.000000e+00,6.000000e+00
...
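For anyone trying to narrow down which blob triggers the error, it helps to list the blob names in the order the param file declares them. The text form of a .param file starts with the magic number 7767517, then a line with the layer count and blob count, then one line per layer: type, name, input count, output count, the input and output blob names, and finally the key=value parameters. The sketch below parses just that much from a string. `ParamLayer`, `ParamFile` and `parse_param_text` are hypothetical names used for illustration, not part of the ncnn API, and the key=value parameters are ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One layer line of a .param file (hypothetical helper type, not ncnn API).
struct ParamLayer
{
    std::string type;
    std::string name;
    std::vector<std::string> inputs;  // input blob names
    std::vector<std::string> outputs; // output blob names
};

// Header counts plus the layer lines that were actually present in the text.
struct ParamFile
{
    int layer_count = 0; // as declared on the second line
    int blob_count = 0;  // as declared on the second line
    std::vector<ParamLayer> layers;
};

// Parse the text form of a .param file.
// Returns false if the magic number is wrong or a layer line lists fewer
// blob names than its counts promise. Lines that do not look like a layer
// line (blank lines, or a "..." elision) are skipped.
bool parse_param_text(const std::string& text, ParamFile& out)
{
    out = ParamFile();
    std::istringstream in(text);

    int magic = 0;
    if (!(in >> magic >> out.layer_count >> out.blob_count) || magic != 7767517)
        return false;

    std::string line;
    std::getline(in, line); // consume the rest of the header line

    while (std::getline(in, line))
    {
        std::istringstream ls(line);
        ParamLayer layer;
        int input_count = 0;
        int output_count = 0;
        if (!(ls >> layer.type >> layer.name >> input_count >> output_count))
            continue; // not a layer line

        for (int i = 0; i < input_count; ++i)
        {
            std::string blob;
            if (!(ls >> blob))
                return false;
            layer.inputs.push_back(blob);
        }
        for (int i = 0; i < output_count; ++i)
        {
            std::string blob;
            if (!(ls >> blob))
                return false;
            layer.outputs.push_back(blob);
        }

        // Anything left on the line is key=value layer parameters; ignored here.
        out.layers.push_back(layer);
    }
    return true;
}
```

Walking `layers` in order and collecting each `outputs` entry gives the candidate names to pass to `extract()` one at a time, which is how the 143 / 146 boundary described in this report can be located.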

The problem is this: if I extract a layer before 146, it works fine. If I extract layer 146, one "FATAL ERROR! unlocked pool allocator get wild" message is printed, but the output is still produced. If I extract a layer after 146, two "FATAL ERROR! unlocked pool allocator get wild" messages are printed and the program hangs forever.
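For context on what the message means: as far as I can tell from ncnn's allocator source, it is printed when a pointer handed back to an UnlockedPoolAllocator is not found in that allocator's list of outstanding allocations. The toy model below illustrates only that bookkeeping. It is not ncnn's code, `ToyPool` and its members are hypothetical names, and it says nothing about why the mismatch happens in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <list>
#include <utility>

// Toy pool allocator (hypothetical, NOT ncnn's implementation).
// It tracks blocks it has handed out ("payouts") and blocks that were
// returned and can be reused ("budgets").
class ToyPool
{
public:
    ToyPool() = default;
    ToyPool(const ToyPool&) = delete;
    ToyPool& operator=(const ToyPool&) = delete;

    ~ToyPool()
    {
        for (auto& b : budgets_) std::free(b.second);
        for (auto& p : payouts_) std::free(p.second);
    }

    void* allocate(std::size_t size)
    {
        assert(size > 0);
        // Reuse a previously returned block if one is large enough.
        for (auto it = budgets_.begin(); it != budgets_.end(); ++it)
        {
            if (it->first >= size)
            {
                void* ptr = it->second;
                payouts_.push_back(*it);
                budgets_.erase(it);
                return ptr;
            }
        }
        void* ptr = std::malloc(size);
        if (ptr)
            payouts_.emplace_back(size, ptr);
        return ptr;
    }

    // Returns true if ptr was handed out by this pool and is now reusable.
    // Returns false for an unknown pointer, the situation this toy model
    // treats as "wild"; the pointer is left untouched in that case.
    bool release(void* ptr)
    {
        for (auto it = payouts_.begin(); it != payouts_.end(); ++it)
        {
            if (it->second == ptr)
            {
                budgets_.push_back(*it);
                payouts_.erase(it);
                return true;
            }
        }
        ++wild_count_;
        return false;
    }

    int wild_count() const { return wild_count_; }

private:
    std::list<std::pair<std::size_t, void*> > budgets_;
    std::list<std::pair<std::size_t, void*> > payouts_;
    int wild_count_ = 0;
};
```

In this model a "wild" release happens when a block is returned to a pool that did not allocate it, or when the same block is returned twice. Whether either of these corresponds to the failure above is not established in this thread.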

The test environment is Ubuntu 16.04, i7-9700, ncnn built with -DNCNN_VULKAN=OFF, tag 20201208.
P.S. I tried this on Windows 10 and it works fine.
P.S. 2: I tested squeezenet_v1.1 on Ubuntu 16.04 and it works fine.

nihui commented 3 years ago

cannot reproduce

I removed cv::imread so that the input Mat `in` is always valid:

#include "net.h"

int main(int argc, char** argv)
{
    ncnn::UnlockedPoolAllocator ncnn_blob_pool_allocator_;
    ncnn::PoolAllocator ncnn_workspace_pool_allocator_;
    ncnn::Net ncnn_detector_;

    ncnn::Option opt;
    opt.blob_allocator = &ncnn_blob_pool_allocator_;
    opt.workspace_allocator = &ncnn_workspace_pool_allocator_;

    ncnn_detector_.opt = opt;
    ncnn_detector_.load_param("model.param");
    ncnn_detector_.load_model("model.bin");

    ncnn::Mat in(128, 128, 3);

    auto ncnn_extractor = ncnn_detector_.create_extractor();
    ncnn_extractor.input("main_input", in);

    ncnn::Mat out;
    ncnn_extractor.extract("class_ret", out);

    return 0;
}

Amadeus-AI commented 3 years ago

Hi, thanks for the reply. Here is my libncnn.a, built from tag 20201208: libncnn.zip. Can you test with this? I also built a libncnn.a from tag 20200916 and it works fine, which is weird. I forgot to mention that my platform is Ubuntu 16 on WSL, which may be part of the cause.