Open luoqianlin opened 1 year ago
Segmentation fault
1) Paddle Lite version: v2.12
2) Host environment: Linux armv7hf
3) Target device: Axera 620A (爱芯620a)
4) Inference backend: CPU
1) Inference API: C++
2) Inference options: armv7, multi-threaded (the segmentation fault occurs with 3 or 4 threads)
3) Inference library source: built from source with the following command:
./lite/tools/build_linux.sh --arch=armv7hf --with_extra=ON --with_cv=ON
The code and steps follow the PaddleOCR on-device deployment documentation. Command executed:
./ocr_db_crnn rec models/ch_PP-OCRv3_rec_slim_opt.nb armv7hf INT8 4 1 ../tmp/img_8.jpg models/ppocr_keys_v1.txt models/config.txt
Error reported: Segmentation fault
The problem does not occur with a single thread, but it occurs readily with 3 or 4 threads. The run log is as follows:
/ax620a/paddle-lite # ./ocr_db_crnn rec models/ch_PP-OCRv3_rec_slim_opt.nb armv7hf INT8 4 1 ../tmp/img_8.jpg models/ppocr_keys_v1.txt models/config.txt
mode: rec
[I 1/29 6: 6:24.910 ...ild/paddle-lite/lite/core/device_info.cc:282 get_cpu_arch] Unknow cpu arch: 3079
[I 1/29 6: 6:24.910 ...ild/paddle-lite/lite/core/device_info.cc:282 get_cpu_arch] Unknow cpu arch: 3079
[I 1/29 6: 6:24.910 ...ild/paddle-lite/lite/core/device_info.cc:282 get_cpu_arch] Unknow cpu arch: 3079
[I 1/29 6: 6:24.910 ...ild/paddle-lite/lite/core/device_info.cc:282 get_cpu_arch] Unknow cpu arch: 3079
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1275 Setup] ARM multiprocessors name: MODEL NAME : ARMV7 PROCESSOR REV 5 (V7L) HARDWARE : GENERIC DT BASED SYSTEM
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1276 Setup] ARM multiprocessors number: 4
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1278 Setup] ARM multiprocessors ID: 0, max freq: 1248, min freq: 1248, cluster ID: 0, CPU ARCH: A-1
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1278 Setup] ARM multiprocessors ID: 1, max freq: 1248, min freq: 1248, cluster ID: 0, CPU ARCH: A-1
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1278 Setup] ARM multiprocessors ID: 2, max freq: 1248, min freq: 1248, cluster ID: 0, CPU ARCH: A-1
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1278 Setup] ARM multiprocessors ID: 3, max freq: 1248, min freq: 1248, cluster ID: 0, CPU ARCH: A-1
[I 1/29 6: 6:24.913 ...ild/paddle-lite/lite/core/device_info.cc:1284 Setup] L1 DataCache size is:
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1286 Setup] 32 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1286 Setup] 32 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1286 Setup] 32 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1286 Setup] 32 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1288 Setup] L2 Cache size is:
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1290 Setup] 512 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1290 Setup] 512 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1290 Setup] 512 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1290 Setup] 512 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1292 Setup] L3 Cache size is:
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1294 Setup] 0 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1294 Setup] 0 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1294 Setup] 0 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1294 Setup] 0 KB
[I 1/29 6: 6:24.914 ...ild/paddle-lite/lite/core/device_info.cc:1296 Setup] Total memory: 245308KB
[I 1/29 6: 6:24.915 ...ild/paddle-lite/lite/core/device_info.cc:1297 Setup] SVE2 support: 0
[I 1/29 6: 6:24.915 ...ild/paddle-lite/lite/core/device_info.cc:1298 Setup] SVE2 f32mm support: 0
[I 1/29 6: 6:24.915 ...ild/paddle-lite/lite/core/device_info.cc:1299 Setup] SVE2 i8mm support: 0
The predict img: ../tmp/img_8.jpg
0 【净含量】:220ml 0.975608
Segmentation fault
After rebuilding a Debug version and analyzing it with Valgrind, an out-of-bounds memory write was found:
==26664== Invalid write of size 4
==26664==    at 0x484BA54: memset (vg_replace_strmem.c:1374)
==26664==    by 0x48AB2D1: void paddle::lite::arm::math::conv_compute_2x2_3x3_int8<signed char>(signed char const*, signed char*, int, int, int, int, int, int, int, short const*, float const*, float const*, paddle::lite::operators::ConvParam const&, paddle::lite::Context<(paddle::lite_api::TargetType)4>*) [clone ._omp_fn.0] [clone .lto_priv.5522] (conv3x3_winograd_int8.cc:227)
==26664==    by 0x4D1D275: GOMP_parallel (parallel.c:168)
==26664==    by 0x498A853: conv_compute_2x2_3x3_int8 (conv3x3_winograd_int8.cc:173)
==26664==    by 0x498A853: paddle::lite::kernels::arm::WinogradConv<(paddle::lite_api::PrecisionType)2, (paddle::lite_api::PrecisionType)1>::Run() (conv_winograd.cc:341)
==26664==    by 0x4984F69: paddle::lite::kernels::arm::ConvCompute<(paddle::lite_api::PrecisionType)2, (paddle::lite_api::PrecisionType)1>::Run() (conv_compute.h:39)
==26664==    by 0x48FAFBB: Run (program.cc:797)
==26664==    by 0x48FAFBB: paddle::lite::RuntimeProgram::Run() (program.cc:610)
==26664==    by 0x49D97F7: Run (light_api.h:71)
==26664==    by 0x49D97F7: paddle::lite::LightPredictorImpl::Run() (light_api_impl.cc:132)
==26664==    by 0x2D40B: RunRecModel(std::vector<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >, std::allocator<std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > > >, cv::Mat, std::shared_ptr<paddle::lite_api::PaddlePredictor>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<float, std::allocator<float> >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::shared_ptr<paddle::lite_api::PaddlePredictor>, int, std::vector<double, std::allocator<double> >*, int) (ocr_db_crnn.cc:177)
==26664==    by 0x3096F: rec(int, char**) (ocr_db_crnn.cc:588)
==26664==    by 0x310A1: main (ocr_db_crnn.cc:627)
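Increasing the temporary memory allocation works around the problem for now. This is a crude change: it does not compute exactly how much temporary memory the operator needs.
The code change is here.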
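To illustrate the kind of failure the Valgrind trace points at (a `memset` inside an OpenMP region writing past the end of a buffer), here is a minimal sketch. It assumes, without confirmation from the Paddle Lite source, that the scratch workspace is divided into one slice per worker thread and was sized for fewer threads than actually run. All names (`per_thread_scratch_bytes`, `workspace_bytes`, `slice_fits`, `clear_slices`) and the sizing formula are hypothetical and are not taken from `conv3x3_winograd_int8.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical: bytes of scratch one worker thread needs for its tiles.
// This is NOT Paddle Lite's real formula, only a stand-in for illustration.
std::size_t per_thread_scratch_bytes(std::size_t tile_elems) {
  return tile_elems * sizeof(int32_t);
}

// The whole workspace has to cover one slice for every thread that will run.
std::size_t workspace_bytes(std::size_t tile_elems, int threads) {
  assert(threads > 0);
  return per_thread_scratch_bytes(tile_elems) *
         static_cast<std::size_t>(threads);
}

// True if thread `tid` can touch its whole slice without leaving the workspace.
bool slice_fits(std::size_t workspace, std::size_t tile_elems, int tid) {
  const std::size_t slice = per_thread_scratch_bytes(tile_elems);
  const std::size_t begin = slice * static_cast<std::size_t>(tid);
  return begin + slice <= workspace;
}

// Zero each thread's slice, as a memset in a parallel loop body would.
// Stops before the first slice that would be written out of bounds and
// returns how many slices were cleared safely.
int clear_slices(std::vector<uint8_t>& workspace, std::size_t tile_elems,
                 int threads) {
  const std::size_t slice = per_thread_scratch_bytes(tile_elems);
  int cleared = 0;
  for (int tid = 0; tid < threads; ++tid) {
    if (!slice_fits(workspace.size(), tile_elems, tid)) break;
    std::memset(workspace.data() + slice * static_cast<std::size_t>(tid), 0,
                slice);
    ++cleared;
  }
  return cleared;
}
```

Under this assumption, a workspace sized with `workspace_bytes(tile_elems, 1)` is enough for a single thread but not for threads 1 to 3, which would match the observation that only multi-threaded runs crash. Whether this is the real cause would have to be checked against the operator's actual allocation code.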
This may be caused by ctx being a singleton. In that case, you could try running inference in multiple processes instead.
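The multi-process suggestion could be sketched roughly as follows on Linux, using POSIX `fork` and `waitpid`. The helper `run_in_processes` is hypothetical. Each job stands in for "load the model with one thread and run inference on part of the input", which is left out here because it depends on the application.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <functional>
#include <vector>

// Run every job in its own child process, so no process-wide state is shared
// between them, and return how many children exited with status 0.
int run_in_processes(const std::vector<std::function<int()>>& jobs) {
  std::vector<pid_t> children;
  for (const auto& job : jobs) {
    const pid_t pid = fork();
    if (pid == 0) {
      // Child: run the job and leave immediately, without returning into
      // the parent's code path or running its atexit handlers.
      _exit(job());
    }
    if (pid > 0) children.push_back(pid);  // Parent: remember the child.
    // pid < 0 means fork failed; that job is simply not counted as a success.
  }

  int succeeded = 0;
  for (const pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0) {
      ++succeeded;
    }
  }
  return succeeded;
}
```

Each process would load its own copy of the model, so memory use grows with the number of processes. That may matter on this board, where the log reports 245308KB of total memory.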
Using PaddleOCR's quantized recognition model, multi-threaded CPU inference on an armv7hf system results in a Segmentation fault.