alibaba / MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
http://www.mnn.zone/

OpenCL SSD quantization #2912

Closed morgan322 closed 4 months ago

morgan322 commented 5 months ago

Platform (include target platform as well if cross-compiling):

aarch64-linux
Device name: QUALCOMM Adreno(TM)
Vendor: QUALCOMM
Driver version: OpenCL 3.0 QUALCOMM build: commit #e1ca0183af changeid #I5863f3b3fc Date: 02/13/23 Mon Local Branch: mainline Remote Branch: Compiler E031.41.10.00
OpenCL version: OpenCL 2.0 Adreno(TM) 702
Device type: Unknown

GitHub Version: 2.9.1

Compiling Method:

cmake .. -DCMAKE_SYSTEM_VERSION=1 -DCMAKE_SYSTEM_PROCESSOR=aarch64 -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_C_COMPILER=/usr/local/fullstack-debug-x86_64/sysroots/x86_64-qtisdk-linux/usr/bin/aarch64-oe-linux/aarch64-oe-linux-gcc \
  -DCMAKE_CXX_COMPILER=/usr/local/fullstack-debug-x86_64/sysroots/x86_64-qtisdk-linux/usr/bin/aarch64-oe-linux/aarch64-oe-linux-g++ \
  -DCMAKE_SYSROOT=/usr/local/fullstack-debug-x86_64/sysroots/aarch64-oe-linux \
  -DMNN_OPENCL=ON -DMNN_BUILD_TEST=ON -DMNN_LOW_MEMORY=ON

-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Linux
-- Processor: aarch64
-- Version: 2.8.4
-- Metal: OFF
-- OpenCL: ON
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: OFF
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /home/morgan/ubt/alg/deploy/MNN/build-aarch
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- Enabling AArch64 Assemblies
-- Configuring done
-- Generating done
-- Build files have been written to: /home/morgan/ubt/alg/deploy/MNN/build-aarch

(base) morgan@UBT:~/ubt/alg/deploy/MNN/build-aarch$ make -j26
[  1%] Built target checkFile.out
[  1%] Built target checkDir.out
[  2%] Built target MNNMath
[  2%] Built target MNNUtils
[  3%] Built target MNNCV
[ 19%] Built target MNNARM64
[ 21%] Built target MNNCore
[ 38%] Built target MNNCPU
[ 58%] Built target MNNTransform
[ 58%] Built target MNN
[ 60%] Built target MNN_Express
[ 71%] Built target MNN_CL
[ 72%] Built target winogradExample.out
[ 73%] Built target getPerformance.out
[ 73%] Built target ModuleBasic.out
[ 73%] Built target checkInvalidValue.out
[ 73%] Built target mergeInplaceForCPU
[ 73%] Built target mobilenetTest.out
[ 73%] Built target GetMNNInfo
[ 73%] Built target testModel_expr.out
[ 73%] Built target testModel.out
[ 73%] Built target backendTest.out
[ 74%] Built target SequenceModuleTest.out
[ 75%] Built target MNNV2Basic.out
[ 75%] Built target testTrain.out
[ 75%] Built target fuseTest
[ 76%] Built target testModelWithDescribe.out
[ 77%] Built target timeProfile.out
[100%] Built target run_test.out

Paste the cmake arguments or the path of the build script used, as well as the full cmake output, here or on pastebin:

Build Log:

  1. Converted the model with MNNConvert and then quantized it with quantized.out:

     ./MNNConvert -f CAFFE --modelFile /home/morgan/ubt/alg/cv/export/caffetoncnn/MobileNetSSD_new.caffemodel --prototxt /home/morgan/ubt/alg/cv/export/caffetoncnn/MobileNetSSD_new.prototxt --MNNModel ./model/mobilenetssd.mnn --bizCode biz --weightQuantBits=8
     ./quantized.out ./model/mobilenetssd.mnn ./model/mobilenetssd_quant.mnn ./model/mobilnet_quant.json

     The quantized SSD model detects no objects at all (see the CPU sanity-check sketch below). The quantization config:

     {
         "format": "RGB",
         "mean": [127.5, 127.5, 127.5],
         "normal": [0.007843, 0.007843, 0.007843],
         "width": 300,
         "height": 300,
         "path": "/home/morgan/ubt/data/img/",
         "used_image_num": 1014,
         "feature_quantize_method": "KL",
         "weight_quantize_method": "MAX_ABS",
         "model": "mobilenetssd.mnn"
     }

  2. The Qualcomm GPU can run the benchmark. But with the model quantized directly via --weightQuantBits=8 and the rebuilt libraries, the program crashes with a segmentation fault at mobilenetssd_sess_ = mobilenetssd_interpreter_->createSession(schedule_config);
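Before blaming OpenCL, it may help to rule out the quantization itself by running the quantized model on CPU and inspecting the raw detection_out tensor. A minimal sketch, not from the issue; the model path and the tensor names "data" and "detection_out" are taken from the class code below and may need adjusting:

#include <iostream>
#include <memory>
#include <MNN/Interpreter.hpp>

int main() {
    std::unique_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("./model/mobilenetssd_quant.mnn"));
    if (!net) {
        std::cout << "load failed" << std::endl;
        return 1;
    }
    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_CPU;  // CPU baseline, no OpenCL involved
    auto session = net->createSession(config);
    if (!session) {
        std::cout << "createSession failed" << std::endl;
        return 1;
    }
    auto input = net->getSessionInput(session, "data");
    // Fill input with a real, preprocessed image before trusting the scores.
    net->runSession(session);
    auto output = net->getSessionOutput(session, "detection_out");
    // Copy from the backend to a host tensor before reading values.
    MNN::Tensor host(output, output->getDimensionType());
    output->copyToHostTensor(&host);
    std::cout << "output elements: " << host.elementSize() << std::endl;
    return 0;
}

If this CPU run already detects nothing, the problem is in the quantization step rather than in the OpenCL backend.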

    include "mobilenetssd.h"

    include

    include

include "opencv2/imgproc.hpp"

namespace mirror {

MobilenetSSD::MobilenetSSD() {
    initialized_ = false;
}

MobilenetSSD::~MobilenetSSD() {
    mobilenetssd_interpreter_->releaseModel();
    mobilenetssd_interpreter_->releaseSession(mobilenetssd_sess_);
}

int MobilenetSSD::Init(const char * root_path) {

std::string model_file = std::string(root_path) + "/mobilenetssd.mnn";
std::cout << "start Init." <<model_file<< std::endl;
mobilenetssd_interpreter_ = std::unique_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile(model_file.c_str()));
if (nullptr == mobilenetssd_interpreter_) {
    std::cout << "load model failed." << std::endl;
    return 10000;
}

MNN::ScheduleConfig schedule_config;
schedule_config.type = MNN_FORWARD_OPENCL;
schedule_config.numThread = 1;

MNN::BackendConfig backend_config;
backend_config.precision = MNN::BackendConfig::Precision_Low;
backend_config.power = MNN::BackendConfig::Power_Normal;
backend_config.memory = MNN::BackendConfig::Memory_Low;
schedule_config.backendConfig = &backend_config;

mobilenetssd_sess_ = mobilenetssd_interpreter_->createSession(schedule_config);

// image processer
MNN::CV::Matrix trans;
trans.setScale(1.0f, 1.0f);
MNN::CV::ImageProcess::Config img_config;
img_config.filterType = MNN::CV::BICUBIC;
::memcpy(img_config.mean, meanVals_, sizeof(meanVals_));
::memcpy(img_config.normal, normVals_, sizeof(normVals_));
img_config.sourceFormat = MNN::CV::BGR;
img_config.destFormat = MNN::CV::RGB;
pretreat_data_ = std::shared_ptr<MNN::CV::ImageProcess>(MNN::CV::ImageProcess::create(img_config));
pretreat_data_->setMatrix(trans);
std::cout << "=================================2" << std::endl;
std::string input_name = "data";
input_tensor_ = mobilenetssd_interpreter_->getSessionInput(mobilenetssd_sess_, input_name.c_str());
mobilenetssd_interpreter_->resizeTensor(input_tensor_, dims_);
mobilenetssd_interpreter_->resizeSession(mobilenetssd_sess_);
std::cout << "=================================3" << std::endl;
initialized_ = true;

std::cout << "end Init." << std::endl;
return 0;

}

int MobilenetSSD::DetectObject(const cv::Mat& img_src, std::vector<ObjectInfo>* objects) {
std::cout << "start detect." << std::endl;
if (!initialized_) {
    std::cout << "model uninitialized." << std::endl;
    return 10000;
}
if (img_src.empty()) {
    std::cout << "input empty." << std::endl;
    return 10001;
}

int width = img_src.cols;
int height = img_src.rows;

// preprocess
cv::Mat img_resized;
cv::resize(img_src, img_resized, inputSize_);
pretreat_data_->convert(img_resized.data, inputSize_.width, inputSize_.height, 0, input_tensor_);

mobilenetssd_interpreter_->runSession(mobilenetssd_sess_);
std::string output_name = "detection_out";
MNN::Tensor* output_tensor = mobilenetssd_interpreter_->getSessionOutput(mobilenetssd_sess_, output_name.c_str());

// copy to host
MNN::Tensor output_host(output_tensor, output_tensor->getDimensionType());
output_tensor->copyToHostTensor(&output_host);

auto output_ptr = output_host.host<float>();
std::vector<ObjectInfo> objects_tmp;
for (int i = 0; i < output_host.height(); ++i) {
    int index = i * output_host.width();
    ObjectInfo object;
    object.name_ = class_names[int(output_ptr[index + 0])];
    object.score_ = output_ptr[index + 1];
    object.location_.x = output_ptr[index + 2] * width;
    object.location_.y = output_ptr[index + 3] * height;
    object.location_.width = output_ptr[index + 4] * width - object.location_.x;
    object.location_.height = output_ptr[index + 5] * height - object.location_.y;

    objects_tmp.push_back(object);
}
NMS(objects_tmp, objects, nmsThreshold_);

std::cout << "end detect." << std::endl;

return 0;

}

} // namespace mirror

Neither the demo nor the code above works:

./object
start init:../../data/models
start Init.../../data/models/mobilenetssd.mnn
The device support i8sdot:0, support fp16:0, support i8mm: 0
Segmentation fault (core dumped)
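One thing worth trying while chasing the segfault: guard the session creation and fall back to CPU instead of using a possibly null session. This is only a defensive sketch reusing the member names from the class above, not a known fix; backupType is MNN's per-op fallback backend for ops the OpenCL backend cannot run:

MNN::ScheduleConfig schedule_config;
schedule_config.type = MNN_FORWARD_OPENCL;
schedule_config.backupType = MNN_FORWARD_CPU;  // ops without an OpenCL kernel fall back per-op
schedule_config.numThread = 1;

mobilenetssd_sess_ = mobilenetssd_interpreter_->createSession(schedule_config);
if (nullptr == mobilenetssd_sess_) {
    // createSession may return null; retry entirely on CPU rather than crash later
    std::cout << "OpenCL session creation failed, retrying on CPU." << std::endl;
    schedule_config.type = MNN_FORWARD_CPU;
    mobilenetssd_sess_ = mobilenetssd_interpreter_->createSession(schedule_config);
}
if (nullptr == mobilenetssd_sess_) {
    return 10000;
}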

jxt1234 commented 5 months ago
  1. Does the original (non-quantized) model load and run correctly with OpenCL?
  2. Please share the model converted with --weightQuantBits=8 so we can take a look.
morgan322 commented 5 months ago

1. The original model loads and infers correctly with OpenCL. The int8 model produced by quantized.out automatically falls back to CPU inference, and its results are wrong (it detects no objects at all). The --weightQuantBits=8 model, after rebuilding the dynamic library, detects correctly, but it only reduces memory consumption; the inference frame rate does not improve.
2. Attached is the model quantized with quantized.out; the model converted with --weightQuantBits=8 works fine after the rebuild. [Uploading mobilenetssd.zip…]()

morgan322 commented 5 months ago

1. Are there any other ways to speed this up?
2. With two threads configured, why is startup so slow? It takes about 20 minutes.
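If the 20-minute startup is OpenCL kernel compilation and auto-tuning (a guess; the thread does not confirm the cause), MNN can persist the tuning result with Interpreter::setCacheFile so that only the first launch pays the cost. A minimal sketch; the cache file path is arbitrary:

#include <memory>
#include <MNN/Interpreter.hpp>

int main() {
    std::unique_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("./model/mobilenetssd.mnn"));
    // Must be set before createSession; the file is created if it does not exist.
    net->setCacheFile("./mobilenetssd.cache");

    MNN::ScheduleConfig config;
    config.type = MNN_FORWARD_OPENCL;
    auto session = net->createSession(config);

    // The first run is slow while kernels are tuned; write the result back to
    // disk so that later launches reuse it and start much faster.
    net->runSession(session);
    net->updateCacheFile(session);
    return 0;
}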