Models int8-quantized with quantized.out produce various wrong results on the CPU backend in version 2.5.3 and later

YingkunZhou commented 1 year ago

First, int8 quantization still works very well on version 2.5.0, so we use 2.5.0 as the baseline. The build commands are:

mkdir build && cd build
cmake .. -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=../install -D MNN_BUILD_QUANTOOLS=ON -D MNN_BUILD_CONVERTER=ON -D MNN_BUILD_DEMO=ON -D MNN_ARM82=ON

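The test device is a Jetson Orin.

To keep the demo's preprocessing consistent with the quantization config file used below, it is recommended to apply this patch first: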

diff --git a/demo/exec/pictureRecognition.cpp b/demo/exec/pictureRecognition.cpp
index 5022bc6e..8ec83d61 100644
--- a/demo/exec/pictureRecognition.cpp
+++ b/demo/exec/pictureRecognition.cpp
@@ -93,14 +93,14 @@ int main(int argc, const char* argv[]) {
 #endif
         ImageProcess::Config config;
         config.filterType = BILINEAR;
-        float mean[3]     = {103.94f, 116.78f, 123.68f};
-        float normals[3] = {0.017f, 0.017f, 0.017f};
+        float mean[3]     = {123.675,116.28,103.53};
+        float normals[3] = {0.017124753831663668,0.01750700280112045,0.017429193899782137};
         // float mean[3]     = {127.5f, 127.5f, 127.5f};
         // float normals[3] = {0.00785f, 0.00785f, 0.00785f};
         ::memcpy(config.mean, mean, sizeof(mean));
         ::memcpy(config.normal, normals, sizeof(normals));
         config.sourceFormat = RGBA;
-        config.destFormat   = BGR;
+        config.destFormat   = RGB;

         std::shared_ptr<ImageProcess> pretreat(ImageProcess::create(config), ImageProcess::destroy);
         pretreat->setMatrix(trans);
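
The new constants in the patch look like the common ImageNet per-channel mean and standard deviation for RGB input, scaled from [0, 1] to [0, 255], with `normal` being the reciprocal of the scaled standard deviation. The issue does not state where the numbers come from, so this reading is an assumption. The following minimal Python sketch checks it numerically; `to_mnn_mean_normal` is a hypothetical helper name, not part of MNN.

```python
# Assumption: the patch uses the widely used ImageNet statistics for RGB
# images, scaled from [0, 1] to [0, 255]. The issue does not say so explicitly.
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # R, G, B
IMAGENET_STD = (0.229, 0.224, 0.225)   # R, G, B


def to_mnn_mean_normal(mean, std, scale=255.0):
    """Return (mean, normal) so that (pixel - mean) * normal
    equals (pixel / scale - m) / s for each channel."""
    mnn_mean = [m * scale for m in mean]
    mnn_normal = [1.0 / (s * scale) for s in std]
    return mnn_mean, mnn_normal


mean, normal = to_mnn_mean_normal(IMAGENET_MEAN, IMAGENET_STD)

# Values copied from the patch, for comparison.
PATCH_MEAN = [123.675, 116.28, 103.53]
PATCH_NORMAL = [0.017124753831663668, 0.01750700280112045, 0.017429193899782137]

for got, want in zip(mean + normal, PATCH_MEAN + PATCH_NORMAL):
    assert abs(got - want) < 1e-9, (got, want)
print("patch constants match the assumed ImageNet statistics")
```

The channel order matters here: the values are listed as R, G, B, which is why the patch also switches `destFormat` from BGR to RGB.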

The attachment mobilenetv3_large_100.tar.gz contains mobilenetv3_large_100.onnx, which can be converted and then quantized with the following commands:

./MNNConvert -f ONNX --modelFile mobilenetv3_large_100.onnx --MNNModel tmp.mnn --bizCode MNN
./quantized.out tmp.mnn mobilenetv3_large_100.mnn preprocessConfig.json

preprocessConfig.json is as follows:

{
    "format":"RGB",
    "mean":[123.675,116.28,103.53],
    "normal":[0.017124753831663668,0.01750700280112045,0.017429193899782137],
    "width":224,
    "height":224,
    "path":"imagenet-sample-images/",
    "used_image_num":1000,
    "feature_quantize_method":"KL",
    "weight_quantize_method":"MAX_ABS"
}
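
Because a mismatch between the calibration preprocessing and the demo's preprocessing would by itself change the results, it can help to check the two against each other before comparing versions. The sketch below is one way to do that in Python; `preprocessing_mismatches` and the `DEMO` / `UNPATCHED_DEMO` dictionaries are hypothetical names, with values copied by hand from the patch and the config above.

```python
import json

# Constants as they appear in the patched pictureRecognition.cpp (copied by hand).
DEMO = {
    "format": "RGB",
    "mean": [123.675, 116.28, 103.53],
    "normal": [0.017124753831663668, 0.01750700280112045, 0.017429193899782137],
}

# Constants of the unpatched demo, taken from the removed lines of the diff.
UNPATCHED_DEMO = {
    "format": "BGR",
    "mean": [103.94, 116.78, 123.68],
    "normal": [0.017, 0.017, 0.017],
}

CONFIG_TEXT = """
{
    "format":"RGB",
    "mean":[123.675,116.28,103.53],
    "normal":[0.017124753831663668,0.01750700280112045,0.017429193899782137],
    "width":224,
    "height":224,
    "path":"imagenet-sample-images/",
    "used_image_num":1000,
    "feature_quantize_method":"KL",
    "weight_quantize_method":"MAX_ABS"
}
"""


def preprocessing_mismatches(config_text, demo, tol=1e-9):
    """Return the preprocessing keys whose values differ between config and demo."""
    cfg = json.loads(config_text)
    bad = []
    if cfg.get("format") != demo["format"]:
        bad.append("format")
    for key in ("mean", "normal"):
        a, b = cfg.get(key, []), demo[key]
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            bad.append(key)
    return bad


print(preprocessing_mismatches(CONFIG_TEXT, DEMO))            # patched demo
print(preprocessing_mismatches(CONFIG_TEXT, UNPATCHED_DEMO))  # unpatched demo
```

An empty list for the patched demo means the preprocessing can be ruled out as the source of the differences between versions.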

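The calibration image folder can be obtained with git clone https://github.com/nihui/imagenet-sample-images.git

Since the steps above are tedious, the attached archive also includes the already converted and quantized model, mobilenetv3_large_100.mnn, for convenience.

The attached archive also includes a test image, daisy.jpg. The run command is: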

./pictureRecognition.out mobilenetv3_large_100.mnn daisy.jpg

Now let's look at how each version behaves:

v2.5.0

```bash
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Session Info: memory use 7.948593 MB, flops is 226.057709 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
985, 7.831646
584, 3.793454
738, 3.181606
112, 2.936867
446, 2.814498
533, 2.692128
749, 2.692128
721, 2.569759
883, 2.569759
109, 2.569759
```

v2.5.1

```bash
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Session Info: memory use 7.948593 MB, flops is 226.057709 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
985, 7.831646
584, 3.793454
738, 3.181606
112, 2.936867
446, 2.814498
533, 2.692128
749, 2.692128
721, 2.569759
883, 2.569759
109, 2.569759
```

v2.5.3

```bash
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Session Info: memory use 7.948593 MB, flops is 226.057709 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
390, 15.540923
25, 15.540923
205, 15.540923
252, 15.540923
379, 15.296185
468, 14.684337
269, 14.194860
224, 13.950120
272, 13.950120
779, 13.827751
```

v2.6.0

```bash
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Session Info: memory use 7.948593 MB, flops is 226.057709 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
805, 0.856586
971, 0.856586
606, 0.734217
539, 0.734217
488, 0.611847
813, 0.611847
650, 0.611847
898, 0.611847
704, 0.611847
646, 0.611847
```

v2.6.3

```bash
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Session Info: memory use 8.728523 MB, flops is 226.057709 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
805, 0.856586
971, 0.856586
606, 0.734217
539, 0.734217
488, 0.611847
813, 0.611847
650, 0.611847
898, 0.611847
704, 0.611847
646, 0.611847
```

latest

```bash
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Session Info: memory use 8.250008 MB, flops is 226.057709 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
805, 0.856586
971, 0.856586
606, 0.734217
539, 0.734217
488, 0.611847
813, 0.611847
650, 0.611847
898, 0.611847
704, 0.611847
646, 0.611847
```
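
To compare many versions without reading the logs by eye, the top-k lines can be parsed and each version's top-1 class checked against the v2.5.0 baseline. The following is a minimal Python sketch; `parse_topk` is a hypothetical helper, and the `LOGS` strings are abbreviated to the first three result lines of the outputs above. It assumes the `index, score` lines always follow the `For Image:` line, as they do in these logs.

```python
import re

# Matches result lines such as "985, 7.831646" (class index, score).
RESULT_LINE = re.compile(r"^\s*(\d+),\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_topk(log_text):
    """Return [(class_index, score), ...] from the lines after 'For Image:'."""
    results = []
    seen_header = False
    for line in log_text.splitlines():
        if line.startswith("For Image:"):
            seen_header = True
            continue
        if seen_header:
            match = RESULT_LINE.match(line)
            if match:
                results.append((int(match.group(1)), float(match.group(2))))
    return results


# Abbreviated logs: only the first three result lines of each run are kept.
LOGS = {
    "v2.5.0": "origin size: 2100, 1500\nFor Image: daisy.jpg\n985, 7.831646\n584, 3.793454\n738, 3.181606\n",
    "v2.5.3": "origin size: 2100, 1500\nFor Image: daisy.jpg\n390, 15.540923\n25, 15.540923\n205, 15.540923\n",
    "v2.6.0": "origin size: 2100, 1500\nFor Image: daisy.jpg\n805, 0.856586\n971, 0.856586\n606, 0.734217\n",
}

baseline_top1 = parse_topk(LOGS["v2.5.0"])[0][0]
for version, text in LOGS.items():
    top1 = parse_topk(text)[0][0]
    status = "ok" if top1 == baseline_top1 else "MISMATCH"
    print(f"{version}: top-1 = {top1} ({status})")
```

In practice the strings in `LOGS` would be replaced by the captured output of `./pictureRecognition.out` for each build.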

YingkunZhou commented 1 year ago

On the latest master branch, if the build command is changed to

cmake -D CMAKE_BUILD_TYPE=Release -D MNN_VULKAN=ON -D MNN_OPENCL=ON .. \
-D CMAKE_INSTALL_PREFIX=../install -D MNN_SEP_BUILD=OFF -D MNN_ARM82=ON -D MNN_BUILD_CONVERTER=ON -D MNN_BUILD_BENCHMARK=ON -D MNN_BUILD_QUANTOOLS=ON -D MNN_BUILD_DEMO=ON

and the program is run again, the result is as follows:

$ ./pictureRecognition.out mobilenetv3_large_100.mnn daisy.jpg 
Load Cache file error.
The device support i8sdot:1, support fp16:1, support i8mm: 0
Turn back to cpu
Alloc Image 4 x 1 error, code:-59 
Session Info: memory use 25.484230 MB, flops is 221.308746 M, backendType is 0, batch size = 1
input: w:224 , h:224, bpp: 3
origin size: 2100, 1500
For Image: daisy.jpg
985, 9.938413
308, 2.710280
310, 2.675846
309, 2.487967
883, 2.461747
324, 2.013629
949, 1.993125
326, 1.912460
107, 1.901566
770, 1.891451
Program build log: error: unknown target CPU 'sm_87'
Device Orin failed to build the program

Build program failed, err:-11 ! 
programName.c_str()=s copy_buffer_to_image2d in buildKernel, 511 
CL ERROR CODE : -45, info:getKernel 
[1]    600548 segmentation fault (core dumped)  ./pictureRecognition.out mobilenetv3_large_100.mnn daisy.jpg

At least the result is correct this time (top-1 is 985 again, as in v2.5.0)... but this is quite strange.
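
One observation that may help narrow this down: in the CPU-only runs, the printed scores all appear to be integer multiples of a common step, as one would expect from a dequantized int8 output, while the scores from the Vulkan/OpenCL build are not. This is read off the logs and is not confirmed anywhere in the issue. The Python sketch below checks it under the assumption that the largest v2.5.3 score (15.540923) corresponds to int8 level 127; `grid_levels` is a hypothetical helper.

```python
def grid_levels(scores, step, tol=0.01):
    """Return the integer level of each score if every score lies within
    `tol` steps of a multiple of `step`; otherwise return None."""
    levels = []
    for score in scores:
        q = score / step
        if abs(q - round(q)) > tol:
            return None
        levels.append(round(q))
    return levels


# Scores copied from the logs in this issue.
V250 = [7.831646, 3.793454, 3.181606, 2.936867, 2.814498,
        2.692128, 2.692128, 2.569759, 2.569759, 2.569759]
V253 = [15.540923, 15.540923, 15.540923, 15.540923, 15.296185,
        14.684337, 14.194860, 13.950120, 13.950120, 13.827751]
V260 = [0.856586, 0.856586, 0.734217, 0.734217, 0.611847,
        0.611847, 0.611847, 0.611847, 0.611847, 0.611847]
VULKAN_OPENCL_BUILD = [9.938413, 2.710280, 2.675846, 2.487967, 2.461747,
                       2.013629, 1.993125, 1.912460, 1.901566, 1.891451]

# Assumption: the largest v2.5.3 score is the int8 maximum, 127.
STEP = 15.540923 / 127

for name, scores in [("v2.5.0", V250), ("v2.5.3", V253), ("v2.6.0", V260),
                     ("latest, Vulkan/OpenCL build", VULKAN_OPENCL_BUILD)]:
    print(name, grid_levels(scores, STEP))
```

Under that assumption the v2.5.0 scores map to levels 64, 31, 26, ..., the v2.5.3 scores have four entries saturated at 127, and the v2.6.0 scores collapse to levels 5 to 7, whereas the Vulkan/OpenCL build's scores do not fit the grid at all. That would be consistent with the last run not producing a dequantized int8 output, but the logs alone do not establish the cause.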

jmcc113 commented 1 year ago

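I ran into the same problem: the latest version gives wrong results. However, when I use version 2.5.0 to run inference on the quantized model, it segfaults in runSession.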

v0jiuqi commented 1 year ago

The problem has been located and fixed.

github-actions[bot] commented 9 months ago

Marking as stale. No activity in 60 days.

alibaba / MNN, issue #2614