Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

The model's output are a lot of -nan(ind) #3682

Open wwdok opened 2 years ago

wwdok commented 2 years ago

error log

Gather not supported yet!
# axis=1
Gather not supported yet!
# axis=1
Gather not supported yet!
# axis=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=6
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=6
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=1
Cast not supported yet!
# to=6
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=7
GatherND not supported yet!
Cast not supported yet!
# to=7
GatherND not supported yet!

model

~1. original model: https://github.com/PINTO0309/PINTO_model_zoo/files/8357832/model_float32.onnx.zip
2. converted param and bin: model_float32.param.zip model_float32.bin.zip~

how to reproduce

~I uploaded the model to convertmodel.com, which reported the errors above. I also tried onnx-simplifier, but it could not simplify away any of these unsupported ops. I downloaded the converted param and bin files and tried to modify the param file following the article 《手工优化ncnn模型结构》 ("Manually optimizing the ncnn model structure"), but this model is more complicated and it is hard for me to edit it by hand correctly. Using Netron and Ctrl+F, I found some of the related ops in the converted param file:~ image image

~For the first graph, I changed the param file from~

Gather           GatherV2_2               2 1 strided_slice:0_splitncnn_2 GatherV2_2/indices:0 GatherV2_2:0
Gather           GatherV2_1               2 1 strided_slice:0_splitncnn_1 GatherV2_1/indices:0 GatherV2_1:0
Gather           GatherV2                 2 1 strided_slice:0_splitncnn_0 GatherV2/indices:0 GatherV2:0
Gemm             MatMul_1                 2 1 GatherV2:0 transpose:0 MatMul_1:0
Gemm             MatMul_4                 2 1 GatherV2_1:0 transpose_2:0 MatMul_4:0
Gemm             MatMul_7                 2 1 GatherV2_2:0 transpose_4:0 MatMul_7:0

to

Gemm             MatMul_1                 2 1 strided_slice:0_splitncnn_0 transpose:0 MatMul_1:0
Gemm             MatMul_4                 2 1 strided_slice:0_splitncnn_1 transpose_2:0 MatMul_4:0
Gemm             MatMul_7                 2 1 strided_slice:0_splitncnn_2 transpose_4:0 MatMul_7:0
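To double-check a manual edit like the one above, it can help to parse each layer line and confirm that the declared input and output counts still match the blob names that follow. The sketch below assumes the usual ncnn param line layout (layer type, layer name, input count, output count, then input and output blob names); `ParamLayerLine` and `parse_param_layer_line` are hypothetical helper names, not part of ncnn, and any trailing `key=value` layer parameters are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: the fields of one layer line in a .param file.
struct ParamLayerLine
{
    std::string type;
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Parse "type name input_count output_count inputs... outputs...".
// Returns false if the line has fewer blob names than it declares.
bool parse_param_layer_line(const std::string& line, ParamLayerLine& out)
{
    std::istringstream iss(line);
    int input_count = 0;
    int output_count = 0;
    if (!(iss >> out.type >> out.name >> input_count >> output_count))
        return false;
    if (input_count < 0 || output_count < 0)
        return false;

    out.inputs.assign((size_t)input_count, std::string());
    out.outputs.assign((size_t)output_count, std::string());
    for (size_t i = 0; i < out.inputs.size(); i++)
    {
        if (!(iss >> out.inputs[i]))
            return false;
    }
    for (size_t i = 0; i < out.outputs.size(); i++)
    {
        if (!(iss >> out.outputs[i]))
            return false;
    }
    return true;
}
```

Two caveats, stated as I understand them rather than as a verified rule for this model: the second line of a param file holds the total layer and blob counts, so removing the three Gather layers would also require updating those numbers; and bypassing a Gather only preserves the result if its indices select every element in the original order.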

~I don't know whether this is correct. For the second graph, I don't know how to modify the GatherND. As far as I understand, to modify the param file we need to find the input node and the output node and then merge away the intermediate ops, but in this case the ops need additional inputs (marked in yellow), so is it possible to merge them? The following graph is the corresponding part of the ONNX model:~

image

wwdok commented 2 years ago

I replaced tf.gather_nd with tf.gather and some other TF ops, and then tried to replace tf.gather with other TF ops as well before re-exporting the ONNX model, but the model size doubled. So please consider supporting the Gather op in the future (gather_nd can be implemented on top of gather). We need an op that can select elements by index; this is a common op in TensorFlow and PyTorch. I gave up on implementing tf.gather with other TF ops and switched to implementing it as an ncnn custom layer.

This is the ONNX model that has tf.gather but not tf.gather_nd: model_float32_wo_gathernd_onnx.zip

This is the converted param and bin: model_float32_wo_gathernd_param.zip model_float32_wo_gathernd_bin.zip

This is the custom Gather op:

#include "gather.h"

namespace ncnn {

Gather::Gather()
{
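    // This layer reads two bottom blobs (params and indices), so it must not be
    // flagged as single-input; otherwise the vector overload of forward() is not called.
    one_blob_only = false;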
}

int Gather::forward(const std::vector<Mat>& bottom_blobs, std::vector<Mat>& top_blobs, const Option& opt) const
{
    const Mat& params = bottom_blobs[0];
    const Mat& indices = bottom_blobs[1];
    int indices_rank = indices.dims;

    if(indices_rank == 1){
        Mat& top_blob = top_blobs[0];
        top_blob.create(indices.w, params.h, params.elemsize, opt.blob_allocator);
        if (top_blob.empty())
            return -100;
        // index params by indices, then store in top_blob
        for(int i = 0; i < indices.w; i++){
            // get index from indices at i
            int index = indices[i];
            // copy one value each time
            memcpy(top_blob.row(i), params.row(index), params.elemsize*params.h);
        }
    }

    if (indices_rank == 2)
    {
        Mat& top_blob = top_blobs[0];
        top_blob.create(indices.w, indices.h, params.h, params.elemsize, opt.blob_allocator);
        if (top_blob.empty())
            return -100;
        // index params by indices, then store in top_blob
        #pragma omp parallel for num_threads(opt.num_threads)
        for (int i = 0; i < indices.h; i++)
        {
            for (int j = 0; j < indices.w; j++)
            {
                // get index from indices at i,j
                int index = indices[i*indices.h+j];
                float* absolute_index = (float*)top_blob.data + i*indices.h+j;
                memcpy(absolute_index, params.row(index), params.elemsize*params.h);
            }
        }
    }
    return 0;
}

DEFINE_LAYER_CREATOR(Gather)

} // namespace ncnn

#pragma once
#include "layer.h"

namespace ncnn{

class Gather : public Layer
{
public:
    Gather();

    virtual int forward(const std::vector<Mat>& bottom_blobs, std::vector<Mat>& top_blobs, const Option& opt) const;

};

} // namespace ncnn
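For reference, the row-gather semantics that the custom layer above aims for can be sketched independently of ncnn. This is a minimal sketch under stated assumptions, not the layer's actual implementation: `params` is taken to be a dense row-major matrix of `rows` x `cols` floats, gathering happens along the row axis only, and `gather_rows` is a hypothetical helper name. Each output row is `cols` elements wide, which is the size the copy has to use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Gather whole rows from a row-major [rows x cols] float matrix.
// Output row i is a copy of params row indices[i].
std::vector<float> gather_rows(const std::vector<float>& params,
                               size_t rows, size_t cols,
                               const std::vector<int>& indices)
{
    assert(params.size() == rows * cols);

    std::vector<float> out(indices.size() * cols);
    for (size_t i = 0; i < indices.size(); i++)
    {
        const int index = indices[i];
        // Reject out-of-range indices instead of reading past the buffer.
        assert(index >= 0 && (size_t)index < rows);
        if (cols > 0)
        {
            std::memcpy(out.data() + i * cols,
                        params.data() + (size_t)index * cols,
                        cols * sizeof(float));
        }
    }
    return out;
}
```

For a rank-2 index tensor the same function applies to the flattened indices, with element (i, j) read at offset `i * width + j`; the result then has shape [index_height, index_width, cols].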

Then I used the following code to test it:

#include <stdio.h>
#include "ncnn/net.h"
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

int main(int argc, char** argv)
{
    const char* imagepath = "C:/Users/wadewang/Pictures/face.bmp";

    cv::Mat img = cv::imread(imagepath, 1);
    if (img.empty())
    {
        fprintf(stderr, "cv::imread %s failed\n", imagepath);
        return -1;
    }
    cv::imshow("IMAGE",img);
    cv::waitKey(0);
    ncnn::Net face_mesh_net;
    face_mesh_net.load_param("E:/Repo/ncnn/ncnn/build/tools/onnx/model_float32_wo_gathernd.param");
    face_mesh_net.load_model("E:/Repo/ncnn/ncnn/build/tools/onnx/model_float32_wo_gathernd.bin");

    const int target_size = 192;

    int img_w = img.cols;
    int img_h = img.rows;

    ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, target_size, target_size);

    const float mean_vals[3] = {0.f, 0.f, 0.f};
    const float norm_vals[3] = {1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0};

    in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = face_mesh_net.create_extractor();

    ex.input("input_1", in);

    ncnn::Mat out;
    ex.extract("output_mesh_identity", out);

    // print values in mat
    int width = out.w;
    int height = out.h;
    int channels = out.c;
    printf("whc shape:(%d, %d, %d) \n", width, height, channels);

    for (int c = 0; c < out.c; c++)
    {
        for (int i = 0; i < out.h; i++)
        {
            const float* values = out.row(i);
            for (int j = 0; j < out.w; j++)
            {
                printf("in channel %d height %d widdth %d: %f \n", c,i,j, values[j]);
            }
        }
    }

    return 0;
}

It outputs a lot of -nan(ind): image

How can I debug this problem? By the way, I think the custom Gather layer may not be the cause, because the Gather layer comes after the extracted blob output_mesh_identity.

wwdok commented 2 years ago

I extracted intermediate blob outputs and found that the values grow larger and larger as the tensor flows through the network: at blob p_re_lu_6/Alpha_dequantize_prelu/add:0 the values are already very large, and at blob p_re_lu_7/Alpha_dequantize_prelu/add:0 they become -nan(ind). I wonder whether the model weights were corrupted during model conversion. I used onnx2ncnn model_float32_wo_gathernd.onnx model_float32_wo_gathernd.param model_float32_wo_gathernd.bin to convert the model, and the log is:

Unsupported squeeze axes !
Gather not supported yet!
  # axis=1
Gather not supported yet!
  # axis=1
Gather not supported yet!
  # axis=1
Cast not supported yet!
  # to=6
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Cast not supported yet!
  # to=6
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Cast not supported yet!
  # to=6
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=6
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !
Unsupported squeeze axes !
Gather not supported yet!
  # axis=0
Unsupported unsqueeze axes !

The ops reported above are located near the end of the model (i.e. far after p_re_lu_6/Alpha_dequantize_prelu/add:0), so I think they may not be the cause.
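The blob-by-blob check described above can be made less manual by summarizing each extracted blob with a NaN count, an infinity count, and the largest finite magnitude, then looking for the first blob where those numbers jump. The following is only a sketch over a plain float buffer; `BlobStats` and `summarize_blob` are hypothetical names, not ncnn API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Summary of one blob's values (hypothetical helper type).
struct BlobStats
{
    size_t nan_count;
    size_t inf_count;
    float max_abs; // largest magnitude among the finite values
};

// Count NaN and infinite entries and track the largest finite magnitude.
BlobStats summarize_blob(const float* data, size_t count)
{
    assert(count == 0 || data != nullptr);

    BlobStats stats = {0, 0, 0.f};
    for (size_t i = 0; i < count; i++)
    {
        const float v = data[i];
        if (std::isnan(v))
        {
            stats.nan_count++;
        }
        else if (std::isinf(v))
        {
            stats.inf_count++;
        }
        else
        {
            const float a = std::fabs(v);
            if (a > stats.max_abs)
                stats.max_abs = a;
        }
    }
    return stats;
}
```

With ncnn, one would presumably call this once per channel, for example on `(const float*)out.channel(c)` with `out.w * out.h` elements, because channels in a 3-D ncnn::Mat are not guaranteed to be packed back to back.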

wwdok commented 2 years ago

I compared the outputs of the ONNX model and the ncnn model to find where they differ. At the very beginning, in the first conv, the weights are the same; only the types of the hyper-parameters differ: image

Some output of the first conv of the ONNX model:

in channel 0 height 0 widdth 0: 0.034148194      
in channel 0 height 0 widdth 1: 0.07430725       
in channel 0 height 0 widdth 2: 0.024422623      
in channel 0 height 0 widdth 3: 0.014723588      
in channel 0 height 0 widdth 4: 0.0157197        
in channel 0 height 0 widdth 5: -0.0047287494    
in channel 0 height 0 widdth 6: -0.010363534     
in channel 0 height 0 widdth 7: 0.008031346      
in channel 0 height 0 widdth 8: -0.020062402     
in channel 0 height 0 widdth 9: -0.019660018     
in channel 0 height 0 widdth 10: -0.020062402    
in channel 0 height 0 widdth 11: -0.019066527    
in channel 0 height 0 widdth 12: -0.017458156    
in channel 0 height 0 widdth 13: -0.019066527    
in channel 0 height 0 widdth 14: -0.017458156    
in channel 0 height 0 widdth 15: -0.019066527    
in channel 0 height 0 widdth 16: -0.019660018    
in channel 0 height 0 widdth 17: -0.017485023    
in channel 0 height 0 widdth 18: -0.020435594    
in channel 0 height 0 widdth 19: -0.026657153    
in channel 0 height 0 widdth 20: -0.025257278    
in channel 0 height 0 widdth 21: -0.020828392    
in channel 0 height 0 widdth 22: -0.01854949     
in channel 0 height 0 widdth 23: 0.012343921     
in channel 0 height 0 widdth 24: -0.001249671    
in channel 0 height 0 widdth 25: 0.023160793     
in channel 0 height 0 widdth 26: 0.060877133     
in channel 0 height 0 widdth 27: 0.08551507      
in channel 0 height 0 widdth 28: 0.028380718     
in channel 0 height 0 widdth 29: -0.019005813
in channel 0 height 0 widdth 30: -0.01799129
in channel 0 height 0 widdth 31: -0.021619853
in channel 0 height 0 widdth 32: -0.0015893579
in channel 0 height 0 widdth 33: -0.018605076
in channel 0 height 0 widdth 34: 0.04062692
in channel 0 height 0 widdth 35: 0.0072576404
in channel 0 height 0 widdth 36: -0.03516619
in channel 0 height 0 widdth 37: -0.059269324
in channel 0 height 0 widdth 38: -0.45865953
in channel 0 height 0 widdth 39: -1.0048343
in channel 0 height 0 widdth 40: -1.6661792
in channel 0 height 0 widdth 41: -1.1199121
in channel 0 height 0 widdth 42: 0.30682027
in channel 0 height 0 widdth 43: -0.046149798
in channel 0 height 0 widdth 44: -1.1461163
in channel 0 height 0 widdth 45: -0.29640478
in channel 0 height 0 widdth 46: -0.12440653
in channel 0 height 0 widdth 47: 0.6224393
in channel 0 height 0 widdth 48: 0.13229021
in channel 0 height 0 widdth 49: 1.080451
in channel 0 height 0 widdth 50: -0.89622986
in channel 0 height 0 widdth 51: -0.3228532
in channel 0 height 0 widdth 52: -0.37348163
in channel 0 height 0 widdth 53: -1.4292943
in channel 0 height 0 widdth 54: -1.287227
in channel 0 height 0 widdth 55: -0.68879217
in channel 0 height 0 widdth 56: -0.684978
in channel 0 height 0 widdth 57: -0.9808767
in channel 0 height 0 widdth 58: -0.1821686
in channel 0 height 0 widdth 59: -0.06908692
in channel 0 height 0 widdth 60: 0.011664763
in channel 0 height 0 widdth 61: 0.013435714
in channel 0 height 0 widdth 62: 0.07098691
in channel 0 height 0 widdth 63: 0.0645245
in channel 0 height 0 widdth 64: 0.013408773
in channel 0 height 0 widdth 65: 0.017631285
in channel 0 height 0 widdth 66: -0.012165479
in channel 0 height 0 widdth 67: -0.02030319
in channel 0 height 0 widdth 68: 0.0047903582
in channel 0 height 0 widdth 69: 0.016995054
in channel 0 height 0 widdth 70: 0.018802576
in channel 0 height 0 widdth 71: 0.022641402
in channel 0 height 0 widdth 72: 0.01704412
in channel 0 height 0 widdth 73: 0.014813688
in channel 0 height 0 widdth 74: -0.0102753565
in channel 0 height 0 widdth 75: -0.03150826
in channel 0 height 0 widdth 76: 0.01756763
in channel 0 height 0 widdth 77: 0.018047996
in channel 0 height 0 widdth 78: 0.02101862
in channel 0 height 0 widdth 79: 0.016704459
in channel 0 height 0 widdth 80: 0.01691069
in channel 0 height 0 widdth 81: 0.020127434
in channel 0 height 0 widdth 82: 0.01691069
in channel 0 height 0 widdth 83: 0.01792533
in channel 0 height 0 widdth 84: 0.015633244
in channel 0 height 0 widdth 85: 0.011424061
in channel 0 height 0 widdth 86: 0.018460922
in channel 0 height 0 widdth 87: 0.029403381
in channel 0 height 0 widdth 88: 0.018282585
in channel 0 height 0 widdth 89: 0.013438109
in channel 0 height 0 widdth 90: 0.018568955
in channel 0 height 0 widdth 91: 0.015738994
in channel 0 height 0 widdth 92: 0.017000936
in channel 0 height 0 widdth 93: 0.01659903
in channel 0 height 0 widdth 94: 0.016141139
in channel 0 height 0 widdth 95: -0.34014928

Some output of the first conv of the ncnn model:

in channel 0 height 0 widdth 0: 0.036087 
in channel 0 height 0 widdth 1: 0.076150
in channel 0 height 0 widdth 2: 0.026361
in channel 0 height 0 widdth 3: 0.016567
in channel 0 height 0 widdth 4: 0.016753
in channel 0 height 0 widdth 5: -0.005584
in channel 0 height 0 widdth 6: -0.009331
in channel 0 height 0 widdth 7: 0.008082
in channel 0 height 0 widdth 8: -0.018219
in channel 0 height 0 widdth 9: -0.017721
in channel 0 height 0 widdth 10: -0.018219
in channel 0 height 0 widdth 11: -0.018033
in channel 0 height 0 widdth 12: -0.018313
in channel 0 height 0 widdth 13: -0.018033
in channel 0 height 0 widdth 14: -0.018313
in channel 0 height 0 widdth 15: -0.018033
in channel 0 height 0 widdth 16: -0.017721
in channel 0 height 0 widdth 17: -0.015129
in channel 0 height 0 widdth 18: -0.019686
in channel 0 height 0 widdth 19: -0.028392
in channel 0 height 0 widdth 20: -0.024050
in channel 0 height 0 widdth 21: -0.022321
in channel 0 height 0 widdth 22: -0.020272
in channel 0 height 0 widdth 23: 0.009465
in channel 0 height 0 widdth 24: -0.002520
in channel 0 height 0 widdth 25: 0.031292
in channel 0 height 0 widdth 26: 0.067178
in channel 0 height 0 widdth 27: 0.087159
in channel 0 height 0 widdth 28: 0.029946
in channel 0 height 0 widdth 29: -0.017973
in channel 0 height 0 widdth 30: -0.017940
in channel 0 height 0 widdth 31: -0.019776
in channel 0 height 0 widdth 32: -0.002067
in channel 0 height 0 widdth 33: -0.020514
in channel 0 height 0 widdth 34: 0.041660
in channel 0 height 0 widdth 35: 0.006402
in channel 0 height 0 widdth 36: -0.029906
in channel 0 height 0 widdth 37: -0.065108
in channel 0 height 0 widdth 38: -0.466568
in channel 0 height 0 widdth 39: -1.015566
in channel 0 height 0 widdth 40: -1.679609
in channel 0 height 0 widdth 41: -1.147297
in channel 0 height 0 widdth 42: 0.296169
in channel 0 height 0 widdth 43: -0.039417
in channel 0 height 0 widdth 44: -1.135922
in channel 0 height 0 widdth 45: -0.274635
in channel 0 height 0 widdth 46: -0.091100
in channel 0 height 0 widdth 47: 0.652586
in channel 0 height 0 widdth 48: 0.145018
in channel 0 height 0 widdth 49: 1.092467
in channel 0 height 0 widdth 50: -0.911300
in channel 0 height 0 widdth 51: -0.347949
in channel 0 height 0 widdth 52: -0.388288
in channel 0 height 0 widdth 53: -1.443141
in channel 0 height 0 widdth 54: -1.297417
in channel 0 height 0 widdth 55: -0.699177
in channel 0 height 0 widdth 56: -0.693910
in channel 0 height 0 widdth 57: -0.975538
in channel 0 height 0 widdth 58: -0.175983
in channel 0 height 0 widdth 59: -0.064323
in channel 0 height 0 widdth 60: 0.010904
in channel 0 height 0 widdth 61: 0.015279
in channel 0 height 0 widdth 62: 0.072926
in channel 0 height 0 widdth 63: 0.066368
in channel 0 height 0 widdth 64: 0.014443
in channel 0 height 0 widdth 65: 0.016777
in channel 0 height 0 widdth 66: -0.011944
in channel 0 height 0 widdth 67: -0.021158
in channel 0 height 0 widdth 68: 0.008240
in channel 0 height 0 widdth 69: 0.023215
in channel 0 height 0 widdth 70: 0.024927
in channel 0 height 0 widdth 71: 0.024651
in channel 0 height 0 widdth 72: 0.016828
in channel 0 height 0 widdth 73: 0.015675
in channel 0 height 0 widdth 74: -0.012813
in channel 0 height 0 widdth 75: -0.030890
in channel 0 height 0 widdth 76: 0.017446
in channel 0 height 0 widdth 77: 0.021230
in channel 0 height 0 widdth 78: 0.027239
in channel 0 height 0 widdth 79: 0.020941
in channel 0 height 0 widdth 80: 0.022225
in channel 0 height 0 widdth 81: 0.021666
in channel 0 height 0 widdth 82: 0.022225
in channel 0 height 0 widdth 83: 0.022258
in channel 0 height 0 widdth 84: 0.020248 
in channel 0 height 0 widdth 85: 0.015608
in channel 0 height 0 widdth 86: 0.021186
in channel 0 height 0 widdth 87: 0.030264
in channel 0 height 0 widdth 88: 0.017256
in channel 0 height 0 widdth 89: 0.014299
in channel 0 height 0 widdth 90: 0.018448
in channel 0 height 0 widdth 91: 0.017410
in channel 0 height 0 widdth 92: 0.018768
in channel 0 height 0 widdth 93: 0.018270
in channel 0 height 0 widdth 94: 0.017908
in channel 0 height 0 widdth 95: -0.335838
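A quick way to quantify how far the two dumps above are apart is the maximum absolute difference over the compared elements. This is a minimal sketch; `max_abs_diff` is a hypothetical helper name, and what counts as an acceptable difference is left to the reader.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Largest element-wise absolute difference between two equally sized buffers.
float max_abs_diff(const float* a, const float* b, size_t count)
{
    assert(count == 0 || (a != nullptr && b != nullptr));

    float worst = 0.f;
    for (size_t i = 0; i < count; i++)
    {
        const float d = std::fabs(a[i] - b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Applied to the first five values of each list (0.034148194, 0.07430725, 0.024422623, 0.014723588, 0.0157197 from ONNX against 0.036087, 0.076150, 0.026361, 0.016567, 0.016753 from ncnn), the largest difference is roughly 0.0019.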

As you can see, the outputs are close but not identical. I also checked that the input data fed to both models is the same, so this is weird: same weights, same input, but different output! I also found another strange phenomenon: the output should be 16 feature maps of 96×96, but in the output of the ncnn model all 16 channels are identical. That is, the 2nd and later feature maps are duplicates of the 1st. As the following image shows, the value -5.346490 appears once in every channel: image
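One possible explanation for the identical channels, judging from the test program earlier in this thread: its inner loop calls `out.row(i)` on the whole blob regardless of `c`. As far as I know, ncnn::Mat::row computes its offset from the start of the buffer, so that loop would print channel 0 sixteen times, and `out.channel(c).row(i)` would be the per-channel variant. The sketch below illustrates the effect with a simplified stand-in; `Blob3D` and `make_demo_blob` are hypothetical, and the real ncnn::Mat additionally pads each channel to an aligned `cstep`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a 3-D float blob stored channel by channel.
struct Blob3D
{
    int w, h, c;
    size_t cstep; // number of elements between the starts of two consecutive channels
    std::vector<float> data;

    // Row i measured from the start of the buffer: always lands in channel 0.
    const float* row_from_start(int i) const
    {
        assert(i >= 0 && i < h);
        return data.data() + (size_t)i * w;
    }

    // Row i of channel ch: the offset includes the channel stride.
    const float* channel_row(int ch, int i) const
    {
        assert(ch >= 0 && ch < c && i >= 0 && i < h);
        return data.data() + (size_t)ch * cstep + (size_t)i * w;
    }
};

// Fill a blob so that every element encodes its own channel: ch * 100 + i * w + j.
Blob3D make_demo_blob(int w, int h, int c)
{
    Blob3D b;
    b.w = w;
    b.h = h;
    b.c = c;
    b.cstep = (size_t)w * h;
    b.data.resize(b.cstep * c);
    for (int ch = 0; ch < c; ch++)
        for (int i = 0; i < h; i++)
            for (int j = 0; j < w; j++)
                b.data[(size_t)ch * b.cstep + (size_t)i * w + j] = ch * 100.f + i * w + j;
    return b;
}
```

In a 2 x 2 x 3 demo blob, `row_from_start(1)[0]` is 2 no matter which channel the caller had in mind, while `channel_row(2, 1)[0]` is 202. If the printing loop is the culprit, the duplicated feature maps would be an artifact of the dump rather than of the network.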

nihui commented 2 months ago

For the various problems with ONNX model conversion, it is recommended to use the latest pnnx tool to convert your model to ncnn:

pip install pnnx
pnnx model.onnx inputshape=[1,3,224,224]

Detailed reference documentation: https://github.com/pnnx/pnnx https://github.com/Tencent/ncnn/wiki/use-ncnn-with-pytorch-or-onnx#how-to-use-pnnx