espressif / esp-nn

Optimised Neural Network functions for Espressif chipsets
Apache License 2.0

Different result for convolutions when optimized #5

Closed: argelius closed this issue 11 months ago

argelius commented 1 year ago

Hi!

I've noticed an issue where, if we use 2D convolution layers in our neural network, the output is different (and wrong) when we use the optimized functions, but correct when using the unoptimized convolutions. This seems to be true for all networks with convolution layers that we've tried.
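A minimal sketch of how the two runs can be compared element-wise on the host (the dumped values below are hypothetical placeholders, not real output):

import numpy as np

# Hypothetical int8 output dumps for the same input image, one from a run
# with ESP-NN optimizations enabled and one from the unoptimized kernels.
out_optimized = np.array([-128, -128, 64], dtype=np.int8)
out_reference = np.array([-127, -96, 12], dtype=np.int8)

# Widen to int32 before subtracting so the difference can't overflow int8.
diff = np.abs(out_optimized.astype(np.int32) - out_reference.astype(np.int32))
print("mismatched elements:", np.count_nonzero(diff), "/", diff.size)
print("max |diff|:", diff.max())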

vikramdattu commented 1 year ago

Hello @argelius, thanks for the heads up! Can you please share a small input sample that leads to the mismatch? Thanks in advance.

vikramdattu commented 1 year ago

@argelius may I also ask for information on:

  1. esp-nn top commit
  2. ESP-IDF version
  3. Some details on the model used
argelius commented 1 year ago

@vikramdattu

  1. d374e116ec277724d80cb1007efaf3fd67d28b51
  2. ESP-IDF 4.4
  3. It's a segmentation model based on

    https://idiotdeveloper.com/unet-segmentation-with-pretrained-mobilenetv2-as-encoder/

    This is the definition using Keras. Our implementation uses 128x128 images:

    from tensorflow.keras.applications import MobileNetV2
    from tensorflow.keras.layers import (Activation, BatchNormalization,
                                         Concatenate, Conv2D, Input, UpSampling2D)
    from tensorflow.keras.models import Model
    
    IMAGE_SIZE = 128  # our implementation uses 128x128 RGB images
    
    inputs = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name="input_image")
    
    encoder = MobileNetV2(input_tensor=inputs, weights="imagenet", include_top=False, alpha=0.35)
    skip_connection_names = ["input_image", "block_1_expand_relu", "block_3_expand_relu", "block_6_expand_relu"]
    encoder_output = encoder.get_layer("block_13_expand_relu").output
    
    f = [16, 32, 48, 64]
    x = encoder_output
    for i in range(1, len(skip_connection_names)+1, 1):
        x_skip = encoder.get_layer(skip_connection_names[-i]).output
        x = UpSampling2D((2, 2))(x)
        x = Concatenate()([x, x_skip])
    
        x = Conv2D(f[-i], (3, 3), padding="same")(x)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
    
        x = Conv2D(f[-i], (3, 3), padding="same")(x)
        x = BatchNormalization()(x)
        x = Activation("relu")(x)
    
    x = Conv2D(1, (1, 1), padding="same")(x)
    x = Activation("sigmoid")(x)
    
    model = Model(inputs, x)

    Without optimizations it works correctly; with optimizations it works only when we disable the optimized convolution.

    We also ran a simple CNN for classification and saw the same problem.

We also tried the latest commit and it's showing the same issue.

I will try to extract a small sample that gives an incorrect result.
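For completeness: the model is deployed as a fully int8-quantized TFLite conversion of the Keras definition above. A minimal sketch of such a conversion, with illustrative random calibration data rather than our real pipeline (assumes `model` is the Keras model defined earlier):

import numpy as np
import tensorflow as tf

IMAGE_SIZE = 128

def representative_dataset():
    # Illustrative calibration data; a real conversion should yield
    # preprocessed training images here instead of random noise.
    for _ in range(100):
        yield [np.random.rand(1, IMAGE_SIZE, IMAGE_SIZE, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model.tflite", "wb") as fh:
    fh.write(converter.convert())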

vikramdattu commented 1 year ago

Thanks @argelius for the additional inputs. Meanwhile, if you could share a small model that reproduces the issue, it would be of great help.

argelius commented 1 year ago

@vikramdattu

/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

#include "main_functions.h"
#include "model.h"
#include "constants.h"
#include "output_handler.h"

// Globals, used for compatibility with Arduino-style sketches.
namespace {
const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
int inference_count = 0;

constexpr int kTensorArenaSize = 200000;
uint8_t tensor_arena[kTensorArenaSize];
}  // namespace

float image[] = {0.156863, 0.164706, 0.180392, 0.184314, 0.172549, 0.164706, 0.164706, 0.176471, 0.184314, 0.172549, 0.160784, 0.149020, 0.164706, 0.152941, 0.156863, 0.137255, 0.141176, 0.152941, 0.160784, 0.156863, 0.149020, 0.152941, 0.160784, 0.156863, 0.423529, 0.211765, 0.180392, 0.200000, 0.203922, 0.325490, 0.478431, 0.419608, 0.227451, 0.149020, 0.168627, 0.294118, 0.400000, 0.431373, 0.447059, 0.431373, 0.415686, 0.443137, 0.450980, 0.443137, 0.419608, 0.458824, 0.470588, 0.305882, 0.435294, 0.278431, 0.172549, 0.231373, 0.207843, 0.462745, 0.572549, 0.482353, 0.160784, 0.184314, 0.180392, 0.184314, 0.180392, 0.164706, 0.168627, 0.172549, 0.184314, 0.188235, 0.239216, 0.541176, 0.431373, 0.415686, 0.415686, 0.258824, 0.494118, 0.380392, 0.298039, 0.321569, 0.301961, 0.533333, 0.670588, 0.580392, 0.282353, 0.325490, 0.321569, 0.333333, 0.333333, 0.313725, 0.309804, 0.352941, 0.337255, 0.317647, 0.427451, 0.549020, 0.470588, 0.447059, 0.458824, 0.349020, 0.572549, 0.482353, 0.474510, 0.494118, 0.458824, 0.572549, 0.741176, 0.705882, 0.721569, 0.756863, 0.756863, 0.764706, 0.745098, 0.741176, 0.725490, 0.705882, 0.717647, 0.698039, 0.690196, 0.666667, 0.627451, 0.596078, 0.588235, 0.556863, 0.560784, 0.396078, 0.317647, 0.337255, 0.337255, 0.525490, 0.741176, 0.694118, 0.694118, 0.709804, 0.690196, 0.709804, 0.698039, 0.686275, 0.686275, 0.682353, 0.682353, 0.678431, 0.694118, 0.690196, 0.682353, 0.678431, 0.650980, 0.658824, 0.600000, 0.364706, 0.290196, 0.333333, 0.345098, 0.509804, 0.752941, 0.705882, 0.713725, 0.709804, 0.709804, 0.694118, 0.690196, 0.713725, 0.694118, 0.698039, 0.705882, 0.705882, 0.705882, 0.701961, 0.701961, 0.690196, 0.686275, 0.674510, 0.647059, 0.431373, 0.325490, 0.364706, 0.352941, 0.529412, 0.788235, 0.729412, 0.729412, 0.733333, 0.698039, 0.478431, 0.305882, 0.325490, 0.443137, 0.592157, 0.764706, 0.760784, 0.709804, 0.698039, 0.670588, 0.658824, 0.631373, 0.627451, 0.639216, 0.490196, 0.349020, 0.372549, 0.380392, 0.517647, 0.807843, 0.752941, 0.701961, 0.541176, 0.388235, 0.317647, 0.329412, 0.301961, 0.313725, 0.345098, 0.380392, 0.478431, 0.709804, 0.686275, 0.627451, 0.603922, 0.576471, 0.533333, 0.654902, 0.572549, 0.360784, 0.360784, 0.341176, 0.501961, 0.788235, 0.713725, 0.643137, 0.521569, 0.458824, 0.549020, 0.643137, 0.662745, 0.670588, 0.517647, 0.388235, 0.384314, 0.556863, 0.682353, 0.631373, 0.603922, 0.596078, 0.517647, 0.654902, 0.560784, 0.388235, 0.372549, 0.364706, 0.509804, 0.784314, 0.780392, 0.717647, 0.745098, 0.752941, 0.749020, 0.776471, 0.768627, 0.745098, 0.643137, 0.305882, 0.345098, 0.513725, 0.678431, 0.627451, 0.611765, 0.584314, 0.545098, 0.674510, 0.576471, 0.400000, 0.380392, 0.368627, 0.501961, 0.749020, 0.796078, 0.796078, 0.792157, 0.788235, 0.784314, 0.768627, 0.694118, 0.525490, 0.364706, 0.247059, 0.290196, 0.556863, 0.717647, 0.639216, 0.607843, 0.596078, 0.545098, 0.717647, 0.580392, 0.423529, 0.384314, 0.349020, 0.474510, 0.729412, 0.819608, 0.800000, 0.800000, 0.815686, 0.788235, 0.776471, 0.662745, 0.403922, 0.278431, 0.262745, 0.290196, 0.592157, 0.749020, 0.725490, 0.713725, 0.721569, 0.733333, 0.760784, 0.568627, 0.411765, 0.352941, 0.349020, 0.431373, 0.682353, 0.831373, 0.792157, 0.803922, 0.811765, 0.811765, 0.784314, 0.784314, 0.788235, 0.588235, 0.207843, 0.203922, 0.341176, 0.752941, 0.741176, 0.737255, 0.737255, 0.756863, 0.690196, 0.588235, 0.439216, 0.372549, 0.356863, 0.400000, 0.678431, 0.850980, 0.803922, 0.807843, 0.807843, 0.800000, 0.792157, 0.780392, 0.772549, 0.709804, 0.384314, 
0.160784, 0.286275, 0.721569, 0.670588, 0.639216, 0.619608, 0.611765, 0.647059, 0.611765, 0.509804, 0.443137, 0.396078, 0.454902, 0.690196, 0.847059, 0.823529, 0.823529, 0.827451, 0.827451, 0.800000, 0.780392, 0.788235, 0.596078, 0.298039, 0.203922, 0.443137, 0.725490, 0.615686, 0.560784, 0.552941, 0.509804, 0.701961, 0.662745, 0.560784, 0.450980, 0.403922, 0.447059, 0.647059, 0.815686, 0.780392, 0.792157, 0.792157, 0.792157, 0.815686, 0.784314, 0.549020, 0.313725, 0.337255, 0.407843, 0.658824, 0.729412, 0.670588, 0.600000, 0.592157, 0.556863, 0.674510, 0.654902, 0.521569, 0.368627, 0.368627, 0.384314, 0.584314, 0.749020, 0.682353, 0.478431, 0.447059, 0.443137, 0.419608, 0.376471, 0.368627, 0.376471, 0.450980, 0.643137, 0.745098, 0.701961, 0.678431, 0.623529, 0.592157, 0.549020, 0.690196, 0.698039, 0.494118, 0.290196, 0.282353, 0.294118, 0.521569, 0.737255, 0.615686, 0.360784, 0.301961, 0.313725, 0.309804, 0.352941, 0.439216, 0.592157, 0.713725, 0.737255, 0.721569, 0.709804, 0.674510, 0.658824, 0.650980, 0.611765, 0.764706, 0.733333, 0.474510, 0.239216, 0.247059, 0.286275, 0.450980, 0.686275, 0.674510, 0.690196, 0.698039, 0.701961, 0.717647, 0.721569, 0.705882, 0.686275, 0.709804, 0.729412, 0.721569, 0.698039, 0.717647, 0.705882, 0.717647, 0.749020, 0.705882, 0.654902, 0.478431, 0.262745, 0.247059, 0.278431, 0.396078, 0.658824, 0.678431, 0.674510, 0.670588, 0.674510, 0.678431, 0.678431, 0.674510, 0.678431, 0.690196, 0.686275, 0.690196, 0.670588, 0.678431, 0.666667, 0.650980, 0.650980, 0.600000, 0.584314, 0.462745, 0.298039, 0.231373, 0.235294, 0.356863, 0.584314, 0.635294, 0.631373, 0.635294, 0.631373, 0.623529, 0.619608, 0.623529, 0.615686, 0.615686, 0.619608, 0.615686, 0.627451, 0.592157, 0.568627, 0.560784, 0.533333, 0.384314, 0.388235, 0.376471, 0.282353, 0.219608, 0.227451, 0.270588, 0.431373, 0.486275, 0.482353, 0.486275, 0.482353, 0.454902, 0.415686, 0.349020, 0.278431, 0.247059, 0.313725, 0.388235, 0.415686, 0.392157, 0.305882, 0.305882, 0.258824, 0.180392, 0.200000, 0.188235, 0.188235, 0.184314, 0.184314, 0.172549, 0.184314, 0.192157, 0.172549, 0.180392, 0.180392, 0.164706, 0.164706, 0.160784, 0.145098, 0.164706, 0.156863, 0.180392, 0.176471, 0.180392, 0.196078, 0.192157, 0.215686};

// The name of this function is important for Arduino compatibility.
void setup() {
  // Map the model into a usable data structure. This doesn't involve any
  // copying or parsing, it's a very lightweight operation.
  model = tflite::GetModel(g_model);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    MicroPrintf("Model provided is schema version %d not equal to supported "
                "version %d.", model->version(), TFLITE_SCHEMA_VERSION);
    return;
  }

  // Print something nice
  MicroPrintf("Hello TensorFlow Lite for Microcontrollers");

  // Pull in only the operation implementations we need.
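  // The template argument below (11) is the maximum number of AddXxx()
  // registrations the resolver can hold; it must cover every call that follows.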
  static tflite::MicroMutableOpResolver<11> resolver;
  if (resolver.AddFullyConnected() != kTfLiteOk) {
    return;
  }

  if (resolver.AddConv2D() != kTfLiteOk) {
    return;
  }

  if (resolver.AddMaxPool2D() != kTfLiteOk) {
    return;
  }

  if (resolver.AddConcatenation() != kTfLiteOk) {
    return;
  }

  if (resolver.AddDepthwiseConv2D() != kTfLiteOk) {
    return;
  }

  if (resolver.AddPad() != kTfLiteOk) {
    return;
  }

  if (resolver.AddAdd() != kTfLiteOk) {
    return;
  }

  if (resolver.AddResizeNearestNeighbor() != kTfLiteOk) {
    return;
  }

  if (resolver.AddLogistic() != kTfLiteOk) {
    return;
  }

  if (resolver.AddReshape() != kTfLiteOk) {
    return;
  }

  if (resolver.AddSoftmax() != kTfLiteOk) {
    return;
  }

  // Build an interpreter to run the model with.
  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kTensorArenaSize);
  interpreter = &static_interpreter;

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    MicroPrintf("AllocateTensors() failed");
    return;
  }

  // Obtain pointers to the model's input and output tensors.
  input = interpreter->input(0);
  output = interpreter->output(0);

  // Keep track of how many inferences we have performed.
  inference_count = 0;
}

// The name of this function is important for Arduino compatibility.
void loop() {
  // Quantize the float image into the model's int8 input tensor:
  // q = real / scale + zero_point (the cast truncates toward zero).
  for (size_t i = 0; i < input->bytes; ++i) {
    input->data.int8[i] = static_cast<int8_t>(
        image[i] / input->params.scale + input->params.zero_point);
  }

  // Run inference, and report any error
  TfLiteStatus invoke_status = interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    MicroPrintf("Invoke failed!");
    return;
  }

  // Dequantize each int8 output value back to a float:
  // real = (q - zero_point) * scale.
  for (size_t i = 0; i < output->bytes; ++i) {
    MicroPrintf("Output: %f",
                static_cast<double>(output->data.int8[i] - output->params.zero_point) *
                    output->params.scale);
  }

  while (true) {
    // Spin forever to keep the program from exiting.
  }
}

I've modified the hello_world example. The float array is an example input image. When I run without optimizations this gives a good output, but with optimizations the output is strange. (MicroPrintf prints floats in a mantissa*2^exponent form, so 1.0*2^-127 is effectively zero.)

Without opt:

Output: 1.0*2^-7
Output: 1.0*2^-127
Output: 1.0937498*2^-3
Output: 1.9062496*2^-3
Output: 1.0*2^-127
Output: 1.1562498*2^-2
Output: 1.4999999*2^-4
Output: 1.4999999*2^-7
Output: 1.2499999*2^-4
Output: 1.0937498*2^-3
Output: 1.0*2^-8

With opt:

Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.0*2^-8
Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.0*2^-127
Output: 1.9921868*2^-1

model.cc: https://pastebin.com/Pc9EPr47
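For comparison, the expected ("without opt") numbers can also be produced on the host by running the same flatbuffer through the Python TFLite interpreter, which does not use ESP-NN. A rough sketch, assuming the model is saved as model.tflite and using a stand-in random input instead of the real image:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a float input with the scale/zero-point stored in the model.
scale, zero_point = inp["quantization"]
image = np.random.rand(*inp["shape"]).astype(np.float32)  # stand-in input
quantized = np.round(image / scale + zero_point).astype(np.int8)

interpreter.set_tensor(inp["index"], quantized)
interpreter.invoke()

# Dequantize the int8 outputs: real = (q - zero_point) * scale.
raw = interpreter.get_tensor(out["index"]).astype(np.int32)
out_scale, out_zero_point = out["quantization"]
print((raw - out_zero_point) * out_scale)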

vikramdattu commented 1 year ago

@argelius thanks again for the help! I am able to reproduce the issue with your model and am working on it. I will update you here once a fix is available. Thanks.

vikramdattu commented 1 year ago

@argelius

Hello again! :) Thanks for the patience.

Please find the attached patch: fix_mismatch.patch

I have tested it with my setup and it fixes the issue. Please do check from your side as well. I shall release the fix to the repo once it's confirmed!

Thanks, Vikram

argelius commented 1 year ago

@vikramdattu We've verified the patch now and it works correctly! :tada:

Thank you so much for your assistance! :heart:

Both our models work correctly now, and when running optimized they perform around 5x faster!

vikramdattu commented 1 year ago

@argelius that's great! Will push the fix to the upstream and close the issue! 👍