HimaxWiseEyePlus / himax_tflm


Weird tensorflow lite message #18

Closed PCPJ closed 1 year ago

PCPJ commented 2 years ago

Hello everyone. I have a Himax board and I am trying to deploy my own classification model, but I am running into some problems and hope someone can help me. My model is a simplified version of MobileNetV2 for classification. When I invoke the model for inference, I get a weird message from TensorFlow Lite. Does anyone know what it means?

tensorflow/lite/micro/kernels/arc_mli/depthwise_conv.cc:574 in_slice.Done() was not true.

I do get the model's output after invoking it, but I don't know yet whether the outputs are correct; I still have to use the Himax camera in my application environment. I am just a little worried about this message, so any help is welcome. Thank you so much for your attention.

kris-himax commented 2 years ago

@PCPJ, hi. We faced the same kind of problem when deploying another model (in conv.cc). After investigating, we found that the cause is the input tensor size at the conv node: it makes the max_out_lines_for_input value computed in arc_scratch_buffer_calc_slice_size_io (scratch_buf_mgr.cc) not divide evenly, which is why the "in_slice.Done() was not true" warning shows up. In our case, we changed tensorflow/lite/micro/kernels/arc_mli/scratch_buf_mgr.cc lines 323 and 324 to the following code.

      // max_out_lines_for_input =
      //     (max_lines_in - kernel_height + 1) / stride_height;
      max_out_lines_for_input =
          ((max_lines_in - kernel_height + 1) % stride_height != 0)
              ? ((max_lines_in - kernel_height + 1) / stride_height) + 1
              : (max_lines_in - kernel_height + 1) / stride_height;

With that change the warning is gone and the output is correct.
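In other words, the patch just rounds the division up instead of down. A minimal equivalent sketch, reusing the variable names from scratch_buf_mgr.cc (assuming a positive stride_height):

      // Ceiling division: make sure the slicing loop covers all remaining
      // output lines, so in_slice.Done() can become true.
      const int out_lines = max_lines_in - kernel_height + 1;
      max_out_lines_for_input = (out_lines + stride_height - 1) / stride_height;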

Thanks, Kris

PCPJ commented 1 year ago

Hello @kris-himax. Sorry for the late reply. I tried to modify the lines as you suggested, but I am still getting the same messages. Also, I am now testing the model in another way, and it doesn't produce the same output as running the same model with Python TFLite on my local machine. Do you think this has something to do with these messages?

I stored an RGB image in a static variable inside the code I am flashing to the board. Instead of sending images from the camera to the model, I always send this static image. I expected to get the same output as when running the same model, with the same input, on my local machine. Instead I am getting -127 as the output. Please allow me to show how I am doing it.

PCPJ commented 1 year ago
void loop() {
  hx_drv_uart_print("Preparing input!\n");
  if (input->type == TfLiteType::kTfLiteInt8) {
    void* quanti_param = input->quantization.params;
    assert(input->quantization.type == TfLiteQuantizationType::kTfLiteAffineQuantization);
    TfLiteAffineQuantization* quanti_affine_param = static_cast<TfLiteAffineQuantization*>(quanti_param);
    for (uint i = 0; i < (kNumCols*kNumRows); i+=3) {
      float r = ((float) my_img[i+0]);
      float g = ((float) my_img[i+1]);
      float b = ((float) my_img[i+2]);
      // ImageNet Normalization
      r = ((r)/255.0f -0.406f) / 0.225f;
      g = ((g)/255.0f -0.456f) / 0.224f;
      b = ((b)/255.0f -0.485f) / 0.229f;
      // Quantization normalization
      r = (r / quanti_affine_param->scale->data[0]) + quanti_affine_param->zero_point->data[0];
      g = (g / quanti_affine_param->scale->data[1]) + quanti_affine_param->zero_point->data[1];
      b = (b / quanti_affine_param->scale->data[2]) + quanti_affine_param->zero_point->data[2];
      input->data.int8[i+0] = (int8_t)b;
      input->data.int8[i+1] = (int8_t)g;
      input->data.int8[i+2] = (int8_t)r;
    }
  } else {
    hx_drv_uart_print("Input type isn't int [%d]!\n", input->type);
    assert(false);
  }

  hx_drv_uart_print("Invoking model!\n");
  // Run the model on this input and make sure it succeeds.
  if (kTfLiteOk != interpreter->Invoke()) {
    TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed.");
  }

  hx_drv_uart_print("Getting output!\n");
  TfLiteTensor* output = interpreter->output(0);

  // Process the inference results.
  int8_t class1_score = output->data.int8[0];
  int8_t class2_score = output->data.int8[1];
  if (input->type == TfLiteType::kTfLiteInt8) {
    void* quanti_param = input->quantization.params;
    assert(input->quantization.type == TfLiteQuantizationType::kTfLiteAffineQuantization);
    TfLiteAffineQuantization* quanti_affine_param = static_cast<TfLiteAffineQuantization*>(quanti_param);
    float float_class1_score = ((float)class1_score - quanti_affine_param->zero_point->data[0] ) * quanti_affine_param->scale->data[0];
    float float_class2_score = ((float)class2_score - quanti_affine_param->zero_point->data[0] ) * quanti_affine_param->scale->data[0];
    hx_drv_uart_print("Class1 [%g] - Class2 [%g]\n", (double)float_class1_score, (double)float_class2_score);
    hx_drv_uart_print("Class1 [%d] - Class2 [%d]\n", class1_score, class2_score);
  }
}

The variable my_img has the raw data of the input RGB image.

kris-himax commented 1 year ago

Hi @PCPJ, in my opinion your "Preparing input" step is wrong. I can see in your code that your model's input type is kTfLiteInt8, which means you already handled quantization when you converted the model. Did you apply the ImageNet normalization while quantizing to the INT8 model? If you already used ImageNet-normalized data to calibrate the quantization, you don't need to do it again: you only need to convert your input image from uint8 to int8, and do not need to do dequantization.

 TfLiteAffineQuantization* quanti_affine_param = static_cast<TfLiteAffineQuantization*>(quanti_param);
    for (uint i = 0; i < (kNumCols*kNumRows); i+=3) {
      float r = ((float) my_img[i+0]);
      float g = ((float) my_img[i+1]);
      float b = ((float) my_img[i+2]);
      // ImageNet Normalization
      r = ((r)/255.0f -0.406f) / 0.225f;
      g = ((g)/255.0f -0.456f) / 0.224f;
      b = ((b)/255.0f -0.485f) / 0.229f;
      // Quantization normalization
      r = (r / quanti_affine_param->scale->data[0]) + quanti_affine_param->zero_point->data[0];
      g = (g / quanti_affine_param->scale->data[1]) + quanti_affine_param->zero_point->data[1];
      b = (b / quanti_affine_param->scale->data[2]) + quanti_affine_param->zero_point->data[2];
      input->data.int8[i+0] = (int8_t)b;
      input->data.int8[i+1] = (int8_t)g;
      input->data.int8[i+2] = (int8_t)r;

Also, our Himax HM0360 camera can only capture grayscale images, so you may need to change the input channel size of your model.

Thanks, Kris

PCPJ commented 1 year ago

Hi @kris-himax. Thank you so much for the reply. I trained my model on images normalized with the ImageNet statistics, and I performed the quantization this way:

converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(out_tf_path)

def representative_dataset():
    dataset = MyDataset()
    for image, target in dataset:
        image = image.numpy()
        yield ([image]) # THIS IMAGE IS NORMALIZED WITH IMAGENET NUMBERS

# Set the optimization flag.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Enforce integer only quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Provide a representative dataset to ensure we quantize correctly.
converter.representative_dataset = representative_dataset
converter.allow_custom_ops = False

print("Converting to TF Lite quantized model.")
tflite_model = converter.convert()

I understand that you said I don't need to perform the ImageNet normalization again, but I didn't understand what you meant by "you only need to convert your input image from uint8 to int8, and do not need to do dequantization". Do I only need to subtract 128 from the input pixels? Like this:

    TfLiteAffineQuantization* quanti_affine_param = static_cast<TfLiteAffineQuantization*>(quanti_param);
    for (uint i = 0; i < (kNumCols*kNumRows); i+=3) {
      int r =  ((int) my_img[i+0]) - 128;
      int g = ((int) my_img[i+1]) - 128;
      int b = ((int) my_img[i+2]) - 128;
      input->data.int8[i+0] = (int8_t)b;
      input->data.int8[i+1] = (int8_t)g;
      input->data.int8[i+2] = (int8_t)r;

Also, I am using this function to retrieve the image from the camera:

TfLiteStatus GetImage(tflite::ErrorReporter* error_reporter, int image_width,
                      int image_height, int channels, int8_t* image_data) {
  static bool is_initialized = false;
  if (!is_initialized) {
    if (hx_drv_sensor_initial(&g_pimg_config) != HX_DRV_LIB_PASS) {
      return kTfLiteError;
    }
    is_initialized = true;
  }
  hx_drv_sensor_capture(&g_pimg_config);
  hx_drv_image_rescale((uint8_t*)g_pimg_config.raw_address,
                       g_pimg_config.img_width, g_pimg_config.img_height,
                       image_data, image_width, image_height);
  return kTfLiteOk;
}

My image data vector has size height*width*3 channels. Is this function filling only the first third of the vector and leaving the rest untouched? In any case, I will have to retrain the model to accept grayscale images.

Thank you so much again.

kris-himax commented 1 year ago

Hi @PCPJ, you asked whether "convert your input image from uint8 to int8" just means subtracting 128 from the input pixels, like this:

    TfLiteAffineQuantization* quanti_affine_param = static_cast<TfLiteAffineQuantization*>(quanti_param);
    for (uint i = 0; i < (kNumCols*kNumRows); i+=3) {
      int r =  ((int) my_img[i+0]) - 128;
      int g = ((int) my_img[i+1]) - 128;
      int b = ((int) my_img[i+2]) - 128;
      input->data.int8[i+0] = (int8_t)b;
      input->data.int8[i+1] = (int8_t)g;
      input->data.int8[i+2] = (int8_t)r;

Ans: Yes, that is what I mean.

The GetImage function is based on the template from the Google person detection example (https://github.com/tensorflow/tflite-micro/blob/a37043ff7a42e9d46345b4a6b33abda2fd1081f5/tensorflow/lite/micro/examples/person_detection/image_provider.cc) in the tensorflow/tflite-micro repository; we modified it so it can run with our camera sensor. The channels parameter of GetImage is unused because our sensor can only capture grayscale images. If you use GetImage, the image_data pointer is the output: it receives a grayscale image from the Himax HM0360 camera that has already been rescaled to height*width*1 channel and converted to INT8.
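For reference, a minimal sketch of how GetImage is typically called from the main loop, assuming the kNumCols/kNumRows/kNumChannels constants from the person_detection example (adjust the names to your project):

  // Fill the model's int8 input tensor with one grayscale frame
  // (kNumRows x kNumCols x 1) captured and rescaled by the HM0360 driver.
  if (GetImage(error_reporter, kNumCols, kNumRows, kNumChannels,
               input->data.int8) != kTfLiteOk) {
    TF_LITE_REPORT_ERROR(error_reporter, "Image capture failed.");
  }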

Thanks, Kris

chinya07 commented 1 year ago

Hi @kris-himax & @PCPJ, I have trained a classification model on 96x96x1 grayscale images and converted it to a .tflite model. The .tflite model size is 706 KB. The maximum tensor arena size I found by trial and error is constexpr int kTensorArenaSize = 204 * 1024;

But when I try to flash the .img to the Himax board I get this error:

Arena size is too small for all buffers. Needed 212064 but only 191120 was available.
AllocateTensors() failed
MicroAllocator: Model allocation started before finishing previously allocated model. Failed starting model allocation.

Invoke failed.

And if I try to increase the tensor arena size, the .elf fails to build.

Do you have any idea what the issue is here? Let me know if you need any additional info. Thanks!
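For context, this is the usual pattern for declaring the arena and handing it to the interpreter in the TFLM examples (a sketch; the exact MicroInterpreter constructor arguments depend on the TFLM version used by himax_tflm):

  // Arena used by TFLM for activations and scratch buffers. It lives in SRAM
  // together with the model data array, so making it bigger can push the .elf
  // past what the board can hold.
  constexpr int kTensorArenaSize = 204 * 1024;
  alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

  static tflite::MicroInterpreter static_interpreter(
      model, micro_op_resolver, tensor_arena, kTensorArenaSize, error_reporter);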

PCPJ commented 1 year ago

Hello @chinya07. Which architecture are you using? The cropped MobileNet I am using is only 60 KB. When I convert the .tflite file to .cc, the size becomes 364 KB. How big is your .cc model file? I think it is bigger than 1.5 MB, right? Try to use an architecture that ends up below 1.5 MB, because the person detection example model is 1.5 MB in size.

PCPJ commented 1 year ago

Hello @kris-himax. Thank you so much for the help, and sorry for the late reply. I tried to feed the image the way you suggested and it seems to be working; the output isn't -128 anymore. I still have to compare the output on the Himax board with the output of the .tflite file on my notebook using TFLite in Python, and I still have to retrain the model to use grayscale images. Can I test these things and close this issue afterwards?

chinya07 commented 1 year ago

Hi @PCPJ, my .cc model size is 706200 bytes and my cropped MobileNet architecture has 660,578 trainable parameters in total.

kris-himax commented 1 year ago

@chinya07 hi,

And if I try to increase the tensor arena size, the .elf fails to build.

What is the error message when building the .elf file? I think your .tflite model may be too big to fit in SRAM; you could try putting the .tflite file in flash instead. See the flash API and image gen tool commands: https://github.com/HimaxWiseEyePlus/bsp_tflu/tree/master/HIMAX_WE1_EVB_SDK#flash-api

chinya07 commented 1 year ago

Hi @kris-himax, sorry for the late reply. Yes, that is correct: my model was too big, and it got even bigger while building the image. I cropped a few more layers to make the model smaller, and it worked! Thanks for your support; I will look into the flash API as well.