Output discrepancy between python tflite and espidf tflite micro outputs (TFMIC-29)

I have been trying to run a int8 quantized converted custom keras model on my esp32 cam device but I have been getting discrepancies in outputs. I have so far narrowed it down to a problem with the mul operation as all previous layers match the python implementation. Here is the segment of the model i am testing with:

Screenshot 2024-06-27 173920

I have thought that it is possibly a broadcasting issue, so I have also tried using tf.reshape(), tf.keras.layers.Reshape, and tf.broadcast_to but none have worked.

I have tried converting the keras model using each of the three methods: from_keras_model, from_saved_model and from_concrete_function but still no:

## From keras model
converter = tf.lite.TFLiteConverter.from_keras_model(chunk[1])

## From saved model
# chunk[1].save(path + chunk[0] + ".keras")
# converter = tf.lite.TFLiteConverter.from_saved_model(path + chunk[0])

## From concrete function
# model_func = tf.function(func=chunk[1])
# inputSpecs = [tf.TensorSpec(shape=(1,) + x.shape[1:], dtype=tf.float32) for x in chunk[1].inputs]
# cf = model_func.get_concrete_function(inputSpecs)
# converter = tf.lite.TFLiteConverter.from_concrete_functions([cf], chunk[1])

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8, tf.float16, tf.int32]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.representative_dataset = chunk[2]

try:
    # Convert the model
    tflite_model = converter.convert()

    # Save the TFLite model to a file
    tflite_file_path = path + chunk[0] + ".tflite"
    with open(tflite_file_path, "wb") as file:
        file.write(tflite_model)

    print(f"{chunk[0]} conversion successful!")
except Exception as e:
    print(f"Error converting {chunk[0]}: {e}")

Thought it could have also been a problem with ESP_NN but have tried the unoptimized setting to no avail.

My next thinking would be that it is not handling the two int8 multiplication overflow into int32 but I dont know where to begin testing that.

Here is an example of what a few output layers of the netron output above looks like: with left side being the esp32 and the right being the python implementation output of the same image, which is just an image of my hand taken by the esp32cam. Figure_1

I am running espidf 5.2.2 and tensorflow 2.14.1, but have also tried 2.15.0 and 2.16.1 for tensorflow and 5.3, 5.1.2 for espidf with the same result.

Your help would be much appreciated!

Thanks for the detailed inputs on the issue @farari107 and confirming that it is not specific to esp-nn.

Can you please also check the quantised model output with python? Does it match the ESP32's output?
Please disable specifically the call below in tensorflow/lite/micro/kernels/esp_nn/mul.cc and check if it fixes the output?
```
case kTfLiteInt8:
#if ESP_NN
  MulEvalQuantized(context, node, data, input1, input2, output);
#else
  EvalMulQuantizedReference(context, node, data, input1, input2, output);
#endif
```
to run EvalMulQuantizedReference(context, node, data, input1, input2, output);

Further, if this still doesn't help to narrow down the issue, is it possible to provide a small example with which I can reproduce the issue?

Thanks for the fast reply @vikramdattu.

So the output image that I gave was the python quantized tflite interpreter output of the same input image and the same model, which closely follows the accuracy of the original model when trained, so everything works perfectly using the python interpreter just not the esp32.

I tried your advice to use only: EvalMulQuantizedReference(context, node, data, input1, input2, output); but it did not change much as this is still the output of the esp32 vs the output for the python tflite interpreter: Figure_1

Which I tested again using different reshape methods with similar results.

I can give you an example of the tensorflow python code but the espidf will be tricky, but I modeled it somewhat off the person detection example with just my model instead.

Here is an example of the python model:

def trialModel():
    height = 120
    width = 160
    channels = 1
    #
    L1 = 1e-5
    L2 = 1e-3
    #
    filt_base = 9
    activation = tf.nn.relu6
    #
    inp = tf.keras.Input(shape=(height, width, channels), name="model_input")

    x = tf.keras.layers.Conv2D(filt_base, (3, 3), (2, 2), padding='same', kernel_regularizer=keras.regularizers.L1L2(l1=L1, l2=L2))(inp)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation(activation)(x)

    #### Input dim: 60, 80
    x = tf.keras.layers.Conv2D(filt_base * 2, (1, 1), (1, 1), padding='same', kernel_regularizer=keras.regularizers.L1L2(l1=L1, l2=L2))(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation(activation)(x)

    x = tf.keras.layers.DepthwiseConv2D((3, 3), dilation_rate=(1, 1), padding='same',
                                              depthwise_regularizer=keras.regularizers.L1L2(l1=L1, l2=L2))(x)
    x1 = tf.keras.layers.Activation(activation)(x)

    # x = tf.keras.layers.GlobalMaxPooling2D()(x1)
    x = tf.keras.layers.MaxPooling2D((60, 80), (1, 1), padding="valid")(x1)
    x = tf.keras.layers.Flatten()(x)

    x = tf.keras.layers.Dense(int(filt_base * 2 / 3),
                                   kernel_regularizer=keras.regularizers.L1L2(l1=L1, l2=L2))(x)
    x = tf.keras.layers.Activation(activation)(x)

    x = tf.keras.layers.Dense(int(filt_base * 2),
                                   kernel_regularizer=keras.regularizers.L1L2(l1=L1, l2=L2))(x)
    x = tf.keras.layers.Activation("sigmoid")(x)
    # x = tf.broadcast_to(x, tf.shape(x1))
    x = tf.reshape(x,  [-1, 1, 1, int(filt_base * 2)])

    x = tf.keras.layers.Multiply()([x1, x])

    x = tf.keras.layers.Conv2D(filt_base * 2, (1, 1), (1, 1), padding='same',
                                    kernel_regularizer=keras.regularizers.L1L2(l1=L1, l2=L2))(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation(activation)(x)

    model = tf.keras.Model(inputs=inp, outputs=x)

    return model

model = trialModel()
model.compile(optimizer=tf.keras.optimizers.Adam(0.005))

And then just the same for the converter, where the rep dataset can just be random float32 variables between 0-1 of size stated above in the model.

If you need any further information please do ask.

@vikramdattu sorry for the mistake, I have been testing the code today and have found out that I was actually receiving corrupt data from overloading the mqtt (which was how I was getting the data from the esp32cam) by trying to assess and get data from a few layers at the same time.

I have now corrected this and have been receiving correct outputs from the MUL layer even with MulEvalQuantized turned on, however the problem still persists.

I have found that there are still issues with the code:

The DepthwiseConv2D output on the esp32 is very different to the python implementation when using from_keras_model(), but is corrected when you use tf.lite.TFLiteConverter.from_concrete_function().
For some reason in my model when adding two 4 dimentional tensors I get a large shift where the final image looks the same but all the values have shifted by more than +-50 in an int8 range. I have tested this in isolation but it works fine, so im not sure what the problem is.

I can only assume its because of the small divergence in each progressive layer which when faced with the addition op it scales it improperly. I have seen that each progressive layer (ie conv2d, depthwise, and dense) diverges +-5 a small amount of the total data points of each tensor, so by the time my first ADD op, the two layers have both diverged about 7000 out of ~40000 data points compared to the python quantized tflite implementation.

I will try to replace the Add op with Concatenate tomorrow to see if it solves the shift problem but would prefer the Add op as Concatenate will increase the model size dramatically.

Your advice would be very much appreciated.

@farari107 can you please share following details:

Does the mismatch continue even if you disable esp-nn?
Can you check at Add layer if the inputs are same for both esp32 and python?

I will try to replace the Add op with Concatenate tomorrow to see if it solves the shift problem but would prefer the Add op as Concatenate will increase the model size dramatically.

Please do this for the experiment purpose. If there indeed is a bug in Add OP implementation, it of course needs to be fixed. Again, I will request you to share the observations with esp-nn turned off. This will corner down the issue to the C implementation itself OR to the wrong usage of the OPs.

@vikramdattu,

I Have finally found the source of the problem!

So I have tested on the labeled places on the netron image and have compared the esp32 output to the python tflite interpreter using both esp-nn and the unoptimized versions and they produced very similar results: model_netron

86248 / 86400, The ones that do not match are only max +- 5 difference
42036 / 43200, The ones that do not match are only max +- 5 difference
41677 / 43200, The ones that do not match are only max +- 5 difference
0 / 43200, but if printed out the layers it has the same look as the python interpreter layers, so must just be a shift in values
37099 / 43200, The ones that do not match are only max +- 5 difference
0 / 43200

So when the relu (4) is combined with the depthwise (3), the output is correct. ie:

espressif / esp-tflite-micro

Output discrepancy between python tflite and espidf tflite micro outputs (TFMIC-29) #86