espressif / esp-tflite-micro

TensorFlow Lite Micro for Espressif Chipsets
Apache License 2.0

Inconsistent Model Predictions #67

Closed MATTYGILO closed 11 months ago

MATTYGILO commented 11 months ago

@vikramdattu Thank you for helping me earlier. I started modifying the firmware so that I could fix an error I am receiving.

Repos in use

https://github.com/espressif/tflite-micro-esp-examples
https://github.com/espressif/esp-nn

Micropython firmware based off

https://github.com/mocleiri/tensorflow-micropython-examples

Summarised

I have a model made up of simple inverted ResNet layers with no recurrence or memory. It is a binary classifier with a sigmoid activation. However, I am getting inconsistent predictions for identical input. For example:

Pseudo Code

x = ...(Doesn't Change)...
while True:
    model.predict(x)

Results in:

0.5960784
0.454902
0.4470588
0.4470588
0.4470588
0.4470588

As you can see, the predictions start at one value, change, and then stabilise. Every call uses exactly the same data (absolutely certain). I have tried many other tflite models and the same problem occurs.

I have done tests where I loaded two copies of the same model at the same time, and they both follow the same pattern above. When I reset the device and start the predictions again, they follow the exact same pattern.
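A repeat-and-compare check like the one described above can be sketched as follows. This is only an illustration under assumptions: `find_first_divergence` and `LeakyModel` are hypothetical names, and `predict` stands in for whatever call the firmware exposes. The toy model simply carries a leftover value from one call to the next so the check has something to catch.

```python
from typing import Callable, Sequence


def find_first_divergence(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    runs: int = 6,
    tolerance: float = 0.0,
) -> int:
    """Return the index of the first run whose output differs from run 0, or -1."""
    frozen = tuple(x)  # immutable copy, so the input cannot change between runs
    baseline = predict(frozen)
    for i in range(1, runs):
        if abs(predict(frozen) - baseline) > tolerance:
            return i
    return -1


class LeakyModel:
    """Hypothetical stand-in for a model with state leaking between calls."""

    def __init__(self) -> None:
        self.leftover = 0.25  # value left behind from "before" the first call

    def predict(self, x: Sequence[float]) -> float:
        out = sum(x) / len(x) + self.leftover
        self.leftover = 0.0  # the stale value is consumed by the first call
        return out


if __name__ == "__main__":
    print(find_first_divergence(LeakyModel().predict, [0.5, 0.5]))
    print(find_first_divergence(lambda v: sum(v) / len(v), [0.5, 0.5]))
```

A stateless model should return -1 here. Any other result means something outside the input is changing between invocations, which narrows the search to memory the model shares across calls.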

Temporary Fix (Don't understand why):

Process:

  1. Wipe the header of the arena
  2. Reinitialise the model

Results in consistent prediction

Wiping and reinitialising is too slow, and it doesn't fix the root cause!

I was wondering whether you had any thoughts on what could be the root cause of the problem?

Gut feeling:

I think that something is going wrong in the header of the arena, where input/intermediate/output tensors are being mixed up. It feels like some kind of feedback is happening from invocation to invocation, with data being mixed up in the header, and that the output stabilises once it reaches data that maps back onto itself (x -> x).

Note:

I have about 3 pages of in-depth trial and error logs that I can provide if you need more info.

All help would be much appreciated. I've been stuck on this for about 5 days.

vikramdattu commented 11 months ago

@MATTYGILO

  1. What happens if you just re-populate the input? I guess the input Tensor gets re-used.
  2. Is it possible for me to try the setup? I haven't looked into the possibility of running https://github.com/mocleiri/tensorflow-micropython-examples on a host machine. Is it possible to do so? I am interested in checking the result on the host platform as well. Also, it will be easier to debug the issue if it can be reproduced on the host.

MATTYGILO commented 11 months ago

@vikramdattu My firmware isn't in a clean enough state, as it has many other things implemented. I think that https://github.com/mocleiri/tensorflow-micropython-examples may have the same problem. Is it possible to use their firmware for your testing?

  1. In my testing I am repopulating the input tensor each time, and it still causes the same bug.
  2. I am currently using an esp32s3. Sadly, I don't know how to run on host.

If there is anything else I can do to help, do say.

MATTYGILO commented 11 months ago

So I modified my firmware so that it would output these values

// Output whether we are using tflite static, TF_LITE_STATIC_MEMORY
#ifdef TF_LITE_STATIC_MEMORY
mp_printf(MP_PYTHON_PRINTER, "Using TF_LITE_STATIC_MEMORY\n");
#endif

// Output whether we are using esp nn
#ifdef ESP_NN
mp_printf(MP_PYTHON_PRINTER, "Using ESP_NN\n");
#endif

// Output whether the neural network is optimised
#ifdef CONFIG_NN_OPTIMIZED
mp_printf(MP_PYTHON_PRINTER, "Using CONFIG_NN_OPTIMIZED\n");
#endif

I am getting this output:

Using TF_LITE_STATIC_MEMORY
Using ESP_NN
Using CONFIG_NN_OPTIMIZED

MATTYGILO commented 11 months ago

@vikramdattu Quick question: are there model size limits in tflite micro? Or could using a large model cause problems with the arena?

vikramdattu commented 11 months ago

@MATTYGILO In theory, there isn't any limit on model size in tflite. It is only limited by the memory available on the chipset. But again, that should not be the reason for the erroneous outputs you see.

MATTYGILO commented 11 months ago

@vikramdattu Thanks for the clarification, just trying to think of what the problem may be.

MATTYGILO commented 11 months ago

I fixed my problem. It was an incredibly silly error considering the depths I went to in trying to fix it. It turned out I was writing input data at incorrect indices, out of range for my specific model.
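For anyone hitting something similar, here is a toy sketch of why an out-of-range input write can go unnoticed, and how a bounds check surfaces it. Everything here is an assumption for illustration: `ToyArena` is a hypothetical class, and its layout (an input region followed directly by other data in one flat buffer) is not a description of how TFLite Micro actually plans its arena.

```python
class ToyArena:
    """Hypothetical flat buffer: an input region followed by other tensor data."""

    def __init__(self, input_len: int, other_len: int) -> None:
        self.input_len = input_len
        self.buf = bytearray(input_len + other_len)

    def write_input_unchecked(self, index: int, value: int) -> None:
        # Behaves like a raw pointer write: nothing stops index >= input_len,
        # so the write silently lands in the neighbouring region.
        self.buf[index] = value

    def write_input_checked(self, index: int, value: int) -> None:
        # Reject any index that falls outside the input region.
        if not 0 <= index < self.input_len:
            raise IndexError(
                f"input index {index} outside [0, {self.input_len})"
            )
        self.buf[index] = value

    def other_region(self) -> bytes:
        """Bytes that do not belong to the input tensor."""
        return bytes(self.buf[self.input_len:])


if __name__ == "__main__":
    arena = ToyArena(input_len=4, other_len=4)
    arena.write_input_unchecked(5, 0xFF)  # index 5 is past the 4-byte input
    print(arena.other_region())           # the neighbouring data was modified
    try:
        arena.write_input_checked(5, 0xFF)
    except IndexError as err:
        print("caught:", err)
```

In real firmware the equivalent guard would compare each index against the input tensor's actual size before writing, rather than against a hard-coded length.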

MATTYGILO commented 11 months ago

@vikramdattu It would be great to get some feedback/contributions on this repo if you can: https://github.com/MATTYGILO/tflite-micropython-esp

Many thanks, Matt

vikramdattu commented 11 months ago

Hi, @MATTYGILO good to hear that you fixed the issue.

I sure will contribute to the repo wherever I can. Thanks for the great repo. 👍