I was trying to implement the code needed to run the MLPerf-Tiny benchmark suite's networks on the Coral Dev Board Micro.
After working around the serial-communication problem described in issue #116, I was able to write the code, send the input values from the PC to the board, and run the inference.
I found that while the inference results on the CPU are correct, the inference results on the TPU are badly wrong: the results do not just vary slightly, the benchmark accuracy drops from about 80% to about 10%.
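For context, the per-inference path on the board looks roughly like the sketch below. This is an illustrative sketch, not my exact code: it assumes the interpreter has already been set up (a setup sketch is included further down, after the additional-information list) and that `input_bytes` holds the image already parsed from the serial interface. The same path is used for both the CPU and the TPU model.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

#include "tensorflow/lite/micro/micro_interpreter.h"

// Copy the image received over serial into the (int8) input tensor, run
// inference, and print the dequantized class scores (the float vectors
// reported below).
void RunOneInference(tflite::MicroInterpreter& interpreter,
                     const std::vector<uint8_t>& input_bytes) {
  TfLiteTensor* input = interpreter.input(0);
  std::memcpy(input->data.int8, input_bytes.data(), input->bytes);

  if (interpreter.Invoke() != kTfLiteOk) {
    printf("Invoke() failed\r\n");
    return;
  }

  TfLiteTensor* output = interpreter.output(0);
  for (size_t i = 0; i < output->bytes; ++i) {
    float score = (output->data.int8[i] - output->params.zero_point) *
                  output->params.scale;
    printf("%.3f ", score);
  }
  printf("\r\n");
}
```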
Using the ResNet from the ImageClassification benchmark as an example:
The original network, created, trained, and quantized with the scripts provided in the training folder of the MLPerf-Tiny benchmark repo, is (image generated using Netron):
The network after compilation with the edgetpu_compiler is:
Using the first test image (an emu) as an example, the results are:
CPU: [0.000,0.000,0.664,0.281,0.051,0.000,0.004,0.000,0.000,0.000]
TPU: [0.000,0.000,0.000,0.004,0.000,0.621,0.000,0.371,0.000,0.000]
label_names: ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
So the CPU correctly recognizes that the image depicts a bird, while the TPU returns completely different, wrong results.
I found other issues describing the same problem, so I decided to try running some layers of the network on the CPU in order to find out which one causes the different results.
I tried splitting the model using the -i flag on the output tensors of the Add layers. Both times the inference completed, but again with wrong results.
I then tried to split the model before the first Add layer, specifying the output of the first Conv2D layer, obtaining the network below.
This time the inference hung and no result was produced.
(The problem is in the inference phase: I turn on the user LED right before calling interpreter.Invoke() and turn it off as soon as the inference ends, and this time the LED never turned off.)
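Concretely, the LED check is just a couple of lines around Invoke(), along the lines of this sketch (assuming the coralmicro LedSet()/Led::kUser API):

```cpp
#include "libs/base/led.h"

// The user LED is lit just before inference and cleared right after,
// so if Invoke() never returns the LED stays on.
coralmicro::LedSet(coralmicro::Led::kUser, true);
TfLiteStatus invoke_status = interpreter.Invoke();
coralmicro::LedSet(coralmicro::Led::kUser, false);
```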
Additional information:
- Since the code used to parse the inputs coming from the serial interface and to store them in the input tensor is the same for both the CPU and TPU networks, I don't think the issue is there.
- The same holds for the code that reads back the results.
- For the CPU model I added the 7 required operations to the MicroMutableOpResolver. For the TPU model I tried both adding only the custom op and adding the custom op together with all 7 operations used by the CPU model (see the setup sketch after this list).
- For the split model I added the custom op and the other 7 operations to the resolver.
- kTensorArenaSize is sufficiently large to run the full model on the CPU. For the split network I kept the same value.
- The TPU is turned on and the tpu_context does not go out of scope; in fact, the white LED does not turn off until I reboot the board.
- No error message is printed on the serial interface, probably because the code crashes somehow and the consoleTask stops running.
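For reference, the setup follows the usual coralmicro + TFLite-Micro pattern, roughly as sketched below. This is an illustrative sketch, not my exact code: the arena size, the model symbol (model_data) and the CPU op list are placeholders, and the exact MicroInterpreter constructor arguments depend on the TensorFlow Lite Micro version bundled with coralmicro.

```cpp
#include "libs/tpu/edgetpu_manager.h"
#include "libs/tpu/edgetpu_op.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

namespace {
constexpr int kTensorArenaSize = 1024 * 1024;  // placeholder; large enough for the CPU model
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
}  // namespace

bool SetupAndRun(const uint8_t* model_data) {
  // Power on the Edge TPU; the returned context must stay alive for the
  // whole run (the white TPU LED stays on while it is held).
  auto tpu_context = coralmicro::EdgeTpuManager::GetSingleton()->OpenDevice();
  if (!tpu_context) return false;

  // Resolver: the Edge TPU custom op plus the CPU ops needed by the
  // non-delegated part of the (split) model. The op list here is only an example.
  tflite::MicroMutableOpResolver<8> resolver;
  resolver.AddCustom(coralmicro::kCustomOp, coralmicro::RegisterCustomOp());
  resolver.AddConv2D();
  resolver.AddAdd();
  resolver.AddAveragePool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();
  // ... remaining ops used by the CPU portion of the model ...

  tflite::MicroErrorReporter error_reporter;
  tflite::MicroInterpreter interpreter(tflite::GetModel(model_data), resolver,
                                       tensor_arena, kTensorArenaSize,
                                       &error_reporter);
  if (interpreter.AllocateTensors() != kTfLiteOk) return false;

  // ... fill the input tensor and call interpreter.Invoke() as in the sketch above ...
  return true;
}
```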
### Issue Type
Support
### Operating System
Ubuntu
### Coral Device
Dev Board Micro
### Other Devices
_No response_
### Programming Language
C++
### Relevant Log Output
_No response_