Performance unexplainability for tflite int8 and fp32 models

Dear all, I am testing the performance/throughput of fp32 and quantized models on my platform. My configuration is as follows:

tflite-runtime==2.5.0.post1
tensorflow==1.14.0

FP32 on CPU

-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:00:28.796083
-INFO- Throughput: 35.8 fps
-INFO- Latency: 29.5 ms
-INFO- Target          Workload        H/W   Prec  Batch Conc. Metric       Score    Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite mobilenet       cpu   fp32      1     1 throughput    35.8      fps
-INFO- tensorflow_lite mobilenet       cpu   fp32      1     1 latency       29.5       ms
-INFO- Total runtime: 0:00:28.830364
-INFO- Done

INT8 on CPU

google@localhost:~/mlmark$ harness/mlmark.py -c config/tflite-cpu-mobilenet-int8-throughput.json 
-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:01:00.933346
-INFO- Throughput: 16.9 fps
-INFO- Latency: 65. ms
-INFO- Target          Workload        H/W   Prec  Batch Conc. Metric       Score    Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite mobilenet       cpu   int8      1     1 throughput    16.9      fps
-INFO- tensorflow_lite mobilenet       cpu   int8      1     1 latency       65.        ms
-INFO- Total runtime: 0:01:00.960828
-INFO- Done

Observations: On CPU, the FP32 model reaches roughly twice the throughput of the INT8 model (35.8 fps versus 16.9 fps), but Google's published TensorFlow Lite benchmarks show the opposite: https://www.tensorflow.org/lite/guide/hosted_models#quantized_models

I also tried replacing the harness models with the ones from the hosted location above, but the harness gives similar results.
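
To check whether the gap comes from the harness or from the interpreter itself, both models could also be timed directly. The following is only a minimal sketch, not the way mlmark measures: `measure` and `benchmark_tflite` are hypothetical helper names, the model file names in the usage note are placeholders, and it assumes that `numpy` is installed and that the installed tflite-runtime accepts the `num_threads` argument.

```python
import time


def measure(invoke, warmup=5, runs=50):
    """Time a zero-argument callable.

    Returns (throughput in calls per second, mean latency in milliseconds).
    """
    # Warm-up calls are excluded from the timing.
    for _ in range(warmup):
        invoke()
    start = time.perf_counter()
    for _ in range(runs):
        invoke()
    elapsed = time.perf_counter() - start
    return runs / elapsed, 1000.0 * elapsed / runs


def benchmark_tflite(model_path, num_threads=1, runs=50):
    """Time single-image inference for one .tflite file (hypothetical helper)."""
    # Imported here so that measure() stays usable without these packages.
    import numpy as np
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path, num_threads=num_threads)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]

    # Synthetic input with the shape and dtype the model expects.
    # Values stay in 0..100 so they fit float32, uint8 and int8 inputs alike.
    shape = tuple(int(d) for d in inp["shape"])
    data = (np.random.random_sample(shape) * 100).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)

    return measure(interpreter.invoke, runs=runs)
```

It could be called as `benchmark_tflite("mobilenet_fp32.tflite")` and `benchmark_tflite("mobilenet_int8.tflite")` (placeholder file names), possibly with different `num_threads` values, to see whether the FP32/INT8 ratio outside the harness matches the numbers above.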

Could you let me know what is going wrong?

Thanks,
Kind regards,
Arun

eembc / mlmark

Performance unexplainability for tflite int8 and fp32 models #13