AlexeyAB / yolo2_light

Light version of convolutional neural network Yolo v3 & v2 for object detection with a minimum of dependencies (INT8-inference, BIT1-XNOR-inference)
MIT License

YOLO v3 INT8 inference in TensorFlow Lite #50

Open · anferico opened this issue 5 years ago

anferico commented 5 years ago

Hello, is it possible to obtain a quantized .tflite version of YOLO v3 / YOLO Tiny v3 in order to do INT8 inference with the tools in this repository? I've tried using TensorFlow Lite's official converter, toco, but it seems that some layers don't support quantization.

ambr89 commented 5 years ago

Hi! Yes, I've obtained a quantized .tflite:

$ bazel run tensorflow/lite/toco:toco -- \
    --input_file=mymodel.pb \
    --output_file=output.tflite \
    --input_shapes=1,416,416,3 \
    --input_arrays='input_1' \
    --output_format=TFLITE \
    --output_arrays='output_0','output_1' \
    --inference_type=QUANTIZED_UINT8 \
    --std_dev_values=128 --mean_values=128 \
    --default_ranges_min=-6 --default_ranges_max=6 \
    --change_concat_input_ranges=false \
    --allow_custom_ops

BUT I don't understand how to use it. During my post-processing I get a RuntimeWarning: overflow encountered in exp. Do you have any idea?
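
For reference, a minimal sketch of how a model quantized this way can be run with the TFLite Python interpreter (the model path, input range, and stable-sigmoid helper below are illustrative assumptions, not code from this repository). The overflow in exp typically means raw uint8 outputs (0..255) were fed straight into the YOLO sigmoid/exp post-processing; dequantizing first and using a numerically stable sigmoid avoids it:

import numpy as np
import tensorflow as tf  # TF 1.13+; older versions expose tf.contrib.lite.Interpreter

# Load the quantized model produced by the toco command above.
interpreter = tf.lite.Interpreter(model_path="output.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]

# Quantize the preprocessed float image with the input's (scale, zero_point);
# with --mean_values=128 --std_dev_values=128, toco assumes real = (q - 128) / 128.
img = np.random.uniform(-1.0, 1.0, size=(1, 416, 416, 3)).astype(np.float32)  # placeholder image
scale, zero_point = input_detail["quantization"]
q_img = np.uint8(np.clip(img / scale + zero_point, 0, 255))

interpreter.set_tensor(input_detail["index"], q_img)
interpreter.invoke()

def stable_sigmoid(x):
    # Evaluate each branch only on the values it is stable for,
    # so np.exp never sees a large positive argument.
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    e = np.exp(x[~pos])
    out[~pos] = e / (1.0 + e)
    return out

for detail in interpreter.get_output_details():
    q_out = interpreter.get_tensor(detail["index"]).astype(np.float32)
    out_scale, out_zero = detail["quantization"]
    real_out = out_scale * (q_out - out_zero)  # dequantize BEFORE any exp/sigmoid
    obj = stable_sigmoid(real_out)             # YOLO decoding continues from here

Note that toco's input normalization is real = (q - mean_values) / std_dev_values, so the preprocessing that feeds the interpreter has to match whatever normalization the network was trained with.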

kolingv commented 5 years ago

What you have done may be called "dummy quantization", i.e. it only tests the tool and doesn't do any real quantization: the --default_ranges_min/--default_ranges_max values are guesses, not measured activation ranges. For uint8 quantization using toco, currently, you may need to consider "quantization-aware training" from TensorFlow (google it for some insights). It inserts quantization layers that measure the min/max of certain tensors and then simulates quantization error during training. After training, freeze the graph with the checkpoint, convert it to tflite, and you've got it!
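
In case it helps, here is a minimal sketch of that TF 1.x flow using tf.contrib.quantize (build_yolo_model and build_loss are hypothetical placeholders for your own graph-construction code, and the checkpoint path is an assumption):

import tensorflow as tf  # TF 1.x only; tf.contrib was removed in TF 2.x

# --- Training graph: insert fake-quant nodes that record min/max ranges
# and simulate quantization error during training. ---
train_graph = tf.Graph()
with train_graph.as_default():
    logits = build_yolo_model(is_training=True)  # hypothetical model builder
    loss = build_loss(logits)                    # hypothetical loss builder
    # quant_delay lets the float model converge before quantization kicks in.
    tf.contrib.quantize.create_training_graph(input_graph=train_graph,
                                              quant_delay=2000)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
    # ... run training and save a checkpoint with tf.train.Saver ...

# --- Eval graph: same fake-quant nodes, then restore weights and freeze. ---
eval_graph = tf.Graph()
with eval_graph.as_default():
    logits = build_yolo_model(is_training=False)
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, "model.ckpt")  # assumed checkpoint path
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, eval_graph.as_graph_def(), ["output_0", "output_1"])
        with open("frozen_quant.pb", "wb") as f:
            f.write(frozen.SerializeToString())

The frozen graph then carries measured min/max ranges, so the toco command above should work without the --default_ranges_min/--default_ranges_max guesses.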