Closed: ebgoldstein closed this issue 3 years ago
Steps:
Note that currently our model uses float16 quantization (and another model uses dynamic range quantization). Full integer quantization requires some code tweaks, and a representative dataset...
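For reference, a minimal sketch of what full integer quantization could look like, assuming a SavedModel export and a small set of real inputs for calibration; `saved_model_dir` and `calibration_images` are placeholder names, not anything in our repo:

```python
import numpy as np
import tensorflow as tf

# Placeholder names: point these at the real model export and a few
# hundred representative input samples (shape/dtype here are stand-ins).
saved_model_dir = "path/to/saved_model"
calibration_images = np.random.rand(100, 128, 128, 3).astype(np.float32)

def representative_dataset():
    # Yield one sample at a time; the converter uses these to
    # calibrate the int8 quantization ranges.
    for image in calibration_images:
        yield [np.expand_dims(image, 0)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops, which the Edge TPU requires:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```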
One potential idea is to do full integer quantization and just omit quantizing the output, i.e., include:

converter.inference_input_type = tf.int8  # or tf.uint8

but omit:

converter.inference_output_type = tf.int8  # or tf.uint8

The model would then run on the TPU except for the last layer, which would run on the CPU.
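A sketch of that partial variant, continuing the hypothetical setup above (same `saved_model_dir` and `representative_dataset` placeholders): keep the input as int8 but leave `inference_output_type` at its float32 default, so the regression output stays a float:

```python
import tensorflow as tf

# Continues the previous sketch: `saved_model_dir` and
# `representative_dataset` are the same placeholders as above.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
# inference_output_type is intentionally left at its float32 default:
# the trailing dequantize op should then fall back to the CPU while
# the rest of the integer graph can map to the Edge TPU.
tflite_model = converter.convert()
```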
- compile for the Edge TPU: https://coral.ai/docs/edgetpu/compiler/
- and see how much faster the model is compared to the fp16 model (4 s per inference)

I'm going to close this for now. I think 4 s per inference is pretty good, and 8-bit quantization seems like it will be challenging to do for a regression model.
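For anyone picking this up later, a minimal sketch of that timing comparison, assuming the quantized model has been compiled with `edgetpu_compiler model_int8.tflite` (which produces `model_int8_edgetpu.tflite`) and that `tflite_runtime` plus the Edge TPU runtime are installed; file names are hypothetical, and the delegate path shown is the Linux one:

```python
import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Hypothetical file name, produced by: edgetpu_compiler model_int8.tflite
interpreter = tflite.Interpreter(
    model_path="model_int8_edgetpu.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random int8 input: this measures latency only, not accuracy.
dummy = np.random.randint(-128, 128, size=inp["shape"], dtype=np.int8)

runs = 10
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
mean_s = (time.perf_counter() - start) / runs
print(f"mean inference time: {mean_s:.2f} s (fp16 baseline: ~4 s)")
```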