Some more information about quantizing models in TensorFlow:
https://www.tensorflow.org/performance/quantization
There have been some experiments training at lower bit depths, but the results seem to indicate that you need more than eight bits to handle the back-propagation and gradients. That makes implementing the training more complicated, so starting with inference made sense. We also already have a lot of float models that we use and know well, so being able to convert them directly is very convenient.
TensorFlow has production-grade support for eight-bit calculations built in. It also has a process for converting many models trained in floating-point over to equivalent graphs using quantized calculations for inference. For example, here’s how you can translate the latest GoogLeNet model into a version that uses eight-bit computations:
```bash
curl http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz -o /tmp/inceptionv3.tgz
tar xzf /tmp/inceptionv3.tgz -C /tmp/
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=/tmp/classify_image_graph_def.pb \
  --outputs="softmax" --out_graph=/tmp/quantized_graph.pb \
  --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true)
    fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes
    strip_unused_nodes sort_by_execution_order'
```
This will produce a new model that runs the same operations as the original, but with eight-bit calculations internally and all weights quantized as well. If you look at the file size, you’ll see it’s about a quarter of the original (23MB versus 91MB). You can still run this model using exactly the same inputs and outputs, though, and you should get equivalent results. Here’s an example:
```bash
# Note: You need to add the dependencies of the quantization operation to the
# cc_binary in the BUILD file of the label_image program:
#
# //tensorflow/contrib/quantization:cc_ops
# //tensorflow/contrib/quantization/kernels:quantized_ops
bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
  --image=<input-image> \
  --graph=/tmp/quantized_graph.pb \
  --labels=/tmp/imagenet_synset_to_human_label_map.txt \
  --input_width=299 \
  --input_height=299 \
  --input_mean=128 \
  --input_std=128 \
  --input_layer="Mul:0" \
  --output_layer="softmax:0"
```
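For reference, the same quantized graph can also be run from Python. Here is a minimal sketch (TensorFlow 1.x API, assuming the `/tmp` path and the `Mul:0` / `softmax:0` tensor names used in the commands above):

```python
# Minimal sketch (TF 1.x): load the quantized GraphDef and run it on a
# preprocessed 299x299 image, using the same mean/std and tensor names as
# the label_image example above.
import numpy as np
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("/tmp/quantized_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

# Dummy input here; a real image would be decoded, resized to 299x299,
# and normalized with mean=128, std=128 as in the example above.
image = (np.random.randint(0, 256, (1, 299, 299, 3)).astype(np.float32) - 128.0) / 128.0

with tf.Session(graph=graph) as sess:
    probs = sess.run("softmax:0", feed_dict={"Mul:0": image})
print(probs.shape)  # one probability vector per input image
```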
It would be easy to reduce the precision of the inputs only while still representing everything internally in high precision in Keras or TF. However, it would be interesting if we could train reduced-precision weights, too.
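To illustrate the first point, here is a minimal sketch of rounding the inputs to a fixed number of bits before feeding them to an otherwise full-precision model (the bit width and the [-1, 1] range are arbitrary choices for illustration):

```python
# Minimal sketch: quantize only the inputs to n_bits uniform levels over
# [x_min, x_max]; the model itself stays in float32.
import numpy as np

def quantize_inputs(x, n_bits=8, x_min=-1.0, x_max=1.0):
    levels = 2 ** n_bits - 1
    step = (x_max - x_min) / levels
    x = np.clip(x, x_min, x_max)
    return x_min + np.round((x - x_min) / step) * step

x = np.random.uniform(-1, 1, size=(4, 16)).astype(np.float32)
x_q = quantize_inputs(x, n_bits=6)
# model.predict(x_q)  # any existing Keras/TF model, unchanged
```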
@violatingcp pointed out some precision options, https://www.tensorflow.org/versions/r0.12/api_docs/python/framework/tensor_types and https://github.com/fchollet/keras/issues/2019, but we haven't yet found a way to do this with arbitrary precision.
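For context, those options only cover TF's built-in dtypes (float16/float32/float64, etc.), so the closest thing available in Keras today is switching the backend float type, e.g. (a sketch, not arbitrary precision):

```python
# Sketch: Keras can be switched to one of TF's built-in float dtypes
# (here float16), but there is no knob for an arbitrary bit width.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

K.set_floatx('float16')  # weights of layers built after this use 16-bit floats

model = Sequential([Dense(10, activation='relu', input_shape=(16,)),
                    Dense(1, activation='sigmoid')])
print(model.layers[0].get_weights()[0].dtype)  # float16
```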
We may also find something interesting in the binarized network implementation: https://gitlab.com/kunglab/ddnn
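If we go that way, the standard trick in the binarized-network papers is a straight-through estimator: the forward pass sees sign(w), while the gradient flows through the full-precision copy. A rough TensorFlow sketch of the idea (not taken from the ddnn code, just illustrating the technique):

```python
# Rough sketch of the straight-through estimator used by binarized networks.
import tensorflow as tf

def binarize(w):
    binary = tf.sign(w)
    # Evaluates to `binary` in the forward pass, but its gradient w.r.t. w is 1,
    # so the full-precision weights keep receiving gradient updates.
    return w + tf.stop_gradient(binary - w)

w = tf.Variable(tf.random_normal([16, 10]))
w_bin = binarize(w)  # use w_bin in place of w when building a layer
```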