changtimwu opened this issue 7 years ago
Please start from https://www.tensorflow.org/performance/quantization
TensorFlow supports FP16: https://github.com/tensorflow/tensorflow/issues/1300
CIFAR-10 example: https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/
Thanks for sharing this information. I would really like to test inference with models limited to FP16. Unfortunately, I've heard bad things about FP16 support on Pascal. Apparently the only chip that guarantees native FP16 throughput is the Tesla P100. We mortals can only afford a GTX 1080, which seems unable to reach the theoretical throughput. Then again, I have not tested it myself or found any evidence beyond comments on reddit and github.
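A minimal sketch of the kind of FP16-inference test described above, using NumPy in place of an actual TensorFlow graph (the single dense layer and all names here are my own stand-ins, not anything from the linked tutorials): cast the weights and activations down to FP16, run the forward pass, and measure the deviation from the FP32 reference.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64)).astype(np.float32)   # input activations
w = rng.standard_normal((64, 10)).astype(np.float32)  # layer weights

# FP32 reference forward pass (one dense layer)
y_fp32 = x @ w

# FP16 "inference": cast weights and inputs down, compute, cast back up
y_fp16 = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

# How far does the half-precision result drift from the reference?
max_abs_err = np.max(np.abs(y_fp32 - y_fp16))
print("max abs error, fp16 vs fp32:", max_abs_err)
```

On a GPU the interesting part would of course be the throughput, not just the numerics; this only shows that the accuracy cost of the cast is small for a single layer.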
Nvidia has a fairly complete solution for 8-bit inference.
Nvidia's two-day demo is worth trying.
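For reference, Nvidia-style 8-bit inference is built on linear quantization of weights and activations. A hedged sketch of the symmetric per-tensor variant (the function names are mine, not an Nvidia API, and real toolchains also calibrate activation ranges rather than using the raw max):

```python
import numpy as np

def quantize_int8(x):
    """Map float values to int8 codes with a single per-tensor scale."""
    scale = float(np.max(np.abs(x))) / 127.0  # full range maps to [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([-1.5, -0.3, 0.0, 0.7, 1.5], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q)      # int8 codes
print(w_hat)  # reconstructed floats, close to w
```

The matmuls then run on int8 tensors with int32 accumulation, and the scales are folded back in afterwards; that is where the inference speedup comes from.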
Mixed-precision CUDA programming highlights: GP102 (1080 Ti) is much better at FP16 than GP104 (1080).
Related discussion
Volta is even better at FP16.
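Worth noting what "mixed precision" in those CUDA materials actually buys you: FP16 storage with FP32 accumulation. A small sketch (pure NumPy, my own illustration) of why accumulating in FP16 itself is dangerous — summing many small FP16 values stalls once the running total dwarfs the addend, while an FP32 accumulator does not:

```python
import numpy as np

vals = np.full(10000, 0.1, dtype=np.float16)  # 10000 copies of fp16(0.1)

# Accumulate entirely in FP16: once the total is large, the FP16
# spacing (ulp) exceeds the addend and the sum stops growing.
acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + v)

# FP16 inputs, FP32 accumulator: the mixed-precision pattern.
acc32 = np.float32(0.0)
for v in vals:
    acc32 += np.float32(v)

print("fp16 accumulator:", float(acc16))  # stalls far below 1000
print("fp32 accumulator:", float(acc32))  # close to 1000
```

This is why hardware FP16 throughput (P100, Volta Tensor Cores) is paired with wider accumulators rather than doing everything in half precision.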