Closed — christ115 closed this issue 5 years ago
Recently I have been spending most of my time optimizing neural network inference based on the Synet and Simd libraries. Of course I know about model quantization and have corresponding plans, but these are medium-term plans. In the near future I am going to add support for the NHWC data format (the native data format of TensorFlow) to Synet, along with optimizations for ARM NEON.
Just for your information.
Over the past two years, model quantization has become popular in deep learning, especially for mobile devices. The core of quantization is low-bit (usually 8-bit) convolution, which can significantly reduce the memory bandwidth and run time of a deep network. Many deep learning frameworks have implemented related features, e.g.,
Do you have plans to add this feature to the Simd Library?