facebookresearch / kill-the-bits

Code for: "And the bit goes down: Revisiting the quantization of neural networks"

inference time? #1

Closed shiyongde closed 5 years ago

shiyongde commented 5 years ago

Is there a comparison of inference time?

pierrestock commented 5 years ago

Hi shiyongde,

Very interesting question indeed! Currently, the file inference.py in both branches (master and mask_r_cnn) performs inference by reconstructing the full network, and should be regarded as a proof of concept (as mentioned in the README): with the quantized weights, we recover the claimed accuracy.
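For intuition, reconstruction amounts to looking up each weight subvector's centroid in the codebook. Here is a minimal NumPy sketch (the shapes, variable names, and random data are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

# Hypothetical sizes: a layer quantized with k = 256 centroids of block
# size d, so each assignment fits in a single byte.
k, d = 256, 4        # number of centroids, subvector dimension (assumed)
n_blocks = 1024      # number of weight subvectors in the layer (assumed)

rng = np.random.default_rng(0)
centroids = rng.standard_normal((k, d)).astype(np.float32)  # the codebook
assignments = rng.integers(0, k, size=n_blocks)             # 1 byte per block

# Reconstruct the dense weights by gathering each block's centroid.
weights = centroids[assignments]   # shape (n_blocks, d)
```

After this lookup, `weights` can be reshaped into the layer's original weight tensor and inference proceeds as with the unquantized network.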

Future work (requiring some engineering) would be to perform the forward pass directly with the centroids and the assignments, without instantiating the full network. This can be done efficiently by factoring some computations, since we use 256 centroids (1 byte per assignment) per layer. Note that this work would be hardware-specific. Depending on future applications (CPU, GPU, or other dedicated hardware), we may release code for doing this.
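To illustrate the factoring idea: since many weight subvectors share the same centroid, each centroid's dot product with each input subvector only needs to be computed once, then gathered per output row. A hedged NumPy sketch for a fully-connected layer (all sizes and names are assumptions for illustration; this is not the repo's implementation):

```python
import numpy as np

# Hypothetical sizes: out_features rows, each split into subvectors of
# dimension d and quantized to k = 256 centroids.
k, d = 256, 4
in_features, out_features = 64, 32
n_sub = in_features // d

rng = np.random.default_rng(0)
centroids = rng.standard_normal((k, d)).astype(np.float32)
assignments = rng.integers(0, k, size=(out_features, n_sub))
x = rng.standard_normal(in_features).astype(np.float32)

# Factored forward pass: compute all k * n_sub centroid/subvector dot
# products once, then gather them according to the assignments.
x_blocks = x.reshape(n_sub, d)                  # (n_sub, d)
table = centroids @ x_blocks.T                  # (k, n_sub)
y_fast = table[assignments, np.arange(n_sub)].sum(axis=1)

# Reference: naive forward pass with the reconstructed dense weights.
W = centroids[assignments].reshape(out_features, in_features)
y_ref = W @ x
```

The factored pass replaces `out_features * n_sub` dot products with at most `k * n_sub`, a win whenever the number of output rows exceeds the number of centroids; the actual savings would depend on the target hardware.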

Thumbs up if you are interested!