facebookresearch / kill-the-bits

Code for: "And the bit goes down: Revisiting the quantization of neural networks"

inference time? #1

Closed shiyongde closed 5 years ago

shiyongde commented 5 years ago

Is there a comparison of inference time?

pierrestock commented 5 years ago

Hi shiyongde,

Very interesting question indeed! Currently, the file inference.py in both branches (master and mask_r_cnn) performs inference by reconstructing the full network, and should be regarded as a proof of concept (as mentioned in the README): with the quantized weights, we recover the claimed accuracy.
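For intuition, reconstruction amounts to looking up each weight subvector's centroid in the codebook. Here is a minimal NumPy sketch (the shapes, variable names, and random data are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

# Hypothetical sizes: a layer quantized with k = 256 centroids of block
# size d, so each assignment fits in a single byte.
k, d = 256, 4        # number of centroids, subvector dimension (assumed)
n_blocks = 1024      # number of weight subvectors in the layer (assumed)

rng = np.random.default_rng(0)
centroids = rng.standard_normal((k, d)).astype(np.float32)  # the codebook
assignments = rng.integers(0, k, size=n_blocks)             # 1 byte per block

# Reconstruct the dense weights by gathering each block's centroid.
weights = centroids[assignments]   # shape (n_blocks, d)
```

After this lookup, `weights` can be reshaped into the layer's original weight tensor and inference proceeds as with the unquantized network.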

Future work (requiring some engineering) would be to perform the forward pass directly with the centroids and the assignments, without instantiating the full network. This can be done efficiently by factoring some computations, since we use 256 centroids (1 byte per assignment) per layer. Note that this work would be hardware-specific. Depending on future applications (CPU, GPU, or other dedicated hardware), we may release code for doing this.
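To illustrate the factoring idea: since many weight subvectors share the same centroid, each centroid's dot product with each input subvector only needs to be computed once, then gathered per output row. A hedged NumPy sketch for a fully-connected layer (all sizes and names are assumptions for illustration; this is not the repo's implementation):

```python
import numpy as np

# Hypothetical sizes: out_features rows, each split into subvectors of
# dimension d and quantized to k = 256 centroids.
k, d = 256, 4
in_features, out_features = 64, 32
n_sub = in_features // d

rng = np.random.default_rng(0)
centroids = rng.standard_normal((k, d)).astype(np.float32)
assignments = rng.integers(0, k, size=(out_features, n_sub))
x = rng.standard_normal(in_features).astype(np.float32)

# Factored forward pass: compute all k * n_sub centroid/subvector dot
# products once, then gather them according to the assignments.
x_blocks = x.reshape(n_sub, d)                  # (n_sub, d)
table = centroids @ x_blocks.T                  # (k, n_sub)
y_fast = table[assignments, np.arange(n_sub)].sum(axis=1)

# Reference: naive forward pass with the reconstructed dense weights.
W = centroids[assignments].reshape(out_features, in_features)
y_ref = W @ x
```

The factored pass replaces `out_features * n_sub` dot products with at most `k * n_sub`, a win whenever the number of output rows exceeds the number of centroids; the actual savings would depend on the target hardware.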

Thumbs up if you are interested!