Closed Aaron4Fun closed 5 years ago
See my reply in #4
The paper, and therefore this implementation, focuses on the algorithm rather than deployment to specific hardware. All parameters remain 32-bit floats; only the number of unique values is artificially restricted to what an N-bit representation could encode.
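A minimal sketch of this idea (not the authors' exact INQ scheme): snap each float32 weight to the nearest power-of-two value from a small codebook, while keeping the array in float32. The exponent range and zero threshold below are illustrative assumptions.

```python
import numpy as np

def fake_quantize_pow2(w, num_levels=16):
    """Snap float32 weights to a small set of power-of-two values.

    Illustrative sketch only: the array stays float32, so the file
    written to disk is as large as before -- only the number of
    distinct values shrinks.
    """
    w = np.asarray(w, dtype=np.float32)
    max_exp = int(np.floor(np.log2(np.max(np.abs(w)))))
    min_exp = max_exp - (num_levels // 2 - 1)   # assumed exponent range
    mag = np.abs(w)
    # Round each magnitude's log2 to the nearest integer exponent
    exp = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** (min_exp - 1)))),
                  min_exp, max_exp)
    q = np.sign(w) * 2.0 ** exp
    q[mag < 2.0 ** (min_exp - 1)] = 0.0         # tiny weights snap to zero
    return q.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q = fake_quantize_pow2(w)
print(q.dtype)            # still float32, same storage size as w
print(len(np.unique(q)))  # only a handful of distinct values
```

The key point for the question below: `q.nbytes` equals `w.nbytes`, because no integer packing is performed.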
See also this issue in the implementation of the authors of the paper: AojunZhou/Incremental-Network-Quantization#36
After quantization, I expected the weight parameters to be converted to INT8, making the parameter file smaller. So why does my output weight file get bigger (75.2 MB to 96.2 MB)? Thanks in advance.