Closed: leankemski closed this issue 2 years ago
The approach is to train the network in float32 to achieve higher accuracy, and to store the weights as int8 on the FPGA chip to save storage space. The `overlay.memory.loadweight` function multiplies the weights by quant_scale, converts them to int16, and then passes them to the FPGA chip. It is defined in line 2 of FPGA_CNN_INT8.ipynb.
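A minimal sketch of the scaling step described above, assuming the weights arrive as a NumPy float32 array and quant_scale is a plain scalar; the helper name `quantize_weights` is hypothetical and is not the actual `overlay.memory.loadweight` implementation:

```python
import numpy as np

def quantize_weights(weights_fp32: np.ndarray, quant_scale: float) -> np.ndarray:
    """Hypothetical illustration: scale float32 weights and cast to int16."""
    scaled = np.round(weights_fp32 * quant_scale)
    # Clip to the int16 range so outliers do not wrap around on the cast.
    info = np.iinfo(np.int16)
    return np.clip(scaled, info.min, info.max).astype(np.int16)

# Approximate de-quantization on the host side would then be:
# weights_fp32_approx = weights_int16.astype(np.float32) / quant_scale
```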
I find that the file mnist_cnn_model_int8.h5 is the same as mnist_cnn_model.h5, and the weights in it are still float32. Don't you need to quantize the model to int8? In FPGA_CNN_INT8.ipynb, you use `overlay.memory.loadweight` to load the weights into on-chip memory. Does that method automatically convert the type from float32 to int8 by multiplying by quant_scale? Where can I find the actual code behind `overlay.memory.loadweight`, rather than just the API?
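In case it helps, this is how I checked the stored dtypes; it is just an h5py inspection sketch and assumes both .h5 files are in the current working directory:

```python
import h5py

def print_weight_dtypes(path):
    # Walk the HDF5 file and print the dtype and shape of every stored array.
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(name, obj.dtype, obj.shape)
        f.visititems(visit)

print_weight_dtypes("mnist_cnn_model_int8.h5")
print_weight_dtypes("mnist_cnn_model.h5")
```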