Closed: ymli39 closed this issue 4 years ago
Hello,
Glad to hear about your good training results. However, Brevitas is a library oriented towards research on quantization-aware (re)training; it doesn't take care of deployment. It's up to the user to export a trained model to some kind of optimized hw+sw backend. Our main open-source backend (currently under development) is FINN, which deploys quantized models as custom dataflow architectures on FPGAs. The fact that inference is slower than torch.nn is expected: quantization-aware operations expose a differentiable integer-only datapath on top of floating point, which can be expensive. You might want to consider moving to PyTorch's official quantization tools. They won't be as good in terms of accuracy, but deployment to CPU/GPU is easier.
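To make the overhead concrete, here is a minimal sketch (not Brevitas's actual implementation) of the fake-quantization pattern that quantization-aware training relies on: values stay in floating point the whole time, and every forward pass pays for extra scaling, rounding, and clamping, with a straight-through estimator in the backward pass. None of this produces the fast integer kernels you'd need for a real speedup.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Illustrative fake-quantize op: quantize to an int grid, dequantize back to float."""

    @staticmethod
    def forward(ctx, x, scale, bits=8):
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        # Round to the integer grid, clamp to the representable range,
        # then immediately map back to float. Everything runs in fp32.
        q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat round/clamp as identity for the gradient.
        return grad_output, None, None

x = torch.randn(4, 8, requires_grad=True)
y = FakeQuantSTE.apply(x, torch.tensor(0.1))
y.sum().backward()  # gradients flow as if quantization were the identity
```

If you do go the PyTorch route, here is a minimal sketch of the post-training static quantization flow, assuming `model` is your trained float network (exact steps depend on your PyTorch version, and conv/bn/relu fusion via torch.quantization.fuse_modules is usually recommended first):

```python
import torch

model.eval()
# Pick the quantization config for the target backend ('fbgemm' for x86 CPUs).
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
prepared = torch.quantization.prepare(model)
# ... run a few representative calibration batches through `prepared` here ...
quantized = torch.quantization.convert(prepared)  # actual int8 kernels on CPU
```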
Alessandro
Hi, thanks for your code. Could you help me with the following question? I have incorporated the layers you provide into a DenseUNET model:
```python
import brevitas.nn as qnn
from brevitas.core.quant import QuantType

# quantized drop-in replacements for nn.Conv2d / BatchNorm2d / ReLU / Sigmoid
conv = qnn.QuantConv2d(in_channels=params['num_channels'], out_channels=params['num_filters'], kernel_size=(params['kernel_h'], params['kernel_w']), padding=(padding_h, padding_w), stride=params['stride_conv'], weight_quant_type=QuantType.INT, weight_bit_width=8)
batchnorm = qnn.BatchNorm2dToQuantScaleBias(num_features=params['num_channels'], weight_quant_type=QuantType.INT, weight_bit_width=8)
relu = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=8, max_val=6)
sigmoid = qnn.QuantSigmoid(quant_type=QuantType.INT, bit_width=8)
```
These qnn layers replaced the standard ones; I did not change anything else. The model trains successfully, but the running time on both GPU and CPU is actually slower than the plain PyTorch nn implementation. Did I do anything wrong? Shouldn't the model speed up training and inference by about 4x?