Coding tutorials accompanying my YouTube videos on Neural Network Quantization.
This is the first coding tutorial. We take the torchvision ResNet model and quantize it entirely from scratch with the PyTorch quantization library, using Eager Mode Quantization.
We discuss common issues one can run into, as well as some interesting but tricky bugs.
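As a taster, here is a minimal eager-mode static quantization sketch. It uses a tiny toy model rather than the full ResNet from the video, and the names are illustrative, but the QuantStub / observer / convert flow is the same one the tutorial walks through (assuming the fbgemm backend is available):

```python
import torch
import torch.nn as nn

# Toy stand-in for the tutorial's ResNet: QuantStub/DeQuantStub mark
# where tensors cross the float <-> int8 boundary in eager mode.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # float -> quantized
        x = self.relu(self.conv(x))
        return self.dequant(x)   # quantized -> float

model = TinyNet().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")

# In eager mode, fusing (here Conv+ReLU) must be requested by name.
torch.ao.quantization.fuse_modules(model, [["conv", "relu"]], inplace=True)

prepared = torch.ao.quantization.prepare(model)
prepared(torch.randn(1, 3, 32, 32))   # calibration pass: observers record ranges
quantized = torch.ao.quantization.convert(prepared)
```

After `convert`, the fused `conv` has been swapped for a quantized ConvReLU2d module, and inference runs with int8 weights and activations.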
TODO
In this tutorial, we do dynamic quantization on a ResNet model. We look at how dynamic quantization works, what the default settings are in PyTorch, and discuss how it differs from static quantization.
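A minimal sketch of the dynamic quantization API, using a toy stack of Linears as a stand-in (with PyTorch's defaults, only Linear and recurrent layers are dynamically quantized, so on a ResNet this would touch just the final fc layer):

```python
import torch
import torch.nn as nn

# Toy stand-in model; in the tutorial this is torchvision's ResNet.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).eval()

# Dynamic quantization: weights are converted to int8 ahead of time,
# activations are quantized on the fly at runtime, per batch.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(2, 8))
```

Because activation ranges are computed at runtime, no calibration pass is needed, which is the main practical difference from static quantization.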
In this tutorial series, we use PyTorch's FX Graph Mode Quantization to quantize a ResNet. In the first video, we look at the Directed Acyclic Graph (DAG) and see how the fusing and the placement of QuantStubs and FloatFunctionals all happen automatically. In the second, we look at some of the intricacies of how quantization interacts with the GraphModule. In the third and final video, we look at some more advanced techniques for manipulating and traversing the graph, and use these to build an alternative to forward hooks and to fuse BatchNorm layers into their preceding Convs.
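The FX workflow can be sketched as below, on a toy Conv-BN-ReLU stack rather than the tutorial's ResNet (illustrative only, and assuming the fbgemm backend). Note there are no QuantStubs in the source model: tracing into a GraphModule lets PyTorch insert observers, fusions, and quant/dequant nodes automatically.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Toy model: no QuantStubs or manual fusion needed in FX mode.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU()).eval()

qconfig_mapping = get_default_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 3, 32, 32),)

# prepare_fx symbolically traces the model into a GraphModule,
# fuses Conv+BN+ReLU, and inserts observers at the right graph nodes.
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)          # calibration pass
quantized = convert_fx(prepared)   # swap in quantized ops
```

The result is a `torch.fx.GraphModule`, whose graph you can print and traverse, which is what the later videos in the series dig into.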
In this tutorial, we look at how to do Quantization Aware Training (QAT) on an FX Graph Mode quantized ResNet. We build a small training loop with a mini custom data loader. We also generalise the evaluate function we've been using in these tutorials so it works on other images. We go looking for, and find, some of the dangers of overfitting.
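The QAT flow can be sketched as follows, with a toy model and random tensors standing in for the tutorial's ResNet and custom data loader (all names here are illustrative): fake-quantize modules are inserted by `prepare_qat_fx`, the model is trained briefly with them active, then converted to a real int8 model.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

# Toy classifier; QAT must start from a model in train() mode.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10)
).train()

qconfig_mapping = get_default_qat_qconfig_mapping("fbgemm")
example_inputs = (torch.randn(1, 3, 32, 32),)
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)

# Mini training loop on random data, standing in for the custom loader.
opt = torch.optim.SGD(prepared.parameters(), lr=0.01)
for _ in range(3):
    x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
    loss = nn.functional.cross_entropy(prepared(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

quantized = convert_fx(prepared.eval())
```

During the loop, gradients flow through the fake-quantize ops, so the weights learn to compensate for rounding error before the final conversion.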
In this tutorial, we look at Cross-Layer Equalization, a classic data-free method for improving the quantization of one's models. We use a graph-tracing method to find all of the layers we can do CLE on, do CLE, evaluate the results, and then visualize what's happening inside the model.
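The core CLE step for a single Conv→ReLU→Conv pair can be sketched as below. This is a simplified, self-contained version (function name and setup are illustrative): it assumes a positively scale-equivariant activation such as ReLU between the layers, and uses the standard per-channel scales s_i = sqrt(r1_i / r2_i) computed from the weight ranges, which leaves the pair's function unchanged while equalizing the two layers' per-channel ranges.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def equalize_pair(conv1: nn.Conv2d, conv2: nn.Conv2d) -> None:
    """Cross-layer equalization for a Conv -> ReLU -> Conv pair (in place)."""
    # Per-channel absolute weight ranges:
    # r1 over conv1's output channels, r2 over conv2's input channels.
    r1 = conv1.weight.abs().amax(dim=(1, 2, 3))   # shape: (out_ch of conv1,)
    r2 = conv2.weight.abs().amax(dim=(0, 2, 3))   # shape: (in_ch of conv2,)
    s = torch.sqrt(r1 / r2)                        # equalizing scale per channel
    # Divide channel i out of conv1, multiply it back into conv2:
    # ReLU(x/s)*s == ReLU(x) for s > 0, so the composed function is unchanged.
    conv1.weight.div_(s.view(-1, 1, 1, 1))
    if conv1.bias is not None:
        conv1.bias.div_(s)
    conv2.weight.mul_(s.view(1, -1, 1, 1))

conv1, conv2 = nn.Conv2d(3, 8, 3), nn.Conv2d(8, 4, 3)
x = torch.randn(1, 3, 16, 16)
before = conv2(torch.relu(conv1(x)))
equalize_pair(conv1, conv2)
after = conv2(torch.relu(conv1(x)))   # numerically the same function
```

After the rescale, both layers' per-channel ranges equal sqrt(r1_i * r2_i), which is what makes per-tensor quantization grids fit each channel better.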