facebookresearch / diffq

DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.

Why doesn't checkpoint.pth in the output folder match the true model size? #11

Closed Eurus-Holmes closed 2 years ago

Eurus-Holmes commented 2 years ago

❓ Questions

For example, when I fine-tune a pretrained ViT model with LSQ on the CIFAR-10 dataset, the reported true model size is 41.20 MB, but in the ./outputs folder, checkpoint.th is 686 MB. Why doesn't it match the true model size?

[Screenshot: Screen Shot 2022-03-02 at 20 40 18]
adefossez commented 2 years ago

The checkpoint file contains everything required to resume training. In particular, during training the weights are kept in float32 (this is required: otherwise the small gradient updates would never change a quantized value), along with the full optimizer state (momentum, squared gradients, etc.). To get a small, shippable model, call solver.quantizer.get_quantized_state() at the end of training. If you torch.save what it returns, the file should have the expected model size.
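A back-of-the-envelope calculation shows why the training checkpoint dwarfs the quantized export. The parameter count below is made up for illustration (not taken from the question): float32 weights plus Adam's two float32 moment buffers already triple the storage, while a quantized export at, say, 4 bits per weight is an eighth of the float32 weights alone.

```python
def checkpoint_size_bytes(num_params: int) -> int:
    # Training checkpoint: float32 weights (4 bytes each) plus Adam's
    # exp_avg and exp_avg_sq buffers, also float32 -> 3x the weight storage.
    return num_params * 4 * 3

def quantized_size_bytes(num_params: int, bits_per_weight: float) -> int:
    # Shippable export: only the quantized weights, at the chosen bit width.
    # (Ignores small per-group metadata that a real quantizer would add.)
    return int(num_params * bits_per_weight / 8)

# Hypothetical parameter count, roughly ViT-Base scale.
n = 86_000_000
print(checkpoint_size_bytes(n) / 1e6, "MB")        # training checkpoint
print(quantized_size_bytes(n, 4) / 1e6, "MB")      # 4-bit export
```

This is only a sketch of the bookkeeping; the actual numbers depend on the model, the optimizer, and the bit widths DiffQ settles on per weight group.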

Eurus-Holmes commented 2 years ago

Got it, thanks!