analogdevicesinc / ai8x-training

Model Training for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

QAT #323

Closed: fzh-adham closed this issue 3 weeks ago

fzh-adham commented 3 months ago

Hi there, according to the documentation (https://github.com/analogdevicesinc/ai8x-training#quantization-aware-training-qat), we can use either QAT or post-training quantization. Can I use both of them? If yes, which file should I choose after training: qat_checkpoint or qat_best?

During training, the log says "save checkpoints to qat_checkpoint.pth.tar", but the documentation says we have to choose qat_best.pth.tar.

ermanok commented 3 months ago

Hello,

QAT is a method for training a model while optimizing the performance of its quantized version. During QAT, all model parameters are stored as floating-point values, but the forward pass is executed with quantized copies of those parameters. At the model synthesis stage, you still have to quantize the model checkpoint whether or not you used QAT. This is why each pretrained model has two scripts in our [synthesis repo](https://github.com/analogdevicesinc/ai8x-synthesis/tree/develop/scripts): the first quantizes the model and the second synthesizes it for the MCU.
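A toy sketch of the idea above: a float "master" weight is kept throughout training, the forward pass uses a fake-quantized copy of it, and the gradient is passed straight through to the float weight. The helpers `fake_quantize` and `qat_step`, the symmetric 8-bit grid on [-1, 1), and the straight-through estimator are illustrative assumptions, not the ai8x-training quantizer, which has its own per-layer scaling and clamping.

```python
def fake_quantize(w: float, bits: int = 8) -> float:
    """Round a weight in [-1, 1) to a signed `bits`-bit grid; the result stays a float."""
    scale = 2 ** (bits - 1)              # 128 steps per unit for 8 bits
    q = round(w * scale)                 # nearest integer level
    q = max(-scale, min(scale - 1, q))   # clamp to [-128, 127] for 8 bits
    return q / scale


def qat_step(w_float: float, x: float, target: float, lr: float = 0.1) -> float:
    """One QAT update of the toy model y = w * x with squared-error loss."""
    w_q = fake_quantize(w_float)         # forward pass sees the quantized weight
    y = w_q * x
    grad = 2.0 * (y - target) * x        # d(loss)/d(w_q)
    # Straight-through estimator (assumed): treat d(w_q)/d(w_float) as 1, so the
    # gradient computed for the quantized weight updates the float weight.
    return w_float - lr * grad


w = 0.0                                  # float master weight kept during QAT
for _ in range(100):
    w = qat_step(w, x=1.0, target=0.3)

# Quantization before deployment happens whether or not QAT was used.
print(w, fake_quantize(w))
```

Because training already saw the rounded weight, the final `fake_quantize(w)` lands on a grid level next to the target instead of drifting away from what the model was optimized for.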

We recommend always training models with QAT, as the quantized model's performance is always better than that of models trained without QAT. Our framework saves two model checkpoints to disk: 'qat_checkpoint.pth.tar' is the checkpoint file saved at the last epoch of training, whereas 'qat_best.pth.tar' holds the epoch with the highest validation-set accuracy. This is why our documentation recommends using 'qat_best.pth.tar'.
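The difference between the two files can be simulated in a few lines. `simulate_checkpoints` is a hypothetical helper that only mimics the rule described above (overwrite the "checkpoint" file every epoch, overwrite the "best" file only when validation accuracy improves); it is not the framework's saving code, and keeping the earlier epoch on ties is an assumption.

```python
def simulate_checkpoints(val_accuracies, prefix="qat_"):
    """Return {filename: epoch whose weights the file holds} once training ends."""
    saved = {}
    best_acc = float("-inf")
    for epoch, acc in enumerate(val_accuracies):
        # The "checkpoint" file is overwritten at the end of every epoch.
        saved[f"{prefix}checkpoint.pth.tar"] = epoch
        # The "best" file is overwritten only on a strict accuracy improvement.
        if acc > best_acc:
            best_acc = acc
            saved[f"{prefix}best.pth.tar"] = epoch
    return saved


result = simulate_checkpoints([0.80, 0.86, 0.84, 0.85])
print(result)  # {'qat_checkpoint.pth.tar': 3, 'qat_best.pth.tar': 1}
```

Here the last epoch (3) is not the best one (1), which is why the two files can hold different weights.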

fzh-adham commented 2 months ago

Thanks for the reply. According to the example quantization scheduler file, quantization starts at epoch 240, so which checkpoint file should I synthesize? Note that I want to choose the checkpoint that was trained with QAT. Can I start QAT from epoch 1?

ermanok commented 2 months ago

There are four checkpoint files in the log folder. checkpoint.pth.tar and best.pth.tar relate to floating-point model training; they are updated until QAT starts (epoch 240 in your case). After QAT begins, only qat_checkpoint.pth.tar and qat_best.pth.tar are updated. Therefore, if you want to use QAT models, use the checkpoints with the qat prefix.
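A small sketch of which files are written at the end of a given epoch under the rule described above. `files_updated` is a hypothetical helper; it assumes epochs are counted from 0, that QAT is active from `qat_start_epoch` onward, and it leaves the definition of "validation improved" to the caller rather than guessing how the framework tracks the best score across the switch to QAT.

```python
from typing import List


def files_updated(epoch: int, qat_start_epoch: int, val_improved: bool) -> List[str]:
    """Checkpoint files written at the end of `epoch` (hypothetical helper)."""
    # Floating-point phase writes the unprefixed files; QAT phase writes qat_ files.
    prefix = "qat_" if epoch >= qat_start_epoch else ""
    files = [prefix + "checkpoint.pth.tar"]     # overwritten every epoch
    if val_improved:                            # caller decides what "improved" means
        files.append(prefix + "best.pth.tar")
    return files


print(files_updated(239, qat_start_epoch=240, val_improved=True))
# ['checkpoint.pth.tar', 'best.pth.tar']
print(files_updated(240, qat_start_epoch=240, val_improved=False))
# ['qat_checkpoint.pth.tar']
```

In this sketch, a start epoch of 0 would send every epoch to the qat_ files; whether the real scheduler accepts such a value is not covered by this thread.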

fzh-adham commented 2 months ago

Hi, thanks a lot for your clear answer ;))

github-actions[bot] commented 1 month ago

This issue has been marked stale because it has been open for over 30 days with no activity. It will be closed automatically in 10 days unless a comment is added or the "Stale" label is removed.