Question about coreset selection

HuangOwen / QAT-ACS

Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"

MIT License


Question about coreset selection #2

Status: Open. zhangxin-xd opened this issue 1 year ago.

zhangxin-xd commented 1 year ago

Thank you for sharing your excellent work! I have a question about coreset selection. I noticed that in Algorithm 1, all samples are re-sorted according to $d_{\text{ACS}}$ and the subset is then rebuilt from that ranking. This makes the coreset selection dynamic, akin to dropping some unimportant samples during training, where a dropped sample can be selected again later. However, some of the methods you compare against are static: once a sample is dropped, it is dropped permanently. Is the comparison reasonable?
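A toy sketch of the static vs. dynamic distinction raised above. It assumes made-up importance scores: the hypothetical `toy_scores` function stands in for a real score such as $d_{\text{ACS}}$, and this is not the paper's Algorithm 1. Static selection ranks samples once and reuses that subset; adaptive selection re-ranks every sample at each epoch, so a sample left out early can come back later.

```python
import numpy as np

def top_fraction(scores: np.ndarray, fraction: float) -> np.ndarray:
    """Indices of the highest-scoring `fraction` of samples, best first."""
    k = int(round(fraction * len(scores)))
    return np.argsort(-scores, kind="stable")[:k]

def toy_scores(epoch: int, n: int = 10) -> np.ndarray:
    # Hypothetical per-sample importance scores that drift as training proceeds.
    return np.sin(np.arange(n) + epoch)

fraction, epochs = 0.3, 4

# Static: rank once at the start, then train on the same subset every epoch.
static_idx = top_fraction(toy_scores(0), fraction)

# Adaptive: re-score and re-rank ALL samples at every epoch.
adaptive_idx = [top_fraction(toy_scores(e), fraction) for e in range(epochs)]

seen_static = set(static_idx.tolist())
seen_adaptive = set().union(*(i.tolist() for i in adaptive_idx))
print(sorted(seen_static), sorted(seen_adaptive))
# [1, 2, 8] [0, 1, 2, 4, 5, 6, 7, 8]
```

Both runs train on three samples per epoch; only the adaptive run lets samples that were ranked low at epoch 0 (for example sample 7) back into later subsets.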

I'm looking forward to your reply!

HuangOwen commented 1 year ago

Thanks for the question and your interest in our work! We believe the comparison is reasonable because all coreset methods use the same coreset data fraction per epoch. Since our goal is to improve training efficiency, the number of samples trained on per epoch, and hence the training-time reduction, is the same across methods. In addition, previous works [1][2] also adopt a similar adaptive coreset strategy and compare it with fixed-coreset methods.

[1] Adaptive second order coresets for data-efficient machine learning, ICML 2022.
[2] RETRIEVE: Coreset selection for efficient and robust semi-supervised learning, NeurIPS 2021.
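The equal-budget argument can be checked by simple counting. This sketch assumes a hypothetical setting (50,000 samples, a 10% fraction, 120 epochs) and uses random subsets as a stand-in for score-based selection. It only counts samples fed through training; any cost of computing selection scores is not modelled.

```python
import numpy as np

def samples_trained(selections) -> int:
    """Total number of samples that go through forward/backward over all epochs."""
    return sum(len(idx) for idx in selections)

num_samples, fraction, epochs = 50_000, 0.1, 120   # hypothetical setting
k = int(round(fraction * num_samples))               # coreset size per epoch
rng = np.random.default_rng(0)

# Static: one subset, reused every epoch.
fixed = rng.choice(num_samples, size=k, replace=False)
static_run = [fixed] * epochs

# Adaptive: a new subset of the same size each epoch (random here, as a stand-in).
adaptive_run = [rng.choice(num_samples, size=k, replace=False) for _ in range(epochs)]

print(samples_trained(static_run), samples_trained(adaptive_run), num_samples * epochs)
# 600000 600000 6000000
```

The count depends only on the per-epoch fraction, not on which samples are chosen, so both strategies get the same 10x reduction relative to full-data training in this setting.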

zhangxin-xd commented 1 year ago

Thanks for your reply!!

zhangxin-xd commented 1 year ago

Hi, I have another question, about the Error Vector Score (EVS), which resembles EL2N. In EL2N, […]. There, the average is taken over several models trained from different weight initializations. In your ACS, […]. Once QAT starts, the weights at time $t$ are fixed (there is a single model), so how is the average calculated?
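For reference, the EL2N score from *Deep Learning on a Data Diet* (Paul et al., NeurIPS 2021) is the L2 norm of the error vector $p(\mathbf{w}_t, x) - y$, averaged over networks trained from different random initializations. A minimal sketch of that average, using hypothetical logits for one 4-class sample; with only one network at step $t$, the "average" collapses to a single term, which is the situation the question describes.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # row-wise, numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def el2n(logits_per_model: np.ndarray, label: int) -> float:
    """EL2N for one sample: mean over models of ||softmax(logits) - onehot(label)||_2.

    logits_per_model has shape (num_models, num_classes); each row comes from a
    separately initialized network evaluated at the same training step t.
    """
    probs = softmax(np.asarray(logits_per_model, dtype=float))
    onehot = np.zeros(probs.shape[-1])
    onehot[label] = 1.0
    return float(np.linalg.norm(probs - onehot, axis=-1).mean())

# Hypothetical logits from three independently initialized networks for one sample.
ensemble_logits = np.array([[2.0, 0.5, 0.1, -1.0],
                            [1.0, 1.2, 0.0, -0.5],
                            [0.3, 0.2, 0.9, 0.0]])
label = 0
ensemble_score = el2n(ensemble_logits, label)     # average over 3 initializations
single_score = el2n(ensemble_logits[:1], label)   # one run at step t: a single term
print(ensemble_score, single_score)
```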

HuangOwen commented 1 year ago

Thanks for your question. Unlike the GraNd score proposed in the EL2N paper, the expectation in our ACS is taken over all logits $m \in M$ at a given training time $t$ (that is, it is an average of these gradients rather than a sum). Since we then approximate it with $d_{\text{EVS}}$, the analysis still holds. We will correct the expectation equation in our manuscript to avoid confusion.
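One way to see why a single checkpoint suffices: for a softmax cross-entropy loss, the gradient with respect to each logit $z_m$ is $p_m - y_m$, so the per-logit gradients form exactly the error vector. The sketch below assumes that loss, assumes $d_{\text{EVS}} = \|p - y\|_2$ (an EL2N-style form; the manuscript's exact definition may differ), and assumes the averaged quantity is the squared per-logit gradient. It checks the gradient identity numerically and shows that averaging over the $M$ logits gives $d_{\text{EVS}}^2 / M$, which ranks samples exactly as $d_{\text{EVS}}$ does.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def cross_entropy(z: np.ndarray, label: int) -> float:
    return float(-np.log(softmax(z)[label]))

def logit_grad(z: np.ndarray, label: int) -> np.ndarray:
    """Analytic gradient of cross-entropy w.r.t. the logits: p - y."""
    y = np.zeros_like(z)
    y[label] = 1.0
    return softmax(z) - y

def logit_grad_fd(z: np.ndarray, label: int, eps: float = 1e-6) -> np.ndarray:
    """Central finite-difference estimate of the same gradient, one logit at a time."""
    g = np.zeros_like(z)
    for m in range(len(z)):
        dz = np.zeros_like(z)
        dz[m] = eps
        g[m] = (cross_entropy(z + dz, label) - cross_entropy(z - dz, label)) / (2 * eps)
    return g

def d_evs(z: np.ndarray, label: int) -> float:
    """Assumed EVS form: L2 norm of the error vector p - y at one checkpoint."""
    return float(np.linalg.norm(logit_grad(z, label)))

rng = np.random.default_rng(0)
M = 10
logits = rng.normal(size=(5, M))     # 5 samples scored by ONE model at time t
labels = rng.integers(0, M, size=5)

for z, lab in zip(logits, labels):
    assert np.allclose(logit_grad_fd(z, lab), logit_grad(z, lab), atol=1e-6)

mean_sq = np.array([np.mean(logit_grad(z, lab) ** 2) for z, lab in zip(logits, labels)])
evs = np.array([d_evs(z, lab) for z, lab in zip(logits, labels)])
print(np.allclose(mean_sq, evs ** 2 / M))                                        # True
print(np.array_equal(np.argsort(-mean_sq, kind="stable"), np.argsort(-evs, kind="stable")))  # True
```

Averaging instead of summing over logits only rescales every sample's score by the same constant $1/M$, so the selected coreset is unchanged.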

zhangxin-xd commented 1 year ago

Got that! Thanks for the reply!