Hao840 / OFAKD

PyTorch code and checkpoints release for OFA-KD: https://arxiv.org/abs/2310.19444

Can you provide the distilled models? #2

Closed liguopeng0923 closed 10 months ago

liguopeng0923 commented 11 months ago

Hi @Hao840 ,

Can you provide the distilled models? These are important for reproducing your paper.

Hao840 commented 10 months ago

Hi, please give me some details about the experiment you are reproducing.

liguopeng0923 commented 10 months ago

Thanks for your reply.

  1. I reproduced the experiment with DeiT-Tiny as the teacher and ResNet-18 as the student, but it needs more than 100 hours on 4 V100 GPUs, which hinders direct reproduction.
  2. I tried initializing ResNet-18 with pretrained weights, but the loss becomes NaN in some epochs.
  3. Overall, it is difficult to reproduce your paper. I hope you can provide some distilled checkpoints for my evaluation.
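
A common way to keep training alive when the loss occasionally goes NaN (as in point 2) is to skip non-finite steps and clip gradients. A minimal sketch, not taken from the OFA-KD codebase; `student`, `safe_step`, and the tiny demo model are illustrative names:

```python
import torch
import torch.nn as nn

def safe_step(student, optimizer, loss, max_norm=1.0):
    """Skip the update when the loss is non-finite; otherwise clip grads and step.

    Hypothetical helper: a generic stabilization trick, not the repo's training loop.
    """
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return False  # step skipped, weights untouched
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    nn.utils.clip_grad_norm_(student.parameters(), max_norm)
    optimizer.step()
    return True

# Tiny demo with a linear "student" standing in for ResNet-18
student = nn.Linear(4, 2)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(student(x), y)
stepped = safe_step(student, opt, loss)                      # finite loss: applied
skipped = safe_step(student, opt, torch.tensor(float("nan")))  # NaN loss: skipped
```

Lowering the learning rate for the pretrained-initialization run is another common fix alongside the clipping above.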
Hao840 commented 10 months ago

hi @liguopeng0923,

I have uploaded the trained ResNet-18 student model and the corresponding training log here; a DeiT-T model was used as the teacher.

(Sorry, I failed to retrieve the original checkpoint that achieved the accuracy reported in the paper, 71.34%. I can only provide one that achieved a top-1 accuracy of 71.26% for evaluation.)
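
Checking the released checkpoint's top-1 accuracy only needs a standard evaluation loop. A minimal sketch; `top1_accuracy` is an illustrative helper, and in practice you would load the actual checkpoint with `torch.load` and an ImageNet validation loader rather than the synthetic demo below:

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Return top-1 accuracy (%) of `model` over `loader` batches."""
    model.eval()
    correct = total = 0
    for images, targets in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1) == targets.to(device)).sum().item()
        total += targets.size(0)
    return 100.0 * correct / total

# Synthetic demo: one-hot "images" and an identity "model" that is
# trivially correct, so the expected accuracy is 100%.
xs = torch.eye(10)
ys = torch.arange(10)
acc = top1_accuracy(torch.nn.Identity(), [(xs, ys)])
```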

liguopeng0923 commented 10 months ago

Thanks very much! But will you be releasing the other distilled models, and the models trained from scratch? They are also important for fair comparisons.

Hao840 commented 10 months ago

Disclosure of assets from company computers (where all the experimental results are stored) requires a complex process, so I'm sorry, but we currently have no plans to release more trained models.