facebookresearch / deit

Official DeiT repository

Distillation with different number of classes #124

Closed: feral913 closed this issue 2 years ago

feral913 commented 2 years ago

Hello,

I want to train a distilled model on my custom dataset, but my dataset has fewer classes than ImageNet. Loading the pretrained RegNetY teacher into a newly created model with fewer classes therefore results in an error.

- Command: $ python main.py --teacher-path </path/to/regnety~.pth> --distillation-type soft --data-path </path/to/mydata> --output_dir </path/to/output>

- Error message 1: size mismatch for head.fc.weight: copying a param with shape torch.Size([1000, 3024]) from checkpoint, the shape in current model is torch.Size([392, 3024]). size mismatch for head.fc.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([392]).

- Error message 2 (if I set the number of classes in the RegNetY model to 1000): When knowledge distillation is enabled, the model is expected to return a Tuple[Tensor, Tensor] with the output of the class_token and the dist_token.

(Fine-tuning the ImageNet-pretrained model on my dataset works fine.)

Can I train a distilled model on my data, or do I need to create a new RegNetY teacher trained on my data?

Thanks in advance.
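
For context, a minimal sketch of where the first error comes from, assuming the teacher is built with timm's create_model (as main.py appears to do); the model name regnety_160, the 392-class count, and the checkpoint path are placeholders inferred from the error message:

```python
import torch
from timm import create_model

# Teacher built for the custom dataset (392 classes here) ...
teacher = create_model("regnety_160", num_classes=392)

# ... while the pretrained checkpoint still carries a 1000-class ImageNet head,
# so a strict load fails with the size-mismatch error quoted above.
checkpoint = torch.load("regnety_160.pth", map_location="cpu")  # placeholder path
state_dict = checkpoint.get("model", checkpoint)
teacher.load_state_dict(state_dict)  # RuntimeError: size mismatch for head.fc.weight / head.fc.bias
```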

TouvronHugo commented 2 years ago

Hi @feral913, thanks for your question. You need to either fine-tune our RegNet teacher on your data or train a teacher from scratch on your data: the number of classes of the teacher must match the number of classes of your dataset. Best, Hugo
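
A minimal sketch of the fine-tuning step Hugo describes, assuming a timm regnety_160 teacher, a hypothetical 392-class dataset laid out for ImageFolder, and a checkpoint that stores its weights under a "model" key (all paths and hyperparameters are placeholders):

```python
import torch
from timm import create_model
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

NUM_CLASSES = 392  # hypothetical: number of classes in the custom dataset

# Build the teacher with a freshly initialized head sized for the custom dataset.
teacher = create_model("regnety_160", num_classes=NUM_CLASSES)

# Load the ImageNet-pretrained weights, dropping the 1000-class head.
checkpoint = torch.load("regnety_160.pth", map_location="cpu")  # placeholder path
state_dict = checkpoint.get("model", checkpoint)
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("head.fc")}
missing, _ = teacher.load_state_dict(state_dict, strict=False)
print("reinitialized:", missing)  # should only list head.fc.weight and head.fc.bias

# Standard supervised fine-tuning on the custom dataset.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_loader = DataLoader(
    datasets.ImageFolder("/path/to/mydata/train", transform=transform),  # placeholder path
    batch_size=32,
    shuffle=True,
)
optimizer = torch.optim.SGD(teacher.parameters(), lr=1e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
teacher.train()
for images, targets in train_loader:
    optimizer.zero_grad()
    loss = criterion(teacher(images), targets)
    loss.backward()
    optimizer.step()

# Save under a "model" key; adjust if your main.py reads the teacher checkpoint differently.
torch.save({"model": teacher.state_dict()}, "regnety_160_mydata.pth")
```

Once the fine-tuned checkpoint is saved, it can be passed via --teacher-path so that the teacher's head matches the number of classes of the custom dataset.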

feral913 commented 2 years ago

Hello @TouvronHugo ,

Thank you for your answer. I will try it.