LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
MIT License

some questions about changing the image classification data set #45

Open cb-rep opened 10 months ago

cb-rep commented 10 months ago

If I change the dataset, for example to one with 47 classes, what else do I need to do besides changing nb_classes to 47 in main_finetune? With only that change, the final accuracy is not very high. I am also not sure whether the 1000 in vocab_size = codebook_size + 1000 + 1 should be modified; if I do modify it, loading the checkpoint reports an error: RuntimeError: Error(s) in loading state_dict for VisionTransformerMage: size mismatch for token_emb.word_embeddings.weight: copying a param with shape torch.Size([2025, 768]) from checkpoint, the shape in current model is torch.Size([1072, 768]).
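
For reference, the two shapes in the error message seem to follow from the vocabulary arithmetic below (a rough sketch, assuming the default codebook size of 1024):

```python
# Rough sanity check of the token_emb shapes in the error message
# (assumption: the pretrained VQGAN codebook has 1024 entries).
codebook_size = 1024

# Pretrained checkpoint: 1000 ImageNet classes + 1 extra token.
pretrained_vocab = codebook_size + 1000 + 1   # 2025, matching torch.Size([2025, 768])

# After changing the 1000 to 47: 47 classes + 1 extra token.
modified_vocab = codebook_size + 47 + 1       # 1072, matching torch.Size([1072, 768])

print(pretrained_vocab, modified_vocab)  # 2025 1072
```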

LTH14 commented 10 months ago

If you plan to finetune the ImageNet pre-trained MAGE on your dataset, you only need to change nb_classes to 47 in main_finetune. The performance can be poor for many reasons -- one reason could be that your dataset is too far from the ImageNet image distribution. You could also consider adjusting the training epochs -- if your dataset is much smaller than ImageNet, you should increase the number of fine-tuning epochs.
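
As a concrete illustration, here is a minimal sketch of the MAE-style checkpoint loading that main_finetune.py performs when only the head size changes (the module, builder, and checkpoint names below are placeholders, not the repo's exact identifiers -- check main_finetune.py for the real ones):

```python
# A sketch, not the repo's exact code: keep token_emb (vocab_size = codebook_size
# + 1000 + 1) untouched and replace only the classification head for 47 classes.
import torch
import models_mage  # placeholder import for the repo's finetune model definitions

# Placeholder builder name -- use whatever --model you pass to main_finetune.py.
model = models_mage.__dict__['mage_vit_base_patch16'](num_classes=47)

checkpoint = torch.load('mage_pretrained.pth', map_location='cpu')  # placeholder path
state_dict = checkpoint['model']

# Drop only the head weights, whose shapes no longer match the new 47-way head;
# everything else, including token_emb.word_embeddings, loads as-is.
for k in ['head.weight', 'head.bias']:
    if k in state_dict and state_dict[k].shape != model.state_dict()[k].shape:
        del state_dict[k]

msg = model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # should list only the freshly initialized head parameters
```

The point of dropping only the head keys is that the rest of the checkpoint, including the token embedding sized for the ImageNet class vocabulary, stays compatible with the pretrained weights.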