LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
MIT License

some questions about changing the image classification data set #45

Open cb-rep opened 11 months ago

cb-rep commented 11 months ago

If you change the dataset, for example to one with 47 classes, is there anything else to do besides setting nb_classes to 47 in main_finetune? With only that change, the final accuracy is not very high. I am not sure whether the 1000 in vocab_size = codebook_size + 1000 + 1 should also be modified; if I modify it, loading the checkpoint fails with: RuntimeError: Error(s) in loading state_dict for VisionTransformerMage: size mismatch for token_emb.word_embeddings.weight: copying a param with shape torch.Size([2025, 768]) from checkpoint, the shape in current model is torch.Size([1072, 768]).
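The two shapes in the error message pin down where the 1000 comes from: the token embedding covers the VQGAN codebook, plus one token per ImageNet class, plus one mask token. A minimal arithmetic check (the 1024 codebook size is inferred from the error message itself, not stated in this thread):

```python
# Pretrained embedding has 2025 rows: codebook + 1000 ImageNet classes + 1 mask token.
codebook_size = 2025 - 1000 - 1
assert codebook_size == 1024

# Changing the 1000 to 47 yields the other shape in the error message:
assert codebook_size + 47 + 1 == 1072
```

So modifying the 1000 necessarily changes the embedding shape and makes it incompatible with the ImageNet-pretrained checkpoint, which is exactly the size mismatch reported above.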

LTH14 commented 11 months ago

If you plan to finetune the ImageNet pre-trained MAGE on your dataset, you only need to change nb_classes to 47 in main_finetune. The performance can be poor for many reasons -- one reason could be that your dataset is too far from the ImageNet image distribution. You could also consider adjusting the training epochs -- if your dataset is much smaller than ImageNet, you should increase the fine-tuning epochs.
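As a side note on the state_dict error above: if you do change vocab_size anyway, the usual workaround is to load the checkpoint non-strictly, keeping only tensors whose shapes still match the current model. A minimal sketch of that filtering logic, using NumPy arrays as stand-ins for parameters (`filter_matching` is a hypothetical helper, not part of the MAGE codebase; with PyTorch you would pass the surviving dict to `model.load_state_dict(kept, strict=False)`):

```python
import numpy as np

def filter_matching(pretrained, current):
    """Keep only checkpoint entries whose shapes match the current model."""
    return {k: v for k, v in pretrained.items()
            if k in current and v.shape == current[k].shape}

# Toy example mirroring the error in this thread:
pretrained = {"token_emb.word_embeddings.weight": np.zeros((2025, 768)),
              "blocks.0.attn.qkv.weight": np.zeros((768, 2304))}
current = {"token_emb.word_embeddings.weight": np.zeros((1072, 768)),
           "blocks.0.attn.qkv.weight": np.zeros((768, 2304))}

kept = filter_matching(pretrained, current)
# Only the shape-compatible attention weight survives; the mismatched
# token embedding stays at its fresh initialization.
```

Note that the dropped token embedding is then trained from scratch, so this only makes sense if you accept losing those pretrained weights.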

18222why commented 1 month ago

If you plan to finetune the ImageNet pre-trained MAGE on your dataset, you only need to change nb_classes to 47 in main_finetune. The performance can be poor for many reasons -- one reason could be that your dataset is too far from the ImageNet image distribution. You could also consider adjusting the training epochs -- if your dataset is much smaller than ImageNet, you should increase the fine-tuning epochs.

Excuse me, why does the accuracy reach 78.6 at the very beginning of fine-tuning and then hardly change during training? Is it because the dataset is too small?

LTH14 commented 1 month ago

I think it is also bottlenecked by the vector quantization, which inevitably loses information about the image.

18222why commented 1 month ago

I think it is also bottlenecked by the vector quantization, which inevitably loses information about the image.

Thank you very much for your reply. Can you give me some ideas? Do I need to retrain VQGAN?

LTH14 commented 1 month ago

Do you fine-tune it on ImageNet or another dataset? If it is another dataset, then you could first look at the reconstruction quality of VQGAN to see how much information it loses.
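To quantify the reconstruction quality mentioned above, one simple option is PSNR between the original image and the VQGAN's encode/decode round trip. A minimal NumPy sketch (how you obtain the reconstruction is model-specific and not shown; the quality thresholds in the comment are rough rules of thumb, not from the MAGE paper):

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images of equal shape.

    As a rough guide, values above ~25-30 dB suggest the tokenizer
    preserves most visible detail; much lower values suggest the VQGAN
    should be re-trained on your own data.
    """
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

You would call this on the original image array and the decoded output of the VQGAN for the same image, averaged over a handful of samples from your dataset.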

18222why commented 1 month ago

Do you fine-tune it on ImageNet or another dataset? If it is another dataset, then you could first look at the reconstruction quality of VQGAN to see how much information it loses.

00015 [attached image] Excuse me, this is the picture I reconstructed after 1600 epochs. Do I need to retrain VQGAN?

LTH14 commented 1 month ago

What does the original image look like?

LTH14 commented 1 month ago

Unfortunately your uploaded image seems broken and I cannot view it.

18222why commented 1 month ago

Unfortunately your uploaded image seems broken and I cannot view it.

tb0092 [attached image] Sorry.

LTH14 commented 1 month ago

It seems the original image is quite different from the reconstructed one. I would say in this case you should re-train the VQGAN and MAGE on your own dataset.

18222why commented 1 month ago

It seems the original image is quite different from the reconstructed one. I would say in this case you should re-train the VQGAN and MAGE on your own dataset.

Thank you very much for your help. Have a nice life!