cb-rep opened this issue 11 months ago
If you plan to fine-tune the ImageNet pre-trained MAGE on your dataset, you only need to change nb_classes to 47 in main_finetune. The performance can be poor for many reasons; one reason could be that your dataset is too different from the ImageNet image distribution. You could also consider adjusting the number of training epochs: if your dataset is much smaller than ImageNet, you should increase the number of fine-tuning epochs.
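As a concrete illustration of why only nb_classes needs to change, here is a minimal, self-contained sketch (toy modules, not MAGE's actual classes) of how an MAE/MAGE-style fine-tuning script typically handles the mismatched classification head; the module and key names are illustrative.

```python
# Only the final linear head depends on the number of classes, so the
# ImageNet weights load unchanged everywhere else.
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, embed_dim=768, nb_classes=1000):
        super().__init__()
        self.backbone = nn.Linear(embed_dim, embed_dim)  # stand-in for the ViT encoder
        self.head = nn.Linear(embed_dim, nb_classes)     # the only nb_classes-dependent part

    def forward(self, x):
        return self.head(self.backbone(x))

pretrained = TinyClassifier(nb_classes=1000)    # plays the role of the ImageNet checkpoint
finetune_model = TinyClassifier(nb_classes=47)  # model for the 47-class dataset

# Drop the old 1000-way head before loading; strict=False then leaves the new
# 47-way head randomly initialized, which is what fine-tuning wants.
state_dict = pretrained.state_dict()
for k in ("head.weight", "head.bias"):
    state_dict.pop(k)
msg = finetune_model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # ['head.weight', 'head.bias']
```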
Excuse me, why does the accuracy reach 78.6 right at the start of fine-tuning and then barely change during training? Is it because the dataset is too small?
I think it is also bottlenecked by the vector quantization, which inevitably loses information about the image.
Thank you very much for your reply. Could you give me some suggestions? Do I need to retrain the VQGAN?
Are you fine-tuning it on ImageNet or on another dataset? If it is another dataset, you could first look at the reconstruction quality of the VQGAN to see how much information it loses.
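One way to check that reconstruction quality is a quick encode/decode round trip. Below is a minimal sketch (not MAGE's own evaluation code) that assumes a taming-transformers-style VQ model whose encode() returns (quant, emb_loss, info) and whose decode() maps the quantized latents back to an image; building and loading the actual VQGAN checkpoint is left to the repo's own tokenizer code, so treat `vqgan` and the pixel value range as assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def reconstruction_report(vqgan, img):
    """img: (1, 3, H, W) float tensor in the value range the VQGAN expects."""
    quant, _, _ = vqgan.encode(img)   # quantized latents from the codebook
    recon = vqgan.decode(quant)       # image rebuilt from the discrete codes
    mse = F.mse_loss(recon, img)
    # PSNR below assumes pixel values in [0, 1]; rescale first if the model
    # works in [-1, 1].
    psnr = 10 * torch.log10(1.0 / mse.clamp(min=1e-12))
    return recon, mse.item(), psnr.item()
```

If the reconstructions already look wrong at this stage, fine-tuning the classifier cannot recover that information, which is the bottleneck mentioned above.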
Excuse me, this is the picture I reconstructed at 1600 epochs. Do I need to retrain the VQGAN?
What does the original image look like?
Unfortunately your uploaded image seems broken and I cannot view it.
sorry
It seems the original image is quite different from the reconstructed one. I would say in this case you should re-train the VQGAN and MAGE on your own dataset.
Thank you very much for your help. Have a nice life
If you change the dataset, for example to one with 47 classes, is there anything else to do besides changing nb_classes to 47 in main_finetune? With only that change the final accuracy is not very high. I am not sure whether the 1000 in vocab_size = codebook_size + 1000 + 1 should also be changed; if I do change it, loading the checkpoint fails with:

RuntimeError: Error(s) in loading state_dict for VisionTransformerMage:
size mismatch for token_emb.word_embeddings.weight: copying a param with shape torch.Size([2025, 768]) from checkpoint, the shape in current model is torch.Size([1072, 768]).
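The two shapes in the error message point at the vocab_size edit rather than at nb_classes. As a quick sanity check (assuming the usual 1024-entry VQGAN codebook, which is what 2025 - 1000 - 1 implies):

```python
# Where the two shapes in the error come from.
codebook_size = 1024
pretrained_vocab = codebook_size + 1000 + 1   # 2025 rows in the checkpoint's token_emb
modified_vocab = codebook_size + 47 + 1       # 1072 rows after editing vocab_size
assert pretrained_vocab == 2025 and modified_vocab == 1072
```

In other words, the full token vocabulary (codebook entries plus the extra 1000 + 1 slots) is baked into the pre-trained token_emb.word_embeddings.weight, so leaving vocab_size = codebook_size + 1000 + 1 untouched lets that weight load as-is; changing only nb_classes, as suggested at the top of the thread, resizes just the classification head.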