ThalesGroup / pim-generalized-category-discovery

Code for our ICCV 2023 paper "Parametric Information Maximization for Generalized Category Discovery"
MIT License

Results based on CLIP #8

Open yinjiexidian opened 3 weeks ago

yinjiexidian commented 3 weeks ago

Hello, I would like to ask whether you have run experiments on CLIP feature maps. Intuitively, the results with CLIP should be better than with DINO, but I obtained suboptimal results. I am not sure whether the problem lies in my code or elsewhere, and I hope to get your answer. Thank you!

tds-fchiaroni commented 3 weeks ago

Hello,

Thank you for your question. We did not perform experiments with the CLIP feature extractor, though it could indeed be an interesting alternative. In our work, we applied our PIM model on top of feature maps generated using the GCD feature extractor. Please note that the GCD feature extractor is based on DINO, but it was also fine-tuned on the target dataset using a semi-supervised contrastive loss, as detailed in Section 3.1.1 of the GCD article.

This fine-tuning process may account for the performance differences you observed. Did you fine-tune the CLIP weights before applying our method to the resulting feature maps? That could potentially help bridge the gap.
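For context, here is a minimal sketch of what this semi-supervised contrastive fine-tuning looks like. It is a sketch only: the function names, the projection handling, and the weight `lam` are illustrative assumptions, not the exact GCD code.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    # Self-supervised contrastive term: two augmented views of the same batch,
    # with matching indices as positives (the diagonal of the similarity matrix).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def sup_con(z, labels, tau=0.07):
    # Supervised contrastive term on the labelled subset (Khosla et al., 2020).
    z = F.normalize(z, dim=1)
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = (z @ z.t() / tau).masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    # Average the log-probability over each anchor's positives.
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_anchor[pos_mask.any(1)].mean()

def gcd_style_loss(z1, z2, labels, labelled, lam=0.35):
    # Weighted sum of the unsupervised term on all samples and the
    # supervised term on labelled samples, as in GCD Section 3.1.1.
    z_lab = torch.cat([z1[labelled], z2[labelled]])
    y_lab = torch.cat([labels[labelled], labels[labelled]])
    return (1 - lam) * info_nce(z1, z2) + lam * sup_con(z_lab, y_lab)
```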

Best regards.

yinjiexidian commented 3 weeks ago

Thank you very much for your reply. I tried the following:

I replaced the GCD pre-trained model vit_dino_b16 with the more powerful CLIP pre-trained model vit_B_16 and kept everything else the same. I fine-tuned the model with the same loss and indeed observed performance improvements on the GCD benchmark.
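Concretely, the backbone swap looks roughly like this (a hypothetical sketch; the actual integration into the GCD training code, e.g. which blocks are unfrozen, differs):

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

# DINO ViT-B/16, the backbone GCD fine-tunes:
dino = torch.hub.load('facebookresearch/dino:main', 'dino_vitb16')

# CLIP ViT-B/16 image encoder as the replacement:
clip_model, preprocess = clip.load("ViT-B/16", device="cpu")

images = torch.randn(4, 3, 224, 224)          # dummy batch
dino_feats = dino(images)                     # shape (4, 768)
clip_feats = clip_model.encode_image(images)  # shape (4, 512)
# The feature dimensions differ (768 vs. 512), so any head on top,
# including the PIM classifier, has to be resized accordingly.
```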

The PIM method builds on GCD training, but the results I obtain with PIM on CLIP features are slightly weaker than with DINO features. This result confuses me, and I hope to get your insights.

Here are some of my experimental results:

| Method   | CIFAR-10 (All/Old/New) | CIFAR-100 (All/Old/New) | CUB (All/Old/New)  | ImageNet-100 (All/Old/New) |
|----------|------------------------|-------------------------|--------------------|----------------------------|
| gcd      | 91.5 / 97.9 / 88.2     | 70.8 / 77.6 / 57.0      | 51.3 / 56.6 / 48.7 | 74.1 / 89.8 / 66.3         |
| gcd_clip | 94.6 / 97.5 / 93.1     | 71.9 / 77.8 / 60.2      | 52.5 / 58.6 / 49.5 | 75.2 / 88.6 / 68.5         |
| pim_dino | 94.7 / 97.4 / 93.3     | 78.3 / 84.2 / 66.5      | 62.7 / 75.7 / 56.2 | 83.1 / 95.3 / 77.0         |
| pim_clip | 94.4 / 97.1 / 93.1     | 73.7 / 82.9 / 55.3      | 58.7 / 70.8 / 52.6 | 80.7 / 95.9 / 73.0         |
tds-fchiaroni commented 5 days ago

Hello,

Thank you for sharing these interesting experiments! Several factors could explain the difference: variations in the optimization process between the two methods, or the fixed temperature value we use when converting logits into softmax predictions, which might be sub-optimal for CLIP features. While PIM still outperforms GCD, tuning the temperature might further improve PIM's performance with CLIP. I hope this helps!
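For illustration, here is a tiny example of how the temperature reshapes the softmax (toy values, not the settings used in the paper):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.5]])  # toy logits for one sample

for tau in (0.1, 1.0, 10.0):
    print(tau, F.softmax(logits / tau, dim=1))
# Lower tau -> sharper, more confident predictions; higher tau -> softer ones.
# If CLIP features live on a different similarity scale than DINO's, the
# temperature tuned for DINO may be sub-optimal, so sweeping it is cheap to try.
```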

Best regards, Florent