ThalesGroup / pim-generalized-category-discovery

Code for our ICCV 2023 paper "Parametric Information Maximization for Generalized Category Discovery"
MIT License

Results based on CLIP #8

Open yinjiexidian opened 3 weeks ago

yinjiexidian commented 3 weeks ago

Hello, I would like to ask whether you have run experiments on CLIP feature maps. Intuitively, the results with CLIP should be better than with DINO, but I obtained suboptimal results. I am not sure whether the problem lies in my code or elsewhere, and I hope to get your answer. Thank you!

tds-fchiaroni commented 3 weeks ago

Hello,

Thank you for your question. We did not perform experiments with the CLIP feature extractor, though it could indeed be an interesting alternative. In our work, we applied our PIM model on top of feature maps generated using the GCD feature extractor. Please note that the GCD feature extractor is based on DINO, but it was also fine-tuned on the target dataset using a semi-supervised contrastive loss, as detailed in Section 3.1.1 of the GCD article.

This fine-tuning process may account for the performance differences you observed. Did you fine-tune the CLIP weights before applying our method to the resulting feature maps? That could potentially help bridge the gap.
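For context, here is a minimal sketch of what this semi-supervised contrastive fine-tuning looks like. It is a sketch only: the function names, the projection handling, and the weight `lam` are illustrative assumptions, not the exact GCD code.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    # Self-supervised contrastive term: two augmented views of the same batch,
    # with matching indices as positives (the diagonal of the similarity matrix).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def sup_con(z, labels, tau=0.07):
    # Supervised contrastive term on the labelled subset (Khosla et al., 2020).
    z = F.normalize(z, dim=1)
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = (z @ z.t() / tau).masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    # Average the log-probability over each anchor's positives.
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_anchor[pos_mask.any(1)].mean()

def gcd_style_loss(z1, z2, labels, labelled, lam=0.35):
    # Weighted sum of the unsupervised term on all samples and the
    # supervised term on labelled samples, as in GCD Section 3.1.1.
    z_lab = torch.cat([z1[labelled], z2[labelled]])
    y_lab = torch.cat([labels[labelled], labels[labelled]])
    return (1 - lam) * info_nce(z1, z2) + lam * sup_con(z_lab, y_lab)
```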

Best regards.

yinjiexidian commented 3 weeks ago

Thank you very much for your reply. I tried the following:

I replaced the GCD pre-trained model vit_dino_b16 with the more powerful CLIP pre-trained model vit_B_16 and kept everything else the same. I fine-tuned the model with the same loss and indeed observed performance improvements on the GCD benchmark.
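Concretely, the backbone swap looks roughly like this (a hypothetical sketch; the actual integration into the GCD training code, e.g. which blocks are unfrozen, differs):

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

# DINO ViT-B/16, the backbone GCD fine-tunes:
dino = torch.hub.load('facebookresearch/dino:main', 'dino_vitb16')

# CLIP ViT-B/16 image encoder as the replacement:
clip_model, preprocess = clip.load("ViT-B/16", device="cpu")

images = torch.randn(4, 3, 224, 224)          # dummy batch
dino_feats = dino(images)                     # shape (4, 768)
clip_feats = clip_model.encode_image(images)  # shape (4, 512)
# The feature dimensions differ (768 vs. 512), so any head on top,
# including the PIM classifier, has to be resized accordingly.
```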

The PIM method builds on GCD training, but the results I obtain with PIM on CLIP features are slightly weaker than with DINO features. This result confuses me, and I hope to get your insights.

Here are some of my experimental results:

| Method   | CIFAR-10 (All/Old/New) | CIFAR-100 (All/Old/New) | CUB (All/Old/New)  | ImageNet-100 (All/Old/New) |
|----------|------------------------|-------------------------|--------------------|----------------------------|
| gcd      | 91.5 / 97.9 / 88.2     | 70.8 / 77.6 / 57.0      | 51.3 / 56.6 / 48.7 | 74.1 / 89.8 / 66.3         |
| gcd_clip | 94.6 / 97.5 / 93.1     | 71.9 / 77.8 / 60.2      | 52.5 / 58.6 / 49.5 | 75.2 / 88.6 / 68.5         |
| pim_dino | 94.7 / 97.4 / 93.3     | 78.3 / 84.2 / 66.5      | 62.7 / 75.7 / 56.2 | 83.1 / 95.3 / 77.0         |
| pim_clip | 94.4 / 97.1 / 93.1     | 73.7 / 82.9 / 55.3      | 58.7 / 70.8 / 52.6 | 80.7 / 95.9 / 73.0         |
tds-fchiaroni commented 5 days ago

Hello,

Thank you for sharing these interesting experiments! Several factors could explain the difference: variations in the optimization process between the two methods, or the fixed temperature value we use when converting logits into softmax predictions, which might be sub-optimal for CLIP features. While PIM still outperforms GCD, tuning the temperature might further improve PIM's performance with CLIP. I hope this helps!
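For illustration, here is a tiny example of how the temperature reshapes the softmax (toy values, not the settings used in the paper):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.5]])  # toy logits for one sample

for tau in (0.1, 1.0, 10.0):
    print(tau, F.softmax(logits / tau, dim=1))
# Lower tau -> sharper, more confident predictions; higher tau -> softer ones.
# If CLIP features live on a different similarity scale than DINO's, the
# temperature tuned for DINO may be sub-optimal, so sweeping it is cheap to try.
```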

Best regards, Florent