facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0
8.45k stars · 712 forks

Test image similarity performance #147

Open wzhiyuan2016 opened 12 months ago

wzhiyuan2016 commented 12 months ago

hi [attached images: 11, 44, 22, 33]

Test model: dinov2_vitb14_pretrain.pth. I tested how image similarity changes under different lighting and shadow conditions. Conclusion: the model is not robust to changes in lighting and shadows.

qasfb commented 12 months ago

I would say it's expected that similarity decreases under different conditions, but what are the similarity values across different images, for comparison? (In order to have an idea of what the numbers represent.)

wzhiyuan2016 commented 12 months ago

> I would say it's expected that similarity decreases with different conditions, but what are the similarity values across images, for comparison ? (in order to have an idea of what the numbers represent)

hi, I calculated the cosine similarity between two images; the code is as follows: [screenshot: 企业微信截图_16903394778302]

Cosine similarity measures the relationship between two vectors: the smaller the angle between them, the closer the cosine is to 1, and the more similar the two vectors are.

I use the last 384 dimensions as feature vector A for each image, then compute the 384-dimensional features of the same image under different lighting conditions to build the gallery database. Finally, I compute the similarity between vector A and the gallery and sort the results to obtain the screenshot above.
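As a minimal sketch of this retrieval setup (the 384-dim vectors below are random stand-ins for the real DINOv2 features; in practice they would come from the model's image embedding):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random stand-ins for DINOv2 features: one 384-dim query vector A
# and a gallery of 5 images of the same scene under different lighting.
query = torch.randn(384)
gallery = torch.randn(5, 384)

# Cosine similarity = dot product of L2-normalized vectors; range [-1, 1].
sims = F.cosine_similarity(query.unsqueeze(0), gallery, dim=1)

# Sort gallery entries from most to least similar, as in the screenshots.
ranked = sims.argsort(descending=True)
```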

My expectation was that the model should be strongly robust to lighting, so the similarity against the gallery should be high; instead, the similarity is low when the sunlight conditions differ.

It is similar to image retrieval: for example, when I search for a car in the gallery database, if the car is in direct sunlight it cannot be retrieved.

qasfb commented 12 months ago

Sorry, I mean: what are the actual numeric values across the 3 images with different content?

wzhiyuan2016 commented 12 months ago

> Sorry I mean what are the actual number values across the 3 images with different content ?

The actual similarity of the three images should be above 0.98

qasfb commented 12 months ago

What is the actual value between the top-left image and the bottom-left image in your grid? Also, can you describe what you mean by "the last 384 dimensions"? Are you using the [cls] token? Can you share this grid of images? I am curious.

wzhiyuan2016 commented 12 months ago

> what is the actual value between the top-left image vs the bottom-left image in your grid ? also, can you describe what you mean by "the last 384 dimensions" ? are you using the [cls] token ? can you share this grid of images ? I am curious

I did not calculate the similarity between the top-left and the bottom-left image, but I have already verified that the similarity between different images is very low, so the model distinguishes between classes (inter-class) just fine.

What I am calculating here is intra-class similarity, not inter-class similarity.
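The intra- vs. inter-class distinction can be sketched like this (a hypothetical setup: 3 rows/classes with 4 lighting variants each; random vectors stand in for the real features):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical embeddings: 3 classes (rows of the grid), 4 lighting
# variants per class, 384 dims each; random stand-ins for real features.
feats = F.normalize(torch.randn(3 * 4, 384), dim=-1)
labels = torch.arange(3).repeat_interleave(4)  # class id of each embedding

sim = feats @ feats.T  # cosine similarity, since rows are L2-normalized
same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
diagonal = torch.eye(len(labels), dtype=torch.bool)

intra = sim[same_class & ~diagonal].mean()  # same scene, different lighting
inter = sim[~same_class].mean()             # different scenes
```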

About "the last 384 dimensions": [screenshot: 企业微信截图_16903628313645]

In the three rows of test images I provided above, for example in the first row, the leftmost image is the query image and the images to its right are the retrieved results, arranged from left to right by similarity, from high to low.

If the model performs well, the similarity from left to right should be very high

qasfb commented 12 months ago

Essentially, what I'm trying to understand is whether we are in the case where:

  • all images in a given row (with different lighting / shadows) have similarities > 0.4
  • images from different rows have similarities < 0

wzhiyuan2016 commented 12 months ago

> essentially what i'm trying to understand is if we are in the case where:
>
>   • all images in a given row (with different lighting / shadows) have similarities > 0.4
>   • images from different rows have similarities < 0

So how do we train DINOv2 so that intra-class similarity is higher and inter-class similarity is lower? An intra-class similarity of 0.4 is not high; how can we raise it from 0.4 toward 0.9?

qasfb commented 12 months ago

"High" compared to what? Would you be able to provide the similarity scores across images of different rows?

athmanar commented 4 months ago

@wzhiyuan2016 Have you tried using vitg14 (the big model), with 1536 output features?
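For reference, these are the embedding sizes of the four DINOv2 backbones listed in the repo README (the `torch.hub` call is shown commented out since it downloads pretrained weights):

```python
# Embedding dimension of each DINOv2 backbone, per the repo README.
DINOV2_EMBED_DIMS = {
    "dinov2_vits14": 384,
    "dinov2_vitb14": 768,
    "dinov2_vitl14": 1024,
    "dinov2_vitg14": 1536,
}

# To actually load the giant model (downloads pretrained weights):
# import torch
# model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitg14")
```

Note that "384 dimensions" is the full embedding size of vits14, not a slice of the vitb14 output, which may explain the earlier confusion about "the last 384 dimensions".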