Why are image similarity scores from InternViT-300M-448px always very high?

I want to use InternViT-300M-448px to compare two images, 1.jpg and 2.jpg, and print their similarity. The code below was written by an AI assistant at my request. The problem is that the reported similarity is always very high, about 0.9 or above. Whether I use two very similar images or two very different ones, the score stays above 0.9, so I cannot tell whether the images are similar. Other models, such as siglip-so400m and CLIP ViT-L-14, give reasonable similarity scores. I am a beginner in AI, so please forgive any basic mistakes. The model was downloaded from Hugging Face: https://huggingface.co/OpenGVLab/InternViT-300M-448px/

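A possible explanation, offered as an assumption rather than a confirmed property of InternViT-300M-448px: if the feature vectors of all images share a large common component, cosine similarity stays near 1 no matter what the images show, because that common part dominates both the dot product and the norms. This pure-Python sketch uses made-up 4-dimensional vectors (not real model outputs) to illustrate the effect. It also shows that subtracting the mean feature of a set of images (a simple centering step, swapped in here for illustration and not part of the original code) brings the contrast back. The names `cosine` and `center` are hypothetical helpers.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def center(vectors):
    """Subtract the per-dimension mean of a set of vectors (removes their shared part)."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return [[x - m for x, m in zip(v, mean)] for v in vectors]

# Made-up toy "features": a large offset shared by every image,
# plus a small image-specific part.
shared = [10.0, 10.0, 10.0, 10.0]
img_a = [s + d for s, d in zip(shared, [1.0, -1.0, 0.0, 0.0])]
img_b = [s + d for s, d in zip(shared, [-1.0, 1.0, 0.0, 0.0])]  # image-specific part opposite to img_a's
img_c = [s + d for s, d in zip(shared, [0.0, 0.0, 1.0, -1.0])]

print(f"{cosine(img_a, img_b):.4f}")  # 0.9900: the shared offset dominates
a_c, b_c, _ = center([img_a, img_b, img_c])
print(f"{cosine(a_c, b_c):.4f}")      # -0.8000 after centering
```

With real features, the mean would have to be estimated from a reasonably large set of images, and whether centering makes InternViT-300M-448px features useful for similarity search is an open question. Models trained with a contrastive image-text objective, such as the CLIP and SigLIP models mentioned above, are designed for this kind of comparison.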

import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor
from torch.nn.functional import cosine_similarity

# Select the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the pretrained model and image processor
model_path = 'E:/InternViT-300M-448px/'
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).to(device).eval()

image_processor = CLIPImageProcessor.from_pretrained(model_path)

# Helper: preprocess an image and extract its features
def get_image_features(image_path):
    image = Image.open(image_path).convert('RGB')
    pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
    pixel_values = pixel_values.to(torch.bfloat16).to(device)
    with torch.no_grad():
        outputs = model(pixel_values)
    # Assumption: pooler_output is the feature representation we want to compare
    features = outputs.pooler_output
    return features

# Extract features for both images
features1 = get_image_features('./1.jpg')
features2 = get_image_features('./2.jpg')

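# Compute cosine similarity in float32: bfloat16 keeps only 8 significant bits
# (about 2 to 3 decimal digits), so a bfloat16 result can land slightly above 1.0
similarity = cosine_similarity(features1.float(), features2.float(), dim=-1).item()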
print(f"The similarity between the two images is: {similarity:.4f}")

Output:

Using device: cuda
The similarity between the two images is: 1.0078
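The printed value 1.0078 is larger than 1, which an exact cosine similarity can never be. It is consistent with the result having been computed and stored in bfloat16: bfloat16 keeps only 8 significant bits, so just above 1.0 the next representable value is 1 + 2^-7 = 1.0078125, which prints as 1.0078. This pure-Python sketch emulates float32-to-bfloat16 rounding (round-to-nearest-even on the 16 dropped bits; NaN and infinity handling omitted) to show how coarse the grid is near 1. The helper name `to_bfloat16` is hypothetical, not a library function.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Hypothetical helper: round a Python float to the nearest bfloat16 value.

    bfloat16 is the upper 16 bits of an IEEE float32 (1 sign, 8 exponent,
    7 explicit mantissa bits). We round to nearest even on the 16 dropped bits.
    NaN/inf handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (bits >> 16) & 1                                # lowest bit that bfloat16 keeps
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000             # round to nearest even, drop low bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Near 1.0 the bfloat16 grid is very coarse:
print(to_bfloat16(0.999))            # 1.0 (values this close to 1 collapse to 1)
print(to_bfloat16(1.0078))           # 1.0078125 (next representable value above 1.0)
print(f"{to_bfloat16(1.0078):.4f}")  # 1.0078, the value printed in the output
```

Casting the features to float32 (for example with `features1.float()`) before calling `cosine_similarity` removes most of this rounding error. It only fixes the value above 1, though; it does not by itself explain why dissimilar images still score above 0.9.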

OpenGVLab/InternVL issue #609