KaiyangZhou / deep-person-reid

Torchreid: Deep learning person re-identification in PyTorch.
https://kaiyangzhou.github.io/deep-person-reid/
MIT License

Using the cosine similarity between embeddings generated by OSNet-AIN doesn't provide good results, HELP !! #568

Open IGlace opened 8 months ago

IGlace commented 8 months ago

Hello,

I'm working on a project that uses graph neural networks, taking embeddings of person bounding-box images and their pairwise similarities as input to the graph. The model I use is OSNet-AIN, because according to the literature it is one of the best-performing models in the re-ID field, and the method I use to compare images is simply the cosine similarity between their embeddings.

My problem is that the similarity scores are not logical. I ran a simple test: I took two pictures of person 1, one from the front and one from behind, and one picture of person 2 from the front. I cropped the exact bounding box of the person in each of the 3 images, extracted an embedding from each crop with the OSNet-AIN model, and then computed two similarities: person 1 (front) vs. person 2 (front), and person 1 (front) vs. person 1 (back).

Following the concept of re-ID models, the expected result is that person 1 seen from both sides should be more similar than person 1 and person 2 seen from the same side. But that's not what I got: person 1 and person 2 came out as more similar, apparently just because their pictures were taken from the same side, than person 1 compared with itself from both sides, which is not logical. I would like someone to explain this situation. Am I doing something wrong by comparing embeddings with cosine similarity alone? Is there something I'm missing? You can check the code I'm using for the experiment below.

import cv2
import torch
import torchvision.transforms as T
from torchreid.utils import FeatureExtractor

# Build a feature extractor around the pretrained OSNet-AIN model
extractor = FeatureExtractor(
    model_name='osnet_ain_x1_0',
    device='cpu'
)

# Manual preprocessing: convert the OpenCV array to PIL, resize to the
# standard re-ID input size, and convert to a float tensor
image_size = (256, 128)
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(image_size),
    T.ToTensor(),
])

# Load both crops and convert from OpenCV's BGR to RGB
img1 = cv2.imread("image_1.jpg")
img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
img1 = preprocess(img1).unsqueeze(0)

img2 = cv2.imread("image_2.jpg")
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)
img2 = preprocess(img2).unsqueeze(0)

# Extract embeddings for both crops in a single batch
concat_tensor = torch.cat([img1, img2])
with torch.no_grad():
    features = extractor(concat_tensor)

print("Similarity : ", torch.nn.functional.cosine_similarity(features[0], features[1], dim=0))
QiqLiang commented 1 month ago

Maybe you need to normalize the input.
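For context, Torchreid's FeatureExtractor only applies its internal preprocessing (resize plus normalization with the ImageNet mean/std) when it is given image paths or numpy arrays; a torch.Tensor input is assumed to be preprocessed already and is passed through unchanged. Since the pipeline above stops at T.ToTensor(), the model sees un-normalized pixels. A minimal sketch of two possible fixes, assuming the same image_1.jpg and image_2.jpg files as above:

import torch
import torchvision.transforms as T
from torchreid.utils import FeatureExtractor

extractor = FeatureExtractor(model_name='osnet_ain_x1_0', device='cpu')

# Option 1: keep the manual tensor pipeline, but add the ImageNet
# normalization that the extractor would otherwise apply itself
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((256, 128)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Option 2: pass the image paths (or RGB numpy arrays) directly, so the
# extractor handles resizing and normalization internally
features = extractor(["image_1.jpg", "image_2.jpg"])
print(torch.nn.functional.cosine_similarity(features[0], features[1], dim=0))

With normalization in place the inputs match the statistics the model saw during training, which is the usual first thing to check when cosine similarities look wrong.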