KaiyangZhou / deep-person-reid

Torchreid: Deep learning person re-identification in PyTorch.
https://kaiyangzhou.github.io/deep-person-reid/
MIT License

Using the cosine similarity between embeddings generated by OSNet-AIN doesn't provide good results, HELP !! #568

Open IGlace opened 8 months ago

IGlace commented 8 months ago

Hello,

I'm working on a project that uses graph neural networks, taking embeddings of person bounding-box images and their pairwise similarities as input to the graph. The model I use is OSNet-AIN, because according to the literature it is one of the best-performing models in the re-ID field, and the method I use to compare images is simply the cosine similarity between their embeddings.

My problem is that the similarity scores are not logical. I ran a simple test: I took two pictures of person 1, one from the front and one from behind, and one picture of person 2 from the front. I cropped the exact bounding box of the person in each of the 3 images, extracted an embedding from each crop with the OSNet-AIN model, and then computed two similarities: person 1 (front) vs. person 2 (front), and person 1 (front) vs. person 1 (back).

Following the concept of re-ID models, the expected result is that person 1 seen from both sides should be more similar than person 1 and person 2 seen from the same side. But that's not what I got: person 1 and person 2 came out as more similar, apparently just because their pictures were taken from the same side, than person 1 compared with itself from both sides, which is not logical. I would like someone to explain this situation. Am I doing something wrong by comparing embeddings with cosine similarity alone? Is there something I'm missing? You can check the code I'm using for the experiment below.

import cv2
import torch
import torchvision.transforms as T
from torchreid.utils import FeatureExtractor

# Build a feature extractor around the pretrained OSNet-AIN model
extractor = FeatureExtractor(
    model_name='osnet_ain_x1_0',
    device='cpu'
)

# Manual preprocessing: convert the OpenCV array to PIL, resize to the
# standard re-ID input size, and convert to a float tensor
image_size = (256, 128)
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(image_size),
    T.ToTensor(),
])

# Load both crops and convert from OpenCV's BGR to RGB
img1 = cv2.imread("image_1.jpg")
img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
img1 = preprocess(img1).unsqueeze(0)

img2 = cv2.imread("image_2.jpg")
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)
img2 = preprocess(img2).unsqueeze(0)

# Extract embeddings for both crops in a single batch
concat_tensor = torch.cat([img1, img2])
with torch.no_grad():
    features = extractor(concat_tensor)

print("Similarity : ", torch.nn.functional.cosine_similarity(features[0], features[1], dim=0))
QiqLiang commented 1 month ago

Maybe you need to normalize the input.
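For context, Torchreid's FeatureExtractor only applies its internal preprocessing (resize plus normalization with the ImageNet mean/std) when it is given image paths or numpy arrays; a torch.Tensor input is assumed to be preprocessed already and is passed through unchanged. Since the pipeline above stops at T.ToTensor(), the model sees un-normalized pixels. A minimal sketch of two possible fixes, assuming the same image_1.jpg and image_2.jpg files as above:

import torch
import torchvision.transforms as T
from torchreid.utils import FeatureExtractor

extractor = FeatureExtractor(model_name='osnet_ain_x1_0', device='cpu')

# Option 1: keep the manual tensor pipeline, but add the ImageNet
# normalization that the extractor would otherwise apply itself
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((256, 128)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Option 2: pass the image paths (or RGB numpy arrays) directly, so the
# extractor handles resizing and normalization internally
features = extractor(["image_1.jpg", "image_2.jpg"])
print(torch.nn.functional.cosine_similarity(features[0], features[1], dim=0))

With normalization in place the inputs match the statistics the model saw during training, which is the usual first thing to check when cosine similarities look wrong.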