Cysu / dgd_person_reid

Domain Guided Dropout for Person Re-identification
http://arxiv.org/abs/1604.07528

How to use the pretrained JSTL+DGD model for person re-identification? #14

Open · benstaf opened this issue 7 years ago

benstaf commented 7 years ago

I don't understand how to do person re-identification with the pretrained JSTL+DGD model found here: https://drive.google.com/open?id=0B67_d0rLRTQYZnB5ZUZpdTlxM0k

I have two problems, one related to the input and one to the output:

1. In person re-identification, we input two different pictures and ask the model whether they depict the same person or not. But here, in the file 'jstl_dgd_deploy_inference.prototxt', the input shape is (1, 3, 144, 56), not, for example, (2, 3, 144, 56).

2. In the file 'jstl_dgd_deploy_inference.prototxt', I don't see the output layer. It should be a binary softmax that outputs '1' if the two photos show the same person and '0' if they show different persons.

Moreover, when loading the caffemodel weights, I receive the following warnings:

```
I1108 08:48:31.324759 1525 net.cpp:752] Ignoring source layer relu7
I1108 08:48:31.324795 1525 net.cpp:752] Ignoring source layer drop7
I1108 08:48:31.324802 1525 net.cpp:752] Ignoring source layer fc8_jstl
```

This suggests that something is missing in the prototxt file.

Cysu commented 7 years ago

Our model does not directly produce a binary verification result for a pair of people. At test time, we first run all the images through the net and extract their features. Then we compute the pairwise Euclidean distances between query and gallery people. Finally, for each query, we rank the gallery samples by their distances.

If you just wish to do verification, you can choose a distance threshold that balances the true positive rate and the false positive rate.
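For concreteness, a minimal sketch of this ranking and thresholding step in NumPy (feature extraction itself is omitted; `gallery_feats` is assumed to be a matrix with one fc7 feature per row):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery samples by Euclidean distance to a single query feature."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)
    return order, dists[order]

def same_person(feat_a, feat_b, threshold):
    """Binary verification: accept the pair when the distance falls below a
    threshold chosen to balance true and false positive rates."""
    return np.linalg.norm(feat_a - feat_b) < threshold
```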

benstaf commented 7 years ago

I tried to follow your suggestions, but my results are not convincing. I ran some experiments with the PRID dataset.

In the multi-shot case, I chose two pictures each of persons 4 and 9, taken with cameras A and B (8 pictures in total).

We should get a large distance between pictures of different persons and a small distance between pictures of the same person, but this is not the case. Why?

Some results are below (for example, a4_1.png is picture number 1 of person 4 from camera A):

```
distance between a4_1.png and a9_1.png: 6.65493
distance between a4_1.png and a4_34.png: 6.5565
distance between a4_1.png and a9_28.png: 4.84618
distance between a4_1.png and b4_1.png: 7.06474
distance between a4_1.png and b9_1.png: 8.09637
distance between a4_1.png and b4_34.png: 5.71222
distance between a4_1.png and b9_28.png: 5.91796
distance between b9_1.png and a4_34.png: 9.21853
distance between b9_1.png and a9_28.png: 7.02944
distance between b9_1.png and b4_1.png: 4.23969
distance between b9_1.png and b9_28.png: 5.4921
```

Cropped images are here (cropped to 56x144 for input to the network): https://drive.google.com/drive/folders/0B86WKpvkt66BeVp4UGgxUlhzZG8?usp=sharing

Code (additional code here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing):

```python
from jstl_inference import JSTL  # jstl_inference.py is the TensorFlow version of
# jstl_dgd_deploy_inference.prototxt, generated with caffe-tensorflow
# (https://github.com/ethereon/caffe-tensorflow); see the code here:
# https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing
import tensorflow as tf
from scipy.misc import imread
import numpy as np
from PIL import Image, ImageOps

# Preparation of the feature extractor
x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])  # unused

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
# jstl_inference.npy is the NumPy version of jstl_dgd_inference.caffemodel,
# obtained with caffe-tensorflow; file here:
# https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing
net.load('jstl_inference.npy', sess)

# Get the output of the fc7 layer
person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")

def extract_vector(image_data):
    img = imread(image_data)
    img = Image.fromarray(img)
    # Resize (keeping the aspect ratio) and crop the input image to 56x144
    img = ImageOps.fit(img, size=(56, 144), method=Image.ANTIALIAS)
    img = np.asarray(img)
    img = np.reshape(img, (1, 144, 56, 3))
    feed = {x: img}
    person_vector = sess.run(person_feature, feed_dict=feed)
    return person_vector[0]

def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))

# Results
distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')

distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')

distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')
```

Cysu commented 7 years ago

I guess there might be some mismatch between the image preprocessing methods.

When training the model, we used OpenCV to read the images and subtracted the mean pixel values. The input to the CNN should be a 1x3x144x56 blob whose color channels are in BGR order and demeaned by [102, 102, 101].
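A minimal sketch of that preprocessing (the resize-to-56x144 step here is an assumption; only the BGR order, the mean values, and the 1x3x144x56 layout are stated above):

```python
import cv2
import numpy as np

def preprocess_caffe(path):
    img = cv2.imread(path)            # OpenCV loads images in BGR order
    img = cv2.resize(img, (56, 144))  # (width, height) -> 144x56x3 array
    img = img.astype(np.float32)
    img -= np.array([102, 102, 101], dtype=np.float32)  # per-channel mean
    return img.transpose(2, 0, 1)[np.newaxis, ...]      # 1x3x144x56 (NCHW)
```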

Thanks for providing the script. I will verify this after the CVPR deadline.

benstaf commented 7 years ago

I revised my image pre-processing, but the results do not improve. My results are:

```
distance between a4_1.png and a9_1.png: 6.59645
distance between a4_1.png and a4_34.png: 7.80466
distance between a4_1.png and a9_28.png: 6.67408
distance between a4_1.png and b4_1.png: 11.086
distance between a4_1.png and b9_1.png: 10.6859
distance between a4_1.png and b4_34.png: 12.731
distance between a4_1.png and b9_28.png: 13.6327
distance between b9_1.png and a4_34.png: 9.13998
distance between b9_1.png and a9_28.png: 12.1658
distance between b9_1.png and b4_1.png: 5.44103
distance between b9_1.png and b9_28.png: 7.77282
```

My code is:

```python
from jstl_inference import JSTL  # the Python script output by caffe-tensorflow
import tensorflow as tf
import numpy as np
import cv2

x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])  # unused

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
net.load('jstl_inference.npy', sess)

person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")

def preprocess(image):
    img = cv2.imread(image)  # OpenCV reads in BGR order
    shape = img.shape
    # Resize so that the height is 144, keeping the aspect ratio
    ratio = 144.0 / float(shape[0])
    dim = (int(shape[1] * ratio), 144)
    resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)

    # Crop equally on both sides down to a width of 56
    margin = dim[0] - 56
    cropped = resized[:, margin // 2 : margin // 2 + 56]
    cv2.imwrite('cropped_' + image, cropped)

    # Subtract the mean pixel values [102, 102, 101]
    centered_array = cropped - np.array([102, 102, 101])
    return centered_array

def extract_vector(image):
    centered_array = preprocess(image)
    input_array = np.reshape(centered_array, (1, 144, 56, 3))
    feed = {x: input_array}
    person_vector = sess.run(person_feature, feed_dict=feed)
    return person_vector[0]

def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))

# Results:
distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')

distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')

distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')
```

kaidic commented 7 years ago

I've encountered the same problem. It seems that the feature-layer outputs I get from TensorFlow and from Caffe are different.
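One way to check is to feed the exact same preprocessed blob to both frameworks and compare the fc7 outputs. A sketch, assuming pycaffe is installed and that the blob names ('data', 'fc7') match the deploy prototxt:

```python
import numpy as np
import caffe
import cv2

# Same preprocessing in both frameworks: BGR, mean-subtracted, 1x3x144x56
img = cv2.imread('a4_1.png')
img = cv2.resize(img, (56, 144)).astype(np.float32)
img -= np.array([102, 102, 101], dtype=np.float32)
blob = img.transpose(2, 0, 1)[np.newaxis, ...]

net = caffe.Net('jstl_dgd_deploy_inference.prototxt',
                'jstl_dgd_inference.caffemodel', caffe.TEST)
net.blobs['data'].data[...] = blob
net.forward()
caffe_feat = net.blobs['fc7'].data.ravel().copy()

# tf_feat: run the converted TensorFlow model on blob.transpose(0, 2, 3, 1)
# and compare, e.g. print(np.abs(caffe_feat - tf_feat).max())
```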

soulslicer commented 6 years ago

Which prototxt in the code was used to train the model for jstl_dgd_inference.caffemodel? I can't seem to find it.