benstaf opened this issue 7 years ago
Our model does not directly produce a binary verification result for a pair of people. At test time, we first run all the images through our net and extract their features. Then we compute the pairwise Euclidean distances between query and gallery people. Finally, for each query, we rank the gallery samples by their distances.
If you just wish to do verification, you can choose a distance threshold that balances the true positive rate and the false positive rate.
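A minimal sketch of this ranking and thresholding, assuming each image's fc7 feature has already been extracted (all names, shapes, and the threshold value below are placeholders, not part of the repo):

```python
import numpy as np

# Hypothetical feature matrices: one 256-d fc7 feature per row
# (the sizes 5 and 20 are illustrative only).
query_feats = np.random.rand(5, 256).astype(np.float32)
gallery_feats = np.random.rand(20, 256).astype(np.float32)

# Pairwise Euclidean distances: dists[i, j] = ||query_i - gallery_j||
dists = np.linalg.norm(
    query_feats[:, None, :] - gallery_feats[None, :, :], axis=2)

# Re-identification: for each query, rank gallery samples by distance.
ranking = np.argsort(dists, axis=1)  # ranking[i] lists gallery indices, nearest first

# Verification: declare "same person" when the distance falls below a
# threshold tuned on held-out pairs to balance true/false positive rates.
THRESHOLD = 6.0  # placeholder value
same_person = dists < THRESHOLD
```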
I tried to follow your suggestions, but my results are not convincing. I ran some experiments with the PRID dataset.
In the multi-shot case, I chose 2 pictures each of persons 4 and 9, taken with cameras A and B (8 pictures in total).
We should get a large distance between pictures of different persons and a small distance between pictures of the same person, but this is not the case. Why?
Some results are below (for example, a4_1.png is picture number 1 of person 4, taken by camera A):
```
distance between a4_1.png and a9_1.png: 6.65493
distance between a4_1.png and a4_34.png: 6.5565
distance between a4_1.png and a9_28.png: 4.84618
distance between a4_1.png and b4_1.png: 7.06474
distance between a4_1.png and b9_1.png: 8.09637
distance between a4_1.png and b4_34.png: 5.71222
distance between a4_1.png and b9_28.png: 5.91796
distance between b9_1.png and a4_34.png: 9.21853
distance between b9_1.png and a9_28.png: 7.02944
distance between b9_1.png and b4_1.png: 4.23969
distance between b9_1.png and b9_28.png: 5.4921
```
Cropped images are here (cropped to shape (56, 144) for input to the neural network): https://drive.google.com/drive/folders/0B86WKpvkt66BeVp4UGgxUlhzZG8?usp=sharing
Code (additional code here: https://drive.google.com/drive/folders/0B86WKpvkt66BcDhxUW14bUsxd1k?usp=sharing )
```python
from jstl_inference import JSTL  # the TensorFlow version of jstl_dgd_deploy_inference.prototxt,
                                 # generated with caffe-tensorflow (https://github.com/ethereon/caffe-tensorflow)
import tensorflow as tf
from scipy.misc import imread
import numpy as np
from PIL import Image, ImageOps

x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
# jstl_inference.npy is the NumPy version of jstl_dgd_inference.caffemodel,
# obtained with caffe-tensorflow
net.load('jstl_inference.npy', sess)

person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")  # output of the fc7 layer

def extract_vector(image_data):
    img = imread(image_data)
    img = Image.fromarray(img)
    # resize (maintaining the aspect ratio) and crop the input image to 56x144
    img = ImageOps.fit(img, size=(56, 144), method=Image.ANTIALIAS)
    img = np.asarray(img)
    img = np.reshape(img, (1, 144, 56, 3))
    person_vector = sess.run(person_feature, feed_dict={x: img})
    return person_vector[0]

def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))

distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')
distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')
distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')
```
I guess there might be some mismatch between the image preprocessing methods we used.
When training the model, we use OpenCV to read the images and subtract the mean pixel values. The input to the CNN should be a 1x3x144x56 image whose color channels are in BGR order and are demeaned by [102, 102, 101].
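A minimal sketch of preprocessing as described, assuming OpenCV's BGR read order and a plain resize to 56x144 (the exact resize/crop used during training may differ):

```python
import cv2
import numpy as np

MEAN_BGR = np.array([102, 102, 101], dtype=np.float32)  # BGR mean from the comment above

def preprocess_caffe(path):
    """Read an image the way the model was trained: BGR, demeaned, 1x3x144x56."""
    img = cv2.imread(path)                   # OpenCV reads in BGR order
    img = cv2.resize(img, (56, 144))         # dsize is (width, height) -> 144x56x3 array
    img = img.astype(np.float32) - MEAN_BGR  # subtract per-channel mean
    img = img.transpose(2, 0, 1)             # HWC -> CHW for Caffe
    return img[np.newaxis, ...]              # add batch dim: 1x3x144x56
```

Note that the caffe-tensorflow graph in the scripts above takes NHWC input, so there the final transpose would be dropped and the demeaned HWC array fed directly.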
Thanks for providing the script. I will verify this after the CVPR deadline.
I revised my image pre-processing, but the results do not improve. My results are:
```
distance between a4_1.png and a9_1.png: 6.59645
distance between a4_1.png and a4_34.png: 7.80466
distance between a4_1.png and a9_28.png: 6.67408
distance between a4_1.png and b4_1.png: 11.086
distance between a4_1.png and b9_1.png: 10.6859
distance between a4_1.png and b4_34.png: 12.731
distance between a4_1.png and b9_28.png: 13.6327
distance between b9_1.png and a4_34.png: 9.13998
distance between b9_1.png and a9_28.png: 12.1658
distance between b9_1.png and b4_1.png: 5.44103
distance between b9_1.png and b9_28.png: 7.77282
```
My code is:
```python
from jstl_inference import JSTL  # the Python script output by caffe-tensorflow
import tensorflow as tf
import numpy as np
import cv2

x = tf.placeholder(tf.float32, shape=[1, 144, 56, 3])
y = tf.placeholder(tf.float32, shape=[1, 256])

net = JSTL({'data': x})
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
net.load('jstl_inference.npy', sess)

person_feature = sess.graph.get_tensor_by_name("fc7/fc7:0")

def preprocess(image):
    img = cv2.imread(image)  # OpenCV reads in BGR order
    shape = img.shape
    # resize so the height becomes 144 pixels, keeping the aspect ratio
    ratio = 144.0 / float(shape[0])
    dim = (int(shape[1] * ratio), 144)
    resized = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
    # center-crop the width to 56 pixels (integer division keeps the slice
    # indices ints in Python 3 and handles odd margins correctly)
    margin = dim[0] - 56
    cropped = resized[:, margin // 2 : margin // 2 + 56]
    cv2.imwrite('cropped_' + image, cropped)
    # subtract the BGR mean pixel values [102, 102, 101]
    centered_array = cropped.astype(np.float32) - np.array([102, 102, 101], dtype=np.float32)
    return centered_array

def extract_vector(image):
    centered_array = preprocess(image)
    input_array = np.reshape(centered_array, (1, 144, 56, 3))
    person_vector = sess.run(person_feature, feed_dict={x: input_array})
    return person_vector[0]

def distance_pics(photo1, photo2):
    person1 = extract_vector(photo1)
    person2 = extract_vector(photo2)
    dist = np.linalg.norm(person1 - person2)
    print('distance between ' + photo1 + ' and ' + photo2 + ': ' + str(dist))

distance_pics('a4_1.png', 'a9_1.png')
distance_pics('a4_1.png', 'a4_34.png')
distance_pics('a4_1.png', 'a9_28.png')
distance_pics('a4_1.png', 'b4_1.png')
distance_pics('a4_1.png', 'b9_1.png')
distance_pics('a4_1.png', 'b4_34.png')
distance_pics('a4_1.png', 'b9_28.png')
distance_pics('b9_1.png', 'a4_34.png')
distance_pics('b9_1.png', 'a9_28.png')
distance_pics('b9_1.png', 'b4_1.png')
distance_pics('b9_1.png', 'b9_28.png')
```
I've encountered the same problem. It seems that the feature-layer outputs I get from TensorFlow and Caffe are different.
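One way to localize the mismatch is to feed the exact same demeaned array to both frameworks and compare the fc7 activations directly. A rough sketch with pycaffe, assuming the model files named in this thread ('input_array.npy' is a placeholder for any preprocessed 1x3x144x56 image):

```python
import numpy as np
import caffe

# One demeaned 1x3x144x56 input, fed identically to both frameworks.
input_array = np.load('input_array.npy')

caffe.set_mode_cpu()
caffe_net = caffe.Net('jstl_dgd_deploy_inference.prototxt',
                      'jstl_dgd_inference.caffemodel', caffe.TEST)
caffe_net.blobs['data'].data[...] = input_array
caffe_net.forward()
caffe_feat = caffe_net.blobs['fc7'].data.flatten()

# On the TensorFlow side (session and tensors as in the scripts above):
# tf_feat = sess.run(person_feature,
#                    feed_dict={x: input_array.transpose(0, 2, 3, 1)})
# print(np.abs(caffe_feat - tf_feat.flatten()).max())  # large gap = conversion bug
```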
Which prototxt in the code was used to train the model for jstl_dgd_inference.caffemodel? I can't seem to find it.
I don't understand how to do person re-identification with the pretrained JSTL+DGD model found here: https://drive.google.com/open?id=0B67_d0rLRTQYZnB5ZUZpdTlxM0k
I have two problems, one related to the input and one related to the output:
But here, in the file 'jstl_dgd_deploy_inference.prototxt', the input data is (1,3,144,56) and not, for example, (2,3,144,56).
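(As an aside, the (1, 3, 144, 56) shape in a deploy prototxt is not a hard limit: with pycaffe the input blob can be reshaped at run time. A rough sketch, assuming the model files above and the blob names used elsewhere in this thread:)

```python
import numpy as np
import caffe

net = caffe.Net('jstl_dgd_deploy_inference.prototxt',
                'jstl_dgd_inference.caffemodel', caffe.TEST)

# Reshape the input blob to hold two images at once, e.g. a pair to verify;
# the layer shapes propagate automatically on the next forward pass.
net.blobs['data'].reshape(2, 3, 144, 56)
net.blobs['data'].data[...] = np.zeros((2, 3, 144, 56), dtype=np.float32)  # placeholder pair
net.forward()
feats = net.blobs['fc7'].data  # one 256-d feature per image
```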
Moreover, when loading the caffemodel weights, I receive the warnings:
```
I1108 08:48:31.324759  1525 net.cpp:752] Ignoring source layer relu7
I1108 08:48:31.324795  1525 net.cpp:752] Ignoring source layer drop7
I1108 08:48:31.324802  1525 net.cpp:752] Ignoring source layer fc8_jstl
```
This suggests that something is missing in the prototxt file.