Cysu / dgd_person_reid

Domain Guided Dropout for Person Re-identification
http://arxiv.org/abs/1604.07528

Questions about data preprocessing and how to choose test set #29

Closed KaixuanZ closed 7 years ago

KaixuanZ commented 7 years ago

Thanks for your time reading my questions!

I have downloaded the caffe fork you provide (not the whole dgd project) and trained on the CUHK03 dataset individually on my computer (the prototxt file is the same as yours). However, my top-1 accuracy is around 65%, which is 7% lower than the 72.6% reported in your paper. Since I used your prototxt file directly, the problem can only lie in either the data preprocessing or the choice of test set.

Before the training stage, I computed the mean file for both the training set and the validation set instead of using the fixed vector [102, 102, 101]. I am not sure whether this reduces the final accuracy.

Besides, for the choice of test set, I randomly chose two pictures per person (one from the probe set, the other from the gallery set), so the test set contains 200 pictures (100 persons). Then I calculated the CMC curve using Euclidean distance. Do you choose your test set the same way? If randomly choosing the test set is correct, the top-1 accuracy may fluctuate, and perhaps with more experiments I will reach 72.6%. If not, could you tell me how to choose a test set?

Cysu commented 7 years ago

I think the data split and CMC evaluation would affect the performance.

In our project, we use all the photos of the 100 test persons for testing: one camera view serves as the probe set, the other as the gallery set. Each camera view contains about 5 instances per person.

Also please refer to this note for how we compute the CMC evaluation metrics.
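Roughly, it is a single-gallery-shot protocol: in each trial, sample one gallery instance per identity, rank the gallery by Euclidean distance for every probe, and average the CMC over many trials. A rough Python sketch of this protocol (illustrative names only; the note above is authoritative):

```python
import numpy as np

def single_shot_cmc(probe_feats, probe_ids, gallery_feats, gallery_ids,
                    top_k=10, num_repeats=100, seed=0):
    """Single-gallery-shot CMC: per repeat, sample one gallery instance
    per identity, rank the gallery for each probe by Euclidean distance,
    and accumulate the match rate at every rank up to top_k."""
    rng = np.random.RandomState(seed)
    gallery_ids = np.asarray(gallery_ids)
    unique_ids = np.unique(gallery_ids)
    cmc = np.zeros(top_k)
    for _ in range(num_repeats):
        # One randomly chosen instance per gallery identity.
        idx = np.array([rng.choice(np.flatnonzero(gallery_ids == pid))
                        for pid in unique_ids])
        g_feats, g_ids = gallery_feats[idx], gallery_ids[idx]
        for feat, pid in zip(probe_feats, probe_ids):
            dists = np.linalg.norm(g_feats - feat, axis=1)
            rank = int(np.flatnonzero(g_ids[np.argsort(dists)] == pid)[0])
            if rank < top_k:
                cmc[rank:] += 1
    return cmc / (num_repeats * len(probe_feats))
```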

KaixuanZ commented 7 years ago

Thanks a lot for your advice! I changed the CMC evaluation and now the top-1 accuracy is around 77%.

Cysu commented 7 years ago

Good to hear that! Will close this issue for now and please feel free to reopen it if there are any further questions.

KaixuanZ commented 7 years ago

Hi, I have a further question about how to split the 3DPeS and PRID datasets.

I saw your format_3dpes.py. In your code you randomly create the training and test split, and half of the views are chosen as cam_0 while the others are cam_1. If you randomly split the dataset, will the CMC curve fluctuate due to the change of training set?

For the PRID dataset, 100 persons are randomly selected; I wonder whether this procedure influences the result.

Cysu commented 7 years ago

Yes. The random split of the dataset will affect the CMC performance, so it's necessary to run the experiments multiple times and report the averaged results.
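For example (an illustrative sketch; the training/evaluation step itself is omitted):

```python
import numpy as np

# Generate several random train/test splits of the person IDs
# (PRID-style: 100 random persons for training, the rest for test).
def random_split(num_persons=200, num_train=100, seed=0):
    perm = np.random.RandomState(seed).permutation(num_persons)
    return perm[:num_train], perm[num_train:]

top1_per_split = []
for seed in range(10):
    train_ids, test_ids = random_split(seed=seed)
    # ... train and evaluate on this split, then append its top-1 ...
# Report np.mean(top1_per_split) and np.std(top1_per_split).
```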

KaixuanZ commented 7 years ago

Sorry to bother you again, but I have run into some difficulties.

I downloaded the pretrained jstl+dgd model and the prototxt file you provide, extracted the features myself in Python, and computed the CMC. But the accuracy is a little lower than the numbers in your log file. Specifically, for CUHK01 I got 58.7% top-1 accuracy while yours is 64.6%, and on VIPeR my top-1 accuracy is 1.4% lower than in your log file.

So I want to check: is the latest model the same one you used to generate your log file? If so, what should I pay attention to in my Python code to get a higher result?

Here is how I set up the net before extracting features in Python:

```python
import numpy as np
import caffe

# Set the net definition and the pretrained weights.
net_file = '/home/xy/Downloads/dgd/jstl_dgd_deploy.prototxt'
caffe_model = '/home/xy/Downloads/dgd/jstl_dgd.caffemodel'
net = caffe.Net(net_file, caffe_model, caffe.TEST)

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})

# Set the mean pixel value.
mean_file = np.array([101, 102, 102])
transformer.set_mean('data', mean_file)

# H, W, C -> C, H, W
transformer.set_transpose('data', (2, 0, 1))
transformer.set_raw_scale('data', 255)

# RGB -> BGR
transformer.set_channel_swap('data', (2, 1, 0))
net.blobs['data'].reshape(1, 3, 144, 56)
```
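After this setup I extract one feature per image roughly as follows ('fc7' is my guess for the feature blob name and may not match the deploy prototxt), then compare probe and gallery features by Euclidean distance for the CMC:

```python
image = caffe.io.load_image('example.jpg')            # RGB float in [0, 1]
net.blobs['data'].data[...] = transformer.preprocess('data', image)
net.forward()
feat = net.blobs['fc7'].data[0].copy()                # copy before the next forward
```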

Cysu commented 7 years ago

Note that we first resize an image to 160x64 and then crop the center region of size 144x56. Also, we use the C++ interface to extract features, which uses OpenCV to read images; the Python interface uses scikit-image, which might produce slightly different pixel values.
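That preprocessing can be sketched with OpenCV so the image decoding matches the C++ extractor (the mean values are taken from your snippet above and assumed to be in BGR order):

```python
import cv2
import numpy as np

img = cv2.imread('example.jpg')                   # BGR, uint8, H x W x C
img = cv2.resize(img, (64, 160))                  # dsize is (width, height)
top, left = (160 - 144) // 2, (64 - 56) // 2
crop = img[top:top + 144, left:left + 56].astype(np.float32)
crop -= np.array([101, 102, 102], np.float32)     # subtract the BGR mean
blob = crop.transpose(2, 0, 1)[np.newaxis]        # 1 x 3 x 144 x 56 input
```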

KaixuanZ commented 7 years ago

Hi, I plan to report your results in my paper, but the results in your log files are slightly different from those in your paper (I heard it's because a bug in Caffe was fixed?). Which one should I report? Or is either of them OK?

Cysu commented 7 years ago

It's the random data split that matters most. In our paper, we ran the experiments multiple times (>= 4) and reported the averaged results. It's recommended to evaluate your approach multiple times under different data splits; you can use the paper results for comparison in that case.

KaixuanZ commented 7 years ago

Thanks for your time and reply :) But I have some more questions about your CNN structure.

In your paper the output size of inception (4a) is 256 x 72 x 28. Since the stride of inception (4b) is 2, why is the output size of this layer 384 x 72 x 28 rather than 384 x 36 x 14? Besides, if the stride of an inception layer is 2, does it mean that in each branch there is a convolution layer with a stride of 2?

One last question is about how to compute the number of channels. In GoogLeNet, the number of output channels of an inception layer equals the sum of the channels of the #1x1, #3x3, #5x5, and pool proj branches. But in your CNN structure this rule does not seem to hold (I know a double #3x3 approximates a #5x5). Are there some details I missed?

Cysu commented 7 years ago

Sorry for the confusion. There were typos in the paper.

  1. The output size of inception (4b) should be 384 x 36 x 14, inception (5b) should be 768 x 18 x 7, inception (6a) should be 1024 x 18 x 7, inception (6b) should be 1536 x 9 x 4.
  2. The number of filters in the inception 4,5,6 blocks should be 64, 128, 256, respectively.
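As a quick sanity check (assuming each stride-2 inception block halves the spatial dimensions with ceiling rounding), the corrected sizes follow from inception (4a)'s 72 x 28 output:

```python
import math

def halve(h, w):
    # Stride-2 blocks with padded convolutions round spatial dims up.
    return math.ceil(h / 2), math.ceil(w / 2)

size = (72, 28)                       # inception (4a) output
for name in ('4b', '5b', '6b'):       # the stride-2 blocks
    size = halve(*size)
    print(name, size)                 # (36, 14), (18, 7), (9, 4)
```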