VisualComputingInstitute / triplet-reid

Code for reproducing the results of our "In Defense of the Triplet Loss for Person Re-Identification" paper.
https://arxiv.org/abs/1703.07737

Performance on CUHK03 #14

Open Edelbert opened 6 years ago

Edelbert commented 6 years ago

Hello, authors. I was wondering if you could provide some extra details about training on CUHK03. There is a third-party re-implementation of your work that shows almost the same performance on Market-1501, and according to their benchmarks they did not use test-time data augmentation. However, your performance on CUHK03 is quite far from theirs. Why? Can test-time augmentation influence the final result that much? By the way, did you use only one GPU for training?

lucasb-eyer commented 6 years ago
  1. Since they compute a mAP score on CUHK03, I'm assuming they use the "new" evaluation strategy introduced by Liang Zheng. We used the "old" one described in the original CUHK03 paper, because that's what most papers use. You cannot meaningfully compare scores across the two.
  2. Yes, augmentation can make a big difference, especially on smaller datasets. (A minimal sketch of embedding-level test-time augmentation follows this list.)
  3. Only one GPU.
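
For concreteness, here is a minimal sketch of embedding-level test-time augmentation, assuming a `model` callable that maps a batch of images of shape (N, H, W, 3) to embeddings; the callable and shapes are illustrative, not this repo's exact API:

```python
import numpy as np

def embed_with_tta(model, images):
    """Average the embeddings of each image and its horizontal flip.

    images: float array of shape (N, H, W, 3); model returns (N, D).
    """
    emb_orig = model(images)
    emb_flip = model(images[:, :, ::-1, :])  # flip along the width axis
    return (emb_orig + emb_flip) / 2.0
```

Averaging the original and flipped embeddings is the simplest variant; a fuller version would also aggregate over crops.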
Pandoro commented 6 years ago

Regarding the scores from the third-party re-implementation: I quickly skimmed their code, and they do actually load the original splits and have the option to train on each of them. However, it is a little unclear whether the scores in their benchmark are obtained by actually running all 20 trainings and reporting the average. If the results come from a single split, that would explain why the scores are different, since performance varies quite a bit across the different splits.

It's important not to look at the CMC allshot evaluation: this is not the typical CUHK03 evaluation protocol and thus not comparable to numbers you find in the literature. When comparing their CUHK03 result (85.4) with ours (89.6/87.6), I think that slightly different implementations and test-time augmentation can explain the difference.
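
To make the protocol concrete, here is a hedged sketch of a single-gallery-shot CMC computation, assuming a precomputed distance matrix and exactly one correct gallery image per query (the helper is ours, not code from this repo):

```python
import numpy as np

def cmc_single_shot(dist, query_ids, gallery_ids, topk=5):
    """dist: (num_query, num_gallery) distances, ids as numpy arrays;
    exactly one correct gallery image per query."""
    hits = np.zeros(topk)
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])  # gallery indices, closest first
        rank = np.where(gallery_ids[order] == query_ids[q])[0][0]
        if rank < topk:
            hits[rank:] += 1  # a hit at rank r counts for all k >= r
    return hits / dist.shape[0]  # CMC curve: rank-1 ... rank-topk

# The reported number is then the average over the 20 official splits:
# cmc = np.mean([cmc_single_shot(d, q, g) for d, q, g in splits], axis=0)
```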

For our CUHK03 experiments we combined the training and validation sets and used the same hyperparameters as for our Market-1501 and MARS training (hence we don't need the validation set). The only thing we changed was the input size: we used 256x96 instead of 256x128 to better match the original CUHK03 aspect ratio.
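
The preprocessing change itself is small; here is a minimal sketch of the resize step using PIL, with illustrative constant names (not the repo's actual flags):

```python
from PIL import Image

NET_INPUT_H, NET_INPUT_W = 256, 96  # CUHK03; Market-1501/MARS use 256, 128

def load_resized(path):
    """Load an image and resize it to the network input size."""
    return Image.open(path).convert("RGB").resize(
        (NET_INPUT_W, NET_INPUT_H), Image.BILINEAR)
```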

nihao88 commented 6 years ago

Dear authors, as I see from the discussion, for CUHK03 you use the same training procedure with almost the same parameters as for Market-1501... Am I correct? The thing is, when I use the code as-is for CUHK03 training, I get 30% CMC rank-1. For testing I use 100 identities from the first camera and 100 from the second camera. Could I ask you to describe the CUHK03 testing procedure that you use? Thank you.

lucasb-eyer commented 6 years ago

Almost same parameters, yes. The main difference is "H = 256, W = 96". As for the testing procedure, we follow the "original" 20-split one, which is detailed in the original CUHK03 paper.

We have never gotten anything nearly as low as 30% rank-1; that's a very bad score indicating you're doing something very wrong or have a bug hidden somewhere. The most frequent mistake we see is that people forget to load the pre-trained ImageNet weights.
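
To illustrate that pitfall, here is a hedged example using torchvision, purely for illustration (it is not this repository's TensorFlow code):

```python
import torchvision

# Wrong: random initialization; fine-tuning from scratch typically
# yields very poor re-ID scores.
# model = torchvision.models.resnet50(weights=None)

# Right: start the backbone from ImageNet-pretrained weights
# before fine-tuning with the triplet loss.
model = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)
```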

nihao88 commented 6 years ago

Thank you for the quick response. I will double-check and come back with details.

liangbh6 commented 6 years ago

Actually, I'm interested in the mAP score of TriNet on CUHK03.

lucasb-eyer commented 6 years ago

I don't fully understand what you mean, @liangbh6. In case you didn't notice, we have included CUHK03 scores in the latest arXiv version of the paper.

liangbh6 commented 6 years ago

I have found the rank-1 and rank-5 scores on CUHK03 in the latest arXiv version of the paper. But mAP is a different measure from those.

Pandoro commented 6 years ago

@liangbh6 aaah! In fact, both @lucasb-eyer and I were a bit confused by your comment, since we do provide CUHK03 results, but only now do I realize that we do not provide the mAP score. This is simply because the mAP score is not meaningful on the CUHK03 dataset: you can only ever retrieve a single ground-truth match.
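
Concretely: with exactly one relevant gallery item per query, average precision collapses to the reciprocal rank of that single match, so "mAP" would just be the mean reciprocal rank. A small illustrative helper (ours, not from the repo):

```python
import numpy as np

def ap_single_match(dist_row, gallery_ids, query_id):
    """AP for one query when exactly one gallery item is relevant."""
    order = np.argsort(dist_row)  # gallery indices, closest first
    rank = np.where(gallery_ids[order] == query_id)[0][0] + 1  # 1-based rank
    return 1.0 / rank  # precision at the single relevant item = AP
```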

Some of the more recent papers stopped using the original evaluation protocol and instead created a single new train-test split, for which mAP seems to make more sense. It should be noted, though, that these scores are not comparable, and you should always pay attention to the evaluation protocol when looking at CUHK03 scores in a paper. To be honest, even within the original evaluation protocol there are some ambiguities, and a lot of papers seem to evaluate in slightly different ways. I have always wondered how comparable the scores are at all. The new split might actually fix this to some extent.

liangbh6 commented 6 years ago

Well, thanks for your explanation!