VisualComputingInstitute / triplet-reid

Code for reproducing the results of our "In Defense of the Triplet Loss for Person Re-Identification" paper.
https://arxiv.org/abs/1703.07737
MIT License

two questions L2norm and zero losses #7

Open bnu-wangxun opened 7 years ago

bnu-wangxun commented 7 years ago

I have replicated your paper "In Defense of the Triplet Loss for Person Re-Identification" on the Market-1501 dataset in Caffe, and it performs well, just as your paper says.

As far as I can see, the batch-hard loss without the softplus function is mostly 0 in the last iterations. So I want to ask: have you tried any other type of hard mining (as mentioned in your discussion section, "Notes on network training")? If you have, I would like to hear more details about how those experiments performed.

Secondly, I also tried adding an L2-normalization layer on the embedding, but training was unstable and the results were very poor. I read your explanation of this, but I don't think it fully explains the phenomenon, because as far as I know some other metric-learning losses perform well with L2 normalization, e.g. "DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer". Have you thought any more deeply about why this happens?

lucasb-eyer commented 7 years ago

Hi, thanks for the interesting questions, and also thanks for letting us know that you managed to reproduce the results, that's always good news!

The loss becoming 0

Yes, you can see in Figure 6 of the supplementary material that we had the same experience with the hard margin: the training loss is often zero, and the number of active triplets in the batch is also often zero. The network is "done".

We did not have the time to explicitly look into making training harder later on in training, but it is a good idea to investigate and could potentially improve the score even more. I can think of two easy ways to do this: a) change the batch composition to make really hard samples more likely, either by increasing the batch size if memory allows it, or by keeping the batch size constant but reducing P and increasing K (or the other way around); b) add more or stronger augmentation later in training. IIRC, we only used flips and crops, but you could add squeezes, rotations, color/gamma noise, etc.
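For (a), here is a minimal numpy sketch of PK-style batch composition, just to make the knobs concrete; `sample_pk_batch` and the label array are made up for illustration and are not the sampler used in this repo (the paper's defaults are P=18, K=4):

```python
import numpy as np

def sample_pk_batch(labels, P=18, K=4, rng=None):
    """Pick indices for one batch of P identities with K images each.

    `labels` is a 1-D array of person IDs, one entry per image. Identities
    with fewer than K images are sampled with replacement. Growing P*K (or
    trading P for K) makes hard positives/negatives more likely per batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    chosen_ids = rng.choice(np.unique(labels), size=P, replace=False)
    batch = []
    for pid in chosen_ids:
        candidates = np.flatnonzero(labels == pid)
        batch.extend(rng.choice(candidates, size=K,
                                replace=len(candidates) < K))
    return np.asarray(batch)

# e.g. idx = sample_pk_batch(train_pids, P=32, K=4)  # hypothetical label array
```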

Normalizing the embedding

Indeed, we also had better performance without it. We had some early experiments where adding normalization worked well, but overall we do not have a good understanding of exactly when it works and when it doesn't. Intuitively, normalizing makes more sense when using the squared Euclidean distance (in that case it is equivalent, up to a constant factor, to the cosine distance between the unnormalized vectors), whereas not normalizing makes more sense when using the plain Euclidean distance (since the "units"/scale of the distances and vectors stay the same). But I performed some large-scale experiments after the paper and found no combination of norm/no-norm and squared/non-squared Euclidean that consistently worked or was consistently best.
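That squared-Euclidean/cosine relationship is easy to check numerically; this is just a standalone sanity check, not code from the repo:

```python
import numpy as np

# For L2-normalized u, v: ||u - v||^2 = 2 - 2*cos(u, v), i.e. twice the
# cosine distance, so ranking by either gives the same ordering.
rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
u, v = a / np.linalg.norm(a), b / np.linalg.norm(b)

sq_euclidean = np.sum((u - v) ** 2)
cosine_sim = np.dot(u, v)
print(np.isclose(sq_euclidean, 2.0 - 2.0 * cosine_sim))  # True
```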

Cospel commented 6 years ago

Hello, thank you for your discussion. I have a question about the margin and the embeddings: if you do not normalize the embeddings, how do you determine the margin? In my VGG net I tried not normalizing the embeddings, but the distances become too large, so obviously I could not pick a hard margin like 0.2.

lucasb-eyer commented 6 years ago

As you can see in the paper, we used a custom train/val split of the MARS training set to determine the margin, although in the end we used the soft margin, which doesn't have a parameter.
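For reference, here is a minimal numpy sketch of the two margin variants (the actual losses in this repo are implemented in TensorFlow; these function names are made up). `d_ap` and `d_an` stand for the anchor-positive and anchor-negative distances picked by batch-hard mining:

```python
import numpy as np

def hard_margin_loss(d_ap, d_an, margin=0.2):
    # Classic hinge: becomes exactly zero once d_an exceeds d_ap by the margin.
    return np.maximum(margin + d_ap - d_an, 0.0)

def soft_margin_loss(d_ap, d_an):
    # softplus(x) = ln(1 + e^x): no margin parameter, never exactly zero,
    # so it keeps gently pulling even on already-solved triplets.
    return np.log1p(np.exp(d_ap - d_an))

print(hard_margin_loss(1.0, 2.0))  # 0.0  -- this triplet is inactive
print(soft_margin_loss(1.0, 2.0))  # ~0.31 -- still a small gradient
```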

If you're using a reasonable pre-trained backbone and initialize the (new) final embedding layer reasonably, the distances should not be too large at the start. In fact, the distances should be close to sqrt(2D), where D is the embedding size, IIRC. It is a little difficult to see, but in the appendix of our paper you can see that the distances indeed start around 15 ≈ sqrt(2*128) = 16, and quickly drop at the start of training.
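The sqrt(2D) rule of thumb assumes the freshly initialized embedding dimensions behave roughly like zero-mean, unit-variance, independent noise, in which case E||x - y||^2 = 2D. A quick standalone numpy check of that back-of-the-envelope reasoning (not code from the repo):

```python
import numpy as np

D = 128                          # embedding size used in the paper
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, D))   # stand-ins for freshly initialized embeddings
y = rng.normal(size=(1000, D))
dists = np.linalg.norm(x - y, axis=1)
print(dists.mean(), np.sqrt(2 * D))  # both close to 16
```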

Really, without l2-norm, the margin should only roughly dictate the scale of the learned space.

HuaZheLei commented 6 years ago

@bnulihaixia Hello, could you share your Caffe code with me? I am a beginner in deep learning and have no idea how to implement this. Thanks a lot.

bnu-wangxun commented 6 years ago

@HuaZheLei Sorry, this work was done during my internship and the code is not in my hands. If you want to implement it in Caffe, you can refer to the lifted-structure code (https://github.com/rksltnl/Deep-Metric-Learning-CVPR16); my Caffe implementation was modified from that repository.

HuaZheLei commented 6 years ago

@bnulihaixia Thanks for help. I will have a look.

soulslicer commented 6 years ago

How did you do the hard triplet mining? And how do you get access to the network weights during training?