KovenYu / MAR

Pytorch code for our CVPR'19 (oral) work: Unsupervised person re-identification by soft multilabel learning
https://kovenyu.com/publication/2019-cvpr-mar/

About the results. #7

Closed huanhuancao closed 5 years ago

huanhuancao commented 5 years ago

Hello, I noticed differences between the Market-1501 and DukeMTMC-reID results in your ablation study of the three losses. Why is the rank-1 without $L_{CML}$ and $L_{RAL}$ higher than the rank-1 without only $L_{RAL}$? Can you explain this? Thank you!

KovenYu commented 5 years ago

Yeah, that might seem weird at first glance. But the strangeness comes from an imprecise assumption: that every loss component superposes linearly and provides a linearly stackable performance gain. This assumption might hold for multiple orthogonal improvements over some baseline; e.g., increasing the image resolution and performing data augmentation are not intrinsically related, so they can provide independent (and thus stackable) improvements for a recognition system.

Yet this is not the case for $L_{RAL}$ and $L_{CML}$: they are related, not independent, because they interact with each other (recall that they are both computed in the same feature space) to learn a better soft multilabel. One plausible explanation concerning the domain gap is as follows. Consider two extreme cases.

  1. If the source domain is similar enough to the target domain (say, it is exactly a partition of the target dataset), the comparative soft multilabel can be valid even without $L_{RAL}$. In this case, adding $L_{CML}$ should be beneficial.

  2. But if a large domain shift exists (say, the source domain is ImageNet, where nearly every class is a non-person), comparing a target person to the reference "persons" does not make sense; the soft multilabel is not the meaningful representation we assume it to be. Thus, aligning the distributions of invalid soft multilabels brings no benefit, and the gradient can even be random and harmful (recall that the gradient is also backpropagated through the whole network!). The sketch after this list makes the contrast concrete.
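To illustrate, here is a minimal PyTorch sketch of how a comparative soft multilabel can be formed; the function name, the shapes, and the `temperature` parameter are illustrative assumptions, not the exact API of this repo.

```python
import torch
import torch.nn.functional as F

def soft_multilabel(target_feat, ref_agents, temperature=1.0):
    """Form a comparative soft multilabel for one target-domain feature.

    target_feat: (d,) feature of an unlabeled target-domain person.
    ref_agents:  (k, d) one agent per source-domain reference person.
    Both are assumed L2-normalized, so dot products are cosine similarities.
    """
    sims = ref_agents @ target_feat               # (k,) likeness to each reference person
    return F.softmax(sims / temperature, dim=0)   # (k,) label distribution, sums to 1

# Illustrative usage with random unit vectors (hypothetical dimensions):
feat = F.normalize(torch.randn(256), dim=0)
agents = F.normalize(torch.randn(500, 256), dim=1)
label = soft_multilabel(feat, agents)  # (500,)
```

In case 1 the similarities are discriminative, so `label` is a meaningful distribution and aligning it is useful. In case 2 the similarities are close to uniform noise, so the distribution being aligned carries no identity information, and the gradients flowing back through `target_feat` are noise as well.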

So the performance of the ablation variants is somewhat unpredictable (at least not naively predictable), since we have no perfect measure of domain similarity. Nevertheless, the full model should work in general, as it handles both cases.

huanhuancao commented 5 years ago

Thank you very much for your detailed answer. I understand what you mean.