Attention-Aware Generalized Mean Pooling for Image Retrieval

chullhwan-song / Reading-Paper


Attention-Aware Generalized Mean Pooling for Image Retrieval #157

chullhwan-song opened this issue 5 years ago

chullhwan-song commented 5 years ago

https://arxiv.org/abs/1811.00202

chullhwan-song commented 5 years ago

PROPOSED METHOD

Network and Pooling

GeM
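For reference, this is the standard GeM pooling as defined in the original GeM paper (Radenović et al.). For the k-th of K feature maps, with activation set $\mathcal{X}_k$ and pooling parameter $p_k$:

$$
f^{(g)}_k = \left( \frac{1}{|\mathcal{X}_k|} \sum_{x \in \mathcal{X}_k} x^{p_k} \right)^{1/p_k}, \qquad k = 1, \dots, K
$$

Setting $p_k = 1$ gives average pooling, and $p_k \to \infty$ approaches max pooling.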
Attention-Aware GeM
attention-aware GeM (AGeM) descriptor
Combination of ResNet-101 residual blocks and attention units
- 3 attention units: Att1, Att2_1, Att2_2
  - Att1: 4 conv layers
    - 3 × 3, 3 × 3, 1 × 1, and 1 × 1
    - stride = 2 for the first layer only, 1 for the rest
    - output dims are 1024, 512, 512, 2048
    - BN & ReLU activation on every layer except the last
    - the last layer uses a sigmoid function
  - Att2_1 & Att2_2
    - Simply a single conv layer with kernel size 1 × 1 and stride 1, whose input and output dims are the same, followed by a sigmoid.
- Combining an attention unit with a conv feature map
  - ⊗: Hadamard product (element-wise product)
  - See Fig. 1
- Final output
  - At the last layer, the attention unit and the conv feature map are combined by addition (+) instead of an element-wise product
- Then GeM > l2 normalization > final 2048-dimensional descriptor
- Side note:
  - It would be nice to see a visualization showing that these features really are "attention-aware" (the paper has none at all).
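A minimal NumPy sketch of the head described above, under loud assumptions: only the 1 × 1 Att2-style units are modelled (Att1's 3 × 3 / stride-2 stack is left out), and the wiring (one unit gates the feature map by Hadamard product, a second unit's output is added at the end), the GeM exponent p = 3 and all function names are hypothetical, not taken from the paper or its code. Channels are shrunk from 2048 to 8 so the demo stays small.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, weight, bias):
    """1x1 convolution, stride 1.
    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def att2_unit(x, weight, bias):
    """Att2-style unit: a single 1x1 conv (same in/out dim) followed by a sigmoid."""
    return sigmoid(conv1x1(x, weight, bias))

def gem(x, p=3.0, eps=1e-6):
    """Generalized mean pooling over the spatial dims: (C, H, W) -> (C,)."""
    x = np.clip(x, eps, None)  # GeM expects non-negative (post-ReLU) activations
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

def l2_normalize(v, eps=1e-12):
    return v / max(np.linalg.norm(v), eps)

def agem_descriptor(feat, params, p=3.0):
    """Hypothetical AGeM head (wiring is an assumption, not the paper's Fig. 1):
    1) gate the feature map with an attention map (Hadamard product),
    2) add a second attention map at the last step,
    3) GeM pooling, then L2 normalization."""
    w1, b1, w2, b2 = params
    gated = feat * att2_unit(feat, w1, b1)    # element-wise product
    fused = gated + att2_unit(gated, w2, b2)  # final combination by addition
    return l2_normalize(gem(fused, p=p))

rng = np.random.default_rng(0)
C, H, W = 8, 7, 7  # the paper's descriptor is 2048-d; 8 channels keep the demo small
feat = np.maximum(rng.standard_normal((C, H, W)), 0.0)  # post-ReLU-like features
params = (rng.standard_normal((C, C)) * 0.1, np.zeros(C),
          rng.standard_normal((C, C)) * 0.1, np.zeros(C))
desc = agem_descriptor(feat, params)
print(desc.shape, np.linalg.norm(desc))
```

The addition at the end keeps the pooled values strictly positive even where the gate is near zero, which is one plausible reason to switch from ⊗ to + at the last layer; the notes do not state the authors' reason.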

Loss Function and Whitening

ImageNet pre-trained model > fine-tuning: classification training on labelled landmark images
Then, triplet loss and contrastive loss
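A minimal sketch of the two metric-learning losses in their common textbook form, assuming L2-normalized descriptors and squared Euclidean distance for the triplet case. The margins (0.7 and 0.1) are placeholder values, not the paper's settings, and the notes do not say which exact variant the authors used.

```python
import numpy as np

def contrastive_loss(a, b, is_match, margin=0.7):
    """Contrastive loss for one pair of descriptors.

    Matching pairs are penalised by their squared distance;
    non-matching pairs are penalised only while closer than `margin`."""
    d = float(np.linalg.norm(a - b))
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Triplet loss with squared Euclidean distances: the anchor-positive
    distance must be smaller than the anchor-negative one by at least `margin`."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)

# Toy unit-length 2-D descriptors (real AGeM descriptors are 2048-d).
q = np.array([1.0, 0.0])  # query / anchor
p = np.array([0.8, 0.6])  # matching image
n = np.array([0.0, 1.0])  # non-matching image
print(contrastive_loss(q, p, True), contrastive_loss(q, n, False), triplet_loss(q, p, n))
```

Here the non-matching image is already farther than the margin, so only the matching pair contributes a loss.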
PCA whitening
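A minimal sketch of plain PCA whitening learned from a set of descriptors, assuming the standard recipe (center, project onto the top principal directions, scale each by 1/sqrt(eigenvalue), then L2-normalize again). The paper may use a different or learned variant; the function names and toy data here are hypothetical.

```python
import numpy as np

def learn_pca_whitening(descs, out_dim=None, eps=1e-9):
    """Learn PCA whitening from training descriptors (one descriptor per row).

    Returns the mean and a projection matrix whose columns are the top
    principal directions, each scaled by 1/sqrt(eigenvalue)."""
    mean = descs.mean(axis=0)
    centered = descs - mean
    cov = centered.T @ centered / (len(descs) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:out_dim]  # keep the largest-variance directions
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, proj

def apply_whitening(descs, mean, proj):
    """Center, project/whiten, and re-apply L2 normalization."""
    w = (descs - mean) @ proj
    return w / np.linalg.norm(w, axis=1, keepdims=True)

rng = np.random.default_rng(0)
train = rng.standard_normal((500, 16)) * np.linspace(0.5, 3.0, 16)  # anisotropic toy data
mean, proj = learn_pca_whitening(train, out_dim=8)
whitened = apply_whitening(train[:5], mean, proj)
print(whitened.shape)  # (5, 8)
```

Passing `out_dim` smaller than the input dimension combines whitening with dimensionality reduction, which is how PCA is often used on retrieval descriptors.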

Experiments

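I can't make sense of the numbers. They don't match the numbers reported in DIR (I couldn't find them with Ctrl+F).
- The table says "† denotes results from the original papers", yet even the DIR entry (the very first one) doesn't match.
SP: probably short for spatial verification.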

Conclusion

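The paper introduces the concept of "attention-aware features", but I think it needs a more substantive explanation of what that means.
- The abstract describes it as something "which aims at enhancing more relevant features that correspond to important keypoints in the input image," but that is about it.
There is no clear explanation of why these features are better (the evidence is experimental only) or why they work well.
How does this differ from existing attention mechanisms, or is it the same? I'm not sure. It seems the reader is expected to understand it from Fig. 1 alone; a bit more explanation would help.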