Local Features and Visual Words Emerge in Activations

Abstract

The paper proposes deep spatial matching (DSM), a new method for image retrieval.
The initial ranking is based on a descriptor extracted by global pooling from the convolutional neural network activations (a 3D tensor).
- A global descriptor such as GeM produces the candidate ranking.
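
As a rough illustration of the initial ranking step, here is a minimal sketch of GeM pooling over a C x H x W activation tensor, followed by ranking with dot-product similarity. The function names, the nested-list tensor layout, and the values `p=3.0` and `eps=1e-6` are assumptions made for this sketch, not settings taken from the paper.

```python
def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling of a C x H x W activation tensor.

    `activations` is a nested list [channel][row][col] of non-negative
    values (post-ReLU). Returns an L2-normalized list of C values.
    p=1 gives average pooling; a large p approaches max pooling (MAC).
    """
    pooled = []
    for channel in activations:
        # Clamp to eps so all-zero channels do not produce a zero vector.
        values = [max(v, eps) for row in channel for v in row]
        mean_pow = sum(v ** p for v in values) / len(values)
        pooled.append(mean_pow ** (1.0 / p))
    norm = sum(v * v for v in pooled) ** 0.5
    return [v / norm for v in pooled]


def rank_by_global_descriptor(query, database):
    """Return database indices sorted by decreasing dot-product similarity."""
    scores = [sum(q * d for q, d in zip(query, desc)) for desc in database]
    return sorted(range(len(database)), key=lambda i: -scores[i])
```

The top of this ranking is the candidate list that the local, spatial step would then re-rank.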
After that, local descriptors are used > this seems to be the part that the proposed DSM refers to.
- the same sparse 3D activation tensor is also approximated by a collection of local features.
  - sparse > the conv feature map after ReLU is applied
The whole process is very simple. > This means no training procedure is needed to learn local descriptors. > The CNN feature map that comes out of training is used as is. > The same goes for DELF.
```
This happens without any network modification, additional layers or training. 
No local feature detection happens on the original image. 
No local feature descriptors and no visual vocabulary are needed throughout the whole process.
```
Not mentioned in the abstract, but this is a case of applying an attention mechanism to local descriptors (as in DELF).

Deep Spatial Matching (DSM)

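It seems to use local descriptors from the attention regions of the two images... is this taken from DELF?
Uses maximally stable extremal regions (MSER)? > Haven't heard of that in a long time.
- Finds regions in the image that are brighter or darker than their surroundings and that stay stable as the threshold changes.
- Like finding keypoints (interest points) in SIFT, except that it detects blobs rather than points; this seems to correspond to the spatial attention regions above.
- The "sparse" mentioned above appears to mean that the attention regions, not the descriptors, come out sparse (sparse feature maps are of interest).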
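
To make the blob idea concrete, here is a minimal sketch that pulls blob-like local features out of each channel of a sparse activation tensor. It is a simplified stand-in, not MSER: it thresholds once and takes connected components, whereas MSER looks for regions that stay stable over a range of thresholds. The function names and the `(channel, row, col, mass)` feature layout are assumptions made for this sketch, and `threshold` is assumed to be non-negative.

```python
from collections import deque


def channel_blobs(channel, threshold=0.0):
    """Find connected regions of activations above `threshold` in one channel.

    `channel` is a list of rows (H x W). Returns one (row, col, mass) tuple
    per region: the activation-weighted centroid and the summed activation.
    """
    height, width = len(channel), len(channel[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r0 in range(height):
        for c0 in range(width):
            if seen[r0][c0] or channel[r0][c0] <= threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            mass = sum_r = sum_c = 0.0
            while queue:
                r, c = queue.popleft()
                w = channel[r][c]
                mass += w
                sum_r += w * r
                sum_c += w * c
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and not seen[nr][nc]
                            and channel[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            blobs.append((sum_r / mass, sum_c / mass, mass))
    return blobs


def local_features(activations, threshold=0.0):
    """Collect (channel_index, row, col, mass) features over all channels."""
    return [(k, r, c, m)
            for k, channel in enumerate(activations)
            for r, c, m in channel_blobs(channel, threshold)]
```

Because most post-ReLU activations are zero, each channel yields only a few regions, which is what keeps the feature set small.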
matching > fast spatial matching (FSM)
- FSM applies the visual word concept; here,
  - We thus treat channels as visual words, as if local features were assigned descriptors that were vector-quantized against a vocabulary and matched with the discrete metric.
  - It says the channels are treated as visual words; this seems to mean that, instead of k-means, the index 0 to (n-1) into the n-dimensional descriptor is used as the word. There are no details, so this is my guess from what I know.
- RANSAC
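
The two steps above can be sketched roughly as follows: features that fire in the same channel become tentative correspondences (the channel index plays the role of the visual word), and a geometric check then counts how many of them agree on one transformation. This is only a sketch under strong assumptions: features are plain `(channel, row, col)` tuples, the model is a pure translation, and every correspondence is tried as a hypothesis instead of sampling at random. The function names are made up for this sketch, and the paper's actual verification may differ.

```python
def tentative_matches(features_a, features_b):
    """Pair features that fire in the same channel (the 'visual word')."""
    by_channel = {}
    for k, r, c in features_b:
        by_channel.setdefault(k, []).append((r, c))
    return [((ra, ca), (rb, cb))
            for k, ra, ca in features_a
            for rb, cb in by_channel.get(k, [])]


def verify_translation(matches, tolerance=1.0):
    """Score the best translation hypothesis over all tentative matches.

    Every match proposes one translation; its inliers are the matches
    whose displacement agrees within `tolerance` cells. Returns
    (inlier_count, (d_row, d_col)) for the best hypothesis.
    """
    best_count, best_shift = 0, None
    for (ra, ca), (rb, cb) in matches:
        shift = (rb - ra, cb - ca)
        count = sum(
            1 for (pa, qa), (pb, qb) in matches
            if abs((pb - pa) - shift[0]) <= tolerance
            and abs((qb - qa) - shift[1]) <= tolerance
        )
        if count > best_count:
            best_count, best_shift = count, shift
    return best_count, best_shift
```

The inlier count can serve as the re-ranking score for a candidate image: the more correspondences agree on one transformation, the higher the candidate moves.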
Matching with local descriptors is the re-ranking part.
Experiments
3-scale input
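
These notes do not say how the descriptors from the three scales are merged. A common choice, assumed here purely for illustration, is to average the per-scale global descriptors and L2-normalize the result; `combine_scales` is a hypothetical helper name.

```python
def combine_scales(descriptors):
    """Average per-scale descriptors and L2-normalize the result.

    `descriptors` is a list of equal-length lists, one per input scale.
    """
    dim = len(descriptors[0])
    mean = [sum(d[i] for d in descriptors) / len(descriptors)
            for i in range(dim)]
    norm = sum(v * v for v in mean) ** 0.5
    return [v / norm for v in mean]
```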
supervised whitening > only for the global descriptor > GeM in this paper > compared with MAC in the experiments
- What about the local descriptors?
The settings above appear to be taken directly from the fine-tuned GeM work.
diffusion
Effect of upsampling
ResNet has 4 times lower resolution than VGG (the feature map used > probably the last layer?)
So it is enlarged by x2 (dilated) or more.
DSM
Comparison with diffusion removed
VGG vs. ResNet > VGG is not bad?
Supervised Whitening
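This is what I was most curious about: unlike PCA, which I had treated only as dimensionality reduction, it shows a large performance gain.
- I should study this part in more detail.
Overall experiments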
