Residual Attention Networks for Image Classification - Githubissues

chullhwan-song / Reading-Paper


Residual Attention Networks for Image Classification #35

Open chullhwan-song opened 6 years ago

chullhwan-song commented 6 years ago

https://arxiv.org/abs/1704.06904 http://cs231n.stanford.edu/reports/2017/pdfs/939.pdf

chullhwan-song commented 6 years ago

What?

A well-known early paper on attention for CNNs > on the spatial-attention side
image classification task
Inserted like a module into an existing CNN > ResNet

Residual Attention Network

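Can easily be combined as a module with Residual Units, ResNeXt, and Inception.
- Then what about the CNN modules I actually need? (sigh)
- The paper claims this, but I don't think it holds in practice (I tried it and found it hard).
Two main parts > the two parts are combined jointly -> mimics the Highway Network approach.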
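- mask branch M(x) = soft mask branch
  - bottom-up and top-down structure
  - a combination of residual units and max pooling
  - similar in structure to segmentation networks that use a kind of deconvolution
    - down-sampling vs. up-sampling structure
    - up-sampling uses bilinear interpolation
      - the number of interpolation steps equals the number of max-pooling steps, so the output resolution matches the input
    - then two consecutive 1x1 convs are applied. Why?? > probably to reduce dimensionality
    - finally, a sigmoid layer normalizes the output to [0, 1]
- trunk branch T(x)
  - the bottom-up/top-down output M(x) and T(x) are joined through a skip connection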
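The mask branch described above (bottom-up max pooling, the same number of bilinear up-sampling steps, two 1x1 convs, then a sigmoid) can be sketched in NumPy as below. This is a simplified illustration, not the paper's implementation: the residual units (and any skip connections inside the branch) are omitted, the ReLU between the two 1x1 convs, the random weights, the `(C, H, W)` layout and the identity trunk are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample_bilinear2x(x):
    """Upsample (C, H, W) -> (C, 2H, 2W) by bilinear interpolation.

    The sampling grid maps output corners onto input corners; this choice is
    an assumption made for simplicity.
    """
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, 2 * h)
    xs = np.linspace(0, w - 1, 2 * w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical weights, shape (1, 2H, 1)
    wx = (xs - x0)[None, None, :]  # horizontal weights, shape (1, 1, 2W)
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


def conv1x1(x, weight):
    """1x1 convolution: mixes channels independently at every pixel."""
    return np.einsum("oc,chw->ohw", weight, x)


def soft_mask_branch(x, w1, w2, depth=2):
    """Simplified mask branch M(x); residual units are omitted."""
    h = x
    for _ in range(depth):  # bottom-up: max pooling shrinks the resolution
        h = max_pool2x2(h)
    for _ in range(depth):  # top-down: same number of interpolation steps
        h = upsample_bilinear2x(h)
    # two 1x1 convs; the ReLU between them is an assumption
    h = conv1x1(np.maximum(conv1x1(h, w1), 0.0), w2)
    return 1.0 / (1.0 + np.exp(-h))  # sigmoid -> values in [0, 1]


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w1 = 0.1 * rng.standard_normal((8, 8))  # hypothetical weights
w2 = 0.1 * rng.standard_normal((8, 8))
m = soft_mask_branch(x, w1, w2)
t = x  # placeholder trunk output T(x); identity is an assumption
out = m * t  # element-wise product of mask and trunk
print(m.shape, m.min(), m.max())
```

Because pooling and interpolation steps are paired, the mask comes back at the trunk's resolution, which is what makes the element-wise product with T(x) possible.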

$$H_{i,c}(x) = M_{i,c}(x) \cdot T_{i,c}(x) \tag{1}$$
i ranges over all spatial positions
c is the channel index
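- Equation (1) is differentiable, so it can be trained with back-propagation:
  $$\frac{\partial M(x,\theta)\,T(x,\phi)}{\partial \phi} = M(x,\theta)\,\frac{\partial T(x,\phi)}{\partial \phi} \tag{2}$$
  - θ denotes the mask branch parameters and φ the trunk branch parameters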
- Attention Residual Learning = combination with ResNet:
  $$H_{i,c}(x) = \left(1 + M_{i,c}(x)\right) \cdot F_{i,c}(x) \tag{3}$$
- F is the original feature
- Having these two structures (mask and trunk branches) makes the network robust to noisy labels.
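The mask/trunk combination notes above can be checked on toy scalars. The sketch below assumes hypothetical stand-ins for one position and channel, M(x, θ) = sigmoid(θx) and T(x, φ) = φx (not the paper's branches). It shows two things: the gradient of H = M·T with respect to the trunk parameter φ is the mask times ∂T/∂φ, so the mask scales the trunk's gradient; and stacking modules with the plain product M·F shrinks features repeatedly, whereas the residual form (1 + M)·F leaves a feature unchanged when M = 0. Trunk layers between stacked modules are ignored to isolate the mask's effect.

```python
import math


def mask(x, theta):
    """Hypothetical soft mask M(x, theta) in (0, 1): a sigmoid."""
    return 1.0 / (1.0 + math.exp(-theta * x))


def trunk(x, phi):
    """Hypothetical trunk T(x, phi): a single linear 'layer'."""
    return phi * x


def attention_output(x, theta, phi):
    """Eq. (1) for one position and channel: H = M * T."""
    return mask(x, theta) * trunk(x, phi)


def grad_phi_numeric(x, theta, phi, eps=1e-6):
    """Central finite difference of H with respect to the trunk parameter phi."""
    return (attention_output(x, theta, phi + eps)
            - attention_output(x, theta, phi - eps)) / (2 * eps)


# dH/dphi should equal M(x, theta) * dT/dphi, and dT/dphi = x for this toy trunk.
x, theta, phi = 2.0, 0.7, -1.3
numeric = grad_phi_numeric(x, theta, phi)
analytic = mask(x, theta) * x
print(f"numeric={numeric:.6f} analytic={analytic:.6f}")


def stack(f, masks, residual):
    """Apply several modules in a row to a single feature value f.

    residual=False: plain H = M * F, so each mask in [0, 1] can only shrink |f|.
    residual=True:  H = (1 + M) * F, so M = 0 leaves f unchanged.
    """
    for m in masks:
        f = (1.0 + m) * f if residual else m * f
    return f


masks = [0.5] * 10  # hypothetical: ten stacked modules, each with mask value 0.5
print(stack(1.0, masks, residual=False))  # 0.5 ** 10: the feature fades away
print(stack(1.0, masks, residual=True))   # 1.5 ** 10: never below the input
```

The residual form is what lets many attention modules be stacked without the features decaying toward zero in deep layers.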
- Spatial Attention and Channel Attention
- Constraints can be added to the mask branch by changing the normalization step inside the activation function before the soft mask output > so this part seems to be about that activation function.
  - ?? > this part is like that
- Three types of activation functions > constraints on attention can still be added to the mask branch by changing the normalization step in the activation function before the soft mask output.
- Mixed attention f1: no additional restriction; uses a simple sigmoid for each channel and spatial position
  $$f_1(x_{i,c}) = \frac{1}{1 + \exp(-x_{i,c})}$$
- Channel attention f2: performs L2 normalization across all channels at each spatial position to remove spatial information
  $$f_2(x_{i,c}) = \frac{x_{i,c}}{\lVert x_i \rVert}$$
- Spatial attention f3: normalizes the feature map of each channel, then applies a sigmoid, giving a soft mask related to spatial information only
  $$f_3(x_{i,c}) = \frac{1}{1 + \exp\left(-\frac{x_{i,c} - \mathrm{mean}_c}{\mathrm{std}_c}\right)}$$
- i ranges over all spatial positions
- c ranges over all channels
- mean_c and std_c denote the mean and standard deviation of the feature map from the c-th channel
- x_i denotes the feature vector at the i-th spatial position
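The three activation functions differ only in the axis they normalize over. A NumPy sketch, assuming the mask-branch output is a `(C, H, W)` array; the function names and the small `eps` guard against division by zero are hypothetical additions, not from the paper:

```python
import numpy as np


def f1_mixed(x):
    """Mixed attention: plain sigmoid at every channel and spatial position."""
    return 1.0 / (1.0 + np.exp(-x))


def f2_channel(x, eps=1e-12):
    """Channel attention: L2-normalize across channels at each spatial position.

    ||x_i|| is taken over axis 0 (channels); eps only guards against
    division by zero and is an assumption.
    """
    norm = np.linalg.norm(x, axis=0, keepdims=True)  # shape (1, H, W)
    return x / (norm + eps)


def f3_spatial(x, eps=1e-12):
    """Spatial attention: standardize each channel's feature map, then sigmoid."""
    mean_c = x.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    std_c = x.std(axis=(1, 2), keepdims=True)
    return 1.0 / (1.0 + np.exp(-(x - mean_c) / (std_c + eps)))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 5))
print(np.linalg.norm(f2_channel(x), axis=0).round(6))  # 1.0 at every position
print(np.allclose(f3_spatial(3.0 * x + 5.0), f3_spatial(x)))  # per-channel shift/scale is normalized away
```

f2 makes every spatial position's channel vector unit length, so only the relative weighting between channels survives; f3 removes each channel's overall offset and scale, so only where the activation sits spatially matters.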
- Overall network architecture

Experiments