Composition-preserving Deep Photo Aesthetics Assessment - Githubissues

chullhwan-song / Reading-Paper

Composition-preserving Deep Photo Aesthetics Assessment #66

Open chullhwan-song opened 5 years ago

chullhwan-song commented 5 years ago

https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Mai_Composition-Preserving_Deep_Photo_CVPR_2016_paper.pdf

chullhwan-song commented 5 years ago

Abstract

Recent work is CNN-based and has had some success.
However, the fixed input size (fixed-size input) causes problems.
- Because fully connected layers are used, the network only accepts a fixed input size. > SPPNet was an attempt to solve this problem.
The input image gets damaged > this causes problems for the image composition
- transformed via cropping, scaling, or padding
- reduces image resolution, or causes image distortion
This paper proposes a method that preserves the composition of the image: the composition-preserving deep ConvNet method
- directly learns aesthetics features from the original input images without any image transformations.
- The network takes the image at its original size as is. To make this possible, an "adaptive spatial pooling layer" is added.
In conclusion, it proposes the "Multi-Net Adaptive Spatial Pooling ConvNet architecture".

Composition-preserving Deep Network for Photo Aesthetics Assessment

As expected, it is inspired by SPPNet.
- Unlike SPPNet, however, it takes the form of accepting "multiple fixed-size input"
- multiple sub-networks for different pooling sizes > ??

Composition-preserving Deep ConvNet
Basically, it adopts the adaptive spatial pooling strategy of SPPNet mentioned above. > This is so the input image can be fed in at its original size.
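
To make the adaptive spatial pooling idea concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the function name `adaptive_max_pool2d` and the bin-boundary rule (floor for the start, ceil for the end of each bin) are assumptions chosen for illustration. The point it shows is that the pooling window adapts to the input, so feature maps of any size and aspect ratio come out as the same fixed grid, which is what lets fully connected layers follow without cropping, scaling, or padding the image.

```python
import math

def adaptive_max_pool2d(fmap, out_h, out_w):
    """Max-pool a 2D feature map (list of rows) of any size down to out_h x out_w.

    Hypothetical helper for illustration; the bin boundaries are an assumption,
    not taken from the paper.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_h):
        # Row range covered by output row i (bins may overlap by one row).
        r0 = (i * in_h) // out_h
        r1 = math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            # Column range covered by output column j.
            c0 = (j * in_w) // out_w
            c1 = math.ceil((j + 1) * in_w / out_w)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# A wide 4x6 map and a tall 5x3 map both become a 2x2 grid.
wide = [[r * 10 + c for c in range(6)] for r in range(4)]
tall = [[r * 10 + c for c in range(3)] for r in range(5)]
print(adaptive_max_pool2d(wide, 2, 2))  # [[12, 15], [32, 35]]
print(adaptive_max_pool2d(tall, 2, 2))  # [[21, 22], [41, 42]]
```

The output size is fixed by `out_h` and `out_w` alone, so the aspect ratio of the input never has to be changed before the network sees it.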
Multi-Net Adaptive-Pooling ConvNet
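Building on the basic concept of SPPNet, the goal is to extract multi-scale features.
- But isn't SPPNet already multi-scale, as in the figure above??? It has the idea of multiple grids...
This paper sees the problem as SPPNet itself taking only one conv layer (the very last conv layer) as input.
So it combines pooling layers of various sizes and calls this the "Multi-Net Adaptive-Pooling method".
- This seems to be the case of taking input from several lower layers rather than only the last one..
Finally, it presents the Multi-Net Adaptive-Pooling ConvNet (MNA-CNN).
- It takes the sub-networks from the base net (what I mentioned just above seems to be right..) > Easy to picture if you think of SSD.
- The related figure doesn't really click for me.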
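
One way to read "multiple sub-networks for different pooling sizes" is sketched below. This is only an illustration under loud assumptions: every sub-network here reads the same single feature map, each "sub-network" is reduced to an adaptive pooling step plus a hypothetical linear head with a sigmoid, and the names `adaptive_max_pool`, `sub_network_score`, and `mna_predict` are made up for this note. The paper's sub-networks are full ConvNets, so treat this as a picture of the data flow, not the architecture.

```python
import math

def adaptive_max_pool(fmap, size):
    """Pool a 2D feature map of any shape to size x size, flattened to a list."""
    h, w = len(fmap), len(fmap[0])
    return [
        max(fmap[r][c]
            for r in range(i * h // size, math.ceil((i + 1) * h / size))
            for c in range(j * w // size, math.ceil((j + 1) * w / size)))
        for i in range(size) for j in range(size)
    ]

def sub_network_score(fmap, size, weights, bias=0.0):
    """Hypothetical sub-network: pool at one scale, then linear head + sigmoid."""
    feats = adaptive_max_pool(fmap, size)
    z = sum(wt * f for wt, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mna_predict(fmap, heads):
    """Average the scores of sub-networks that differ only in pooling size.

    heads: list of (pooling_size, weights) pairs; weights has size*size entries.
    """
    scores = [sub_network_score(fmap, size, w) for size, w in heads]
    return sum(scores) / len(scores)

# Toy 3x5 feature map; untrained placeholder weights, one head per scale.
fmap = [[0.1 * (r + c) for c in range(5)] for r in range(3)]
heads = [(1, [0.5]), (2, [0.25] * 4), (3, [0.1] * 9)]
print(mna_predict(fmap, heads))
```

Each head sees the same image content at a different grid resolution, which is the sense in which the combined prediction is multi-scale.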

Scene-Aware Multi-Net Aggregation

The final result is the average of the individual results from the multiple sub-networks.
In practice, the averaging routine above is replaced with the following:
- a form that concatenates the sub-network predictions and the image scene categorization posteriors
  - a new aggregation layer that takes the concatenation of the sub-network predictions and the image scene categorization posteriors as input and outputs the final aesthetics prediction.
The following figure makes this clear.
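
The aggregation step described above can be sketched as follows. The notes only say that the layer takes the concatenation of the sub-network predictions and the scene posteriors and outputs the final prediction; modelling it as a single linear layer followed by a sigmoid is an assumption made here, and the function name `scene_aware_aggregate`, the weights, and the toy numbers are all hypothetical.

```python
import math

def scene_aware_aggregate(sub_preds, scene_posteriors, weights, bias=0.0):
    """Combine sub-network predictions with scene posteriors.

    Assumed form: one linear layer over the concatenated vector, then a sigmoid.
    """
    x = list(sub_preds) + list(scene_posteriors)  # concatenation
    if len(weights) != len(x):
        raise ValueError("need one weight per concatenated input")
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Three sub-network scores and posteriors over four scene categories (toy values).
sub_preds = [0.62, 0.70, 0.55]
scene_posteriors = [0.1, 0.6, 0.2, 0.1]
weights = [1.0, 1.0, 1.0, 0.2, -0.3, 0.1, 0.0]  # placeholder, not learned
print(scene_aware_aggregate(sub_preds, scene_posteriors, weights, bias=-1.5))
```

Compared with plain averaging, the extra scene inputs give the layer a way to shift the final score depending on which scene category the image is likely to belong to.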

Experiments