what ?

Photo composition: a paper that evaluates how well a photo is composed
- aesthetic properties of good compositions
- In practice, it can also be read as a paper about finding the better-composed ROI within a single image.
Proposes the View Finding Network (VFN)
- composed of a CNN augmented with a ranking layer, takes two views as input and predicts the more visually pleasing one in terms of composition.
  - The input is two views (ROIs) from a single image (at training time).
- CNN
- VFN learns its visual representations (i.e., optimizes the weights of the CNN) by minimizing the misorder of image pairs with known aesthetic preference.

Process

Mining Pairwise Ranking Units
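- A professional photograph can be seen as achieving a "dangerous visual balance"; conversely, a view from a different viewpoint that breaks this balance can lower the aesthetics.
  - So a cropped view taken from inside such a well-composed photo tends to look worse.
  - These crops appear to be used as the negative samples.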
- crop sampling strategies
  - We always form pairs of the original image and a crop because the aesthetic relationship between two random crops is hard to define and thus requires human validation. (Two random crops are therefore never paired with each other.)
  - To enrich the example set required when choosing the best view among different views, we include crops of varying scales and aspect ratios. (The best view is chosen among views that differ in scale and aspect ratio.)
  - To best utilize the information in I, we aim to maximize the coverage of crops over I while minimizing the overlap between crops. (Crops should overlap as little as possible.)
    - As the paper's figure shows, two kinds of crops seem to be generated: border crops and square crops.
      - Border crops: made relative to the image borders.
      - Square crops: made using the longest axis of the image.
- Note that this step only builds the training set; there is no ROI-finding step in the actual prediction process.
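The pairing described above can be sketched roughly as follows. This is only an illustration: the notes do not give the exact crop geometry, so the trim fraction `cut`, the number of square crops `count`, and the function names `border_crops`, `square_crops`, and `ranking_pairs` are all assumptions, not the paper's settings.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def border_crops(width: int, height: int, cut: float = 0.25) -> List[Box]:
    """Four crops, each made by trimming a fraction `cut` off one border.

    Assumption: the real sampling scheme may use other scales and ratios.
    """
    dx, dy = int(width * cut), int(height * cut)
    return [
        (dx, 0, width, height),      # trim the left border
        (0, 0, width - dx, height),  # trim the right border
        (0, dy, width, height),      # trim the top border
        (0, 0, width, height - dy),  # trim the bottom border
    ]


def square_crops(width: int, height: int, count: int = 3) -> List[Box]:
    """Squares whose side is the shorter side, spaced along the longest axis.

    Assumption: evenly spaced positions; this keeps overlap between crops low.
    """
    side = min(width, height)
    span = max(width, height) - side  # free travel along the longest axis
    boxes = []
    for i in range(count):
        offset = span * i // (count - 1) if count > 1 else span // 2
        if width >= height:
            boxes.append((offset, 0, offset + side, side))
        else:
            boxes.append((0, offset, side, offset + side))
    return boxes


def ranking_pairs(width: int, height: int) -> List[Tuple[Box, Box]]:
    """Pair the original view with every crop; the original is ranked higher."""
    original = (0, 0, width, height)
    crops = border_crops(width, height) + square_crops(width, height)
    return [(original, crop) for crop in crops]
```

Each pair is always (original, crop), never (crop, crop), which matches the note that the relationship between two random crops is hard to define.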

View Finding Network
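- Between a given (well-composed) image and a crop taken from it, the original image naturally has the higher aesthetic rank (see Eq. 1).
  - A hinge loss can be computed on this basis.
    - The overall structure may differ slightly from metric learning, but it can be viewed as a kind of metric learning (the paper does not seem to claim this). In Eq. (2), with g = 1, the goal is to learn the scoring function, which is done by minimizing the sum of the losses over all pairs. Training uses the ranking layer to minimize Eq. (2). At the prediction step, the learned scoring function is simply applied to a view to obtain its aesthetic score.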
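  - Uses AlexNet
  - Followed by two fully-connected layers and then the ranking layer
    - The ranking layer is parameter-free and merely used to evaluate the hinge loss of an image pair (Eq. 2).
  - Optionally, an SPP (spatial pyramid pooling) layer is applied after the last convolutional layer (it has been a while since I saw a paper use this)
    - See SPP
    - Perhaps because it captures local features? The paper says it further improves "global spatial relations".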
    - 3x3, 5x5, 7x7 : pooling region size
    - max-pooling and average-pooling
  - 227 x 227 input size
  - fc1: ReLU, output is 1000-dim
  - fc2: single neuron, output is the final ranking score
  - Initialized from an ImageNet pre-trained model
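The pairwise hinge loss mentioned above can be sketched as below. The notes do not reproduce Eq. (2), so this uses the standard pairwise hinge form, max(0, g + score(crop) - score(original)), as an assumption; `score` stands in for the network's scalar output (fc2) and is a hypothetical placeholder, not the actual model.

```python
from typing import Callable, Iterable, Tuple, TypeVar

View = TypeVar("View")


def pair_hinge_loss(score_original: float, score_crop: float, g: float = 1.0) -> float:
    """Zero when the original outscores the crop by at least the margin g."""
    return max(0.0, g + score_crop - score_original)


def total_loss(
    pairs: Iterable[Tuple[View, View]],
    score: Callable[[View], float],
    g: float = 1.0,
) -> float:
    """Sum of the pair losses over all (original, crop) pairs."""
    return sum(pair_hinge_loss(score(orig), score(crop), g) for orig, crop in pairs)


# Toy usage with a lookup table in place of the CNN (assumed scores).
toy_scores = {"photo": 2.0, "photo_crop": 0.5, "other": 0.5, "other_crop": 0.25}
loss = total_loss([("photo", "photo_crop"), ("other", "other_crop")], toy_scores.get)
```

In this toy case only the second pair contributes, because its original beats its crop by less than the margin. At prediction time no pair is needed: `score` alone is applied to a single view.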

Experimental results

Evaluation metrics
- IOU
- Boundary displacement
- alpha-recall : is the fraction of best crops that have an overlapping ratio greater than alpha with the ground truth. In all of our experiments, we set alpha to 0.75, i.e., a best crop that overlaps the ground truth by more than 0.75 counts as good.
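A minimal sketch of the three metrics, assuming boxes are `(x0, y0, x1, y1)` tuples. The IoU and alpha-recall follow the definitions in the list above; the boundary displacement formula (mean of the four edge offsets, normalized by image width or height) is the form commonly used in cropping benchmarks and is an assumption here, since the notes do not spell it out.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def boundary_displacement(pred: Box, gt: Box, width: float, height: float) -> float:
    """Mean normalized offset of the four box edges (assumed definition)."""
    dx = (abs(pred[0] - gt[0]) + abs(pred[2] - gt[2])) / width
    dy = (abs(pred[1] - gt[1]) + abs(pred[3] - gt[3])) / height
    return (dx + dy) / 4


def alpha_recall(preds: Sequence[Box], gts: Sequence[Box], alpha: float = 0.75) -> float:
    """Fraction of best crops whose IoU with the ground truth exceeds alpha."""
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) > alpha)
    return hits / len(preds)
```

Higher is better for IoU and alpha-recall, lower is better for boundary displacement.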
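Test data
- FCDB : link
- ICDB : link
Results
- The results on the two datasets differ, which seems to reflect the characteristics of each dataset. If applying this in practice, it seems right to first fix the target domain(?) clearly and then select views accordingly.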
- We generate a heatmap by evaluating sliding windows and smoothing the ranking scores corresponding to the raw pixels.
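One way to read the heatmap step is sketched below. It assumes square windows, a fixed stride, and that "smoothing" means averaging the scores of all windows covering a pixel; the paper may smooth differently. `score` is a hypothetical stand-in for the trained network, and `sliding_windows` and `score_heatmap` are illustrative names.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def sliding_windows(width: int, height: int, win: int, stride: int) -> List[Box]:
    """All win x win windows that fit inside the image, in row-major order."""
    return [
        (x, y, x + win, y + win)
        for y in range(0, height - win + 1, stride)
        for x in range(0, width - win + 1, stride)
    ]


def score_heatmap(
    width: int,
    height: int,
    win: int,
    stride: int,
    score: Callable[[Box], float],
) -> List[List[float]]:
    """Per-pixel mean of the scores of every window covering that pixel."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in sliding_windows(width, height, win, stride):
        s = score((x0, y0, x1, y1))
        for y in range(y0, y1):
            for x in range(x0, x1):
                total[y][x] += s
                count[y][x] += 1
    # Pixels covered by no window (possible with large strides) stay at 0.0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]
```

With a real model, `score` would crop the window from the image, resize it to the network input size, and return the ranking score.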

chullhwan-song / Reading-Paper

Learning to Compose with Professional Photographs on the Web #10
