Photo Aesthetics Ranking Network with Attributes and Content Adaptation

Introduction

Ranking images from an aesthetics perspective
Simply put, it is about image quality.
we propose to learn a deep convolutional neural network to rank photo aesthetics in which the relative ranking of photo aesthetics are directly modeled in the loss function.
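- Uses a CNN for image aesthetics ranking and achieves this by applying suitable losses.
Releases the Aesthetics and Attributes Database (AADB) dataset
- A dataset in which many anonymous raters annotated aesthetic scores and meaningful attributes.
Using this data, the aesthetic score is handled effectively from the attribute perspective (see the figure below).
Achieves SOTA with the proposed CNN model.
A new sampling strategy for training.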

11 attributes
- interesting content, object emphasis, good lighting, color harmony, vivid color, shallow depth of field, motion blur, rule of thirds, balancing element, repetition, and symmetry.
10,000 images in total
- we randomly split the dataset into validation (500), testing (1,000) and training sets (the rest).
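The random split described above can be sketched in Python as follows. The function name `split_dataset` and the fixed seed are illustrative choices of this sketch; the paper's actual split indices are not reproduced here.

```python
import random

def split_dataset(num_images=10_000, n_val=500, n_test=1_000, seed=0):
    """Randomly split image indices into train (the rest), validation and test."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]      # everything left over is training data
    return train, val, test

train, val, test = split_dataset()
print(len(train), len(val), len(test))  # 8500 500 1000
```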
Ratings from multiple raters can be aggregated; for the averaged scores, the paper reports that "average ratings are well fit by a Gaussian distribution."
- Note: this is not the AVA dataset..
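A minimal sketch of the aggregation step, using hypothetical ratings (the image ids, the scores and the 1 to 5 scale are assumptions for illustration only). Each image's ground truth is the mean of its raters' scores, and the maximum-likelihood Gaussian fit to those averages is simply their mean and population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical per-image ratings from several raters (a 1-5 scale is assumed here).
ratings = {
    "img_001": [3, 4, 4, 3, 5],
    "img_002": [2, 2, 3, 1, 2],
    "img_003": [4, 5, 5, 4, 4],
}

def average_ratings(ratings):
    """Ground truth per image = average of its raters' scores."""
    return {img: mean(scores) for img, scores in ratings.items()}

def fit_gaussian(values):
    """Maximum-likelihood Gaussian fit: sample mean and population std."""
    values = list(values)
    return mean(values), pstdev(values)

avg = average_ratings(ratings)
mu, sigma = fit_gaussian(avg.values())
print(avg, round(mu, 2), round(sigma, 3))
```

With real AADB data one would compare a histogram of `avg` against the fitted curve; three images are only enough to show the mechanics.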
  Fusing Attributes and Content for Aesthetics Ranking
fine-tuning AlexNet
fine-tune a Siamese network
- image pairs as input and is trained with a joint Euclidean and ranking loss
  Regression Network for Aesthetics Rating
fine-tuned from AlexNet
Replaces the softmax loss with a Euclidean loss
- y_i is the labeled value (GT): "the average ground-truth rating for image_i"
- ŷ_i is, naturally, the predicted rating
- Ratings are scaled to [0, 1]
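The regression target and loss can be sketched as below. Assumptions of this sketch: raw ratings lie on a 1 to 5 scale and are min-max scaled to [0, 1], and the loss uses the normalization of Caffe's `EuclideanLoss` layer, loss_reg = 1/(2N) * sum_i (ŷ_i - y_i)^2. Plain Python lists stand in for network outputs; `scale_to_unit` is a hypothetical helper name.

```python
def scale_to_unit(score, lo=1.0, hi=5.0):
    """Min-max scale a rating to [0, 1]; the 1-5 range is an assumption."""
    return (score - lo) / (hi - lo)

def euclidean_loss(pred, target):
    """Regression loss: 1/(2N) * sum_i (yhat_i - y_i)^2."""
    n = len(pred)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / (2 * n)

y = [scale_to_unit(s) for s in [3.8, 2.0, 4.4]]  # averaged GT ratings -> [0, 1]
y_hat = [0.6, 0.3, 0.9]                          # hypothetical network outputs
print(euclidean_loss(y_hat, y))
```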
  Pairwise Training and Sampling Strategies
The Euclidean loss needs to be complemented: it can run into problems with images whose average aesthetic scores are similar.
To address this, a Siamese network is used
- pairwise ranking loss to explicitly exploit relative rankings of image pairs available in the AADB data
Ranking loss: α is the margin parameter.
Joint loss: Fig. 3 a)
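A sketch of a margin-based pairwise hinge ranking loss combined with the Euclidean term. The per-pair form max(0, α - δ(y_i ≥ y_j)(ŷ_i - ŷ_j)), with δ = +1 if y_i ≥ y_j and -1 otherwise, matches the note that α is the margin; the 1/(2·#pairs) normalization, `alpha=0.1` and the weight `w_rank` are assumptions of this sketch, not values taken from the paper. In a Siamese setup both images of a pair pass through the same weight-shared network, so one list of predictions serves both branches here.

```python
def euclidean_loss(pred, target):
    """Regression term: 1/(2N) * sum_i (yhat_i - y_i)^2."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / (2 * len(pred))

def pairwise_ranking_loss(pred, target, pairs, alpha=0.1):
    """Hinge ranking term over index pairs (i, j) with margin alpha.

    sign = +1 if y_i >= y_j else -1; each pair contributes
    max(0, alpha - sign * (yhat_i - yhat_j)), which is zero once the
    predictions are ordered like the ground truth by at least alpha.
    """
    total = 0.0
    for i, j in pairs:
        sign = 1.0 if target[i] >= target[j] else -1.0
        total += max(0.0, alpha - sign * (pred[i] - pred[j]))
    return total / (2 * len(pairs))

def joint_loss(pred, target, pairs, alpha=0.1, w_rank=1.0):
    """Euclidean term plus weighted ranking term (w_rank is a hypothetical weight)."""
    return euclidean_loss(pred, target) + w_rank * pairwise_ranking_loss(pred, target, pairs, alpha)

target = [0.7, 0.25, 0.85]
pred = [0.6, 0.65, 0.9]            # image 1 is wrongly scored above image 0
pairs = [(0, 1), (0, 2), (1, 2)]
print(pairwise_ranking_loss(pred, target, pairs), joint_loss(pred, target, pairs))
```

Only the mis-ordered pair (0, 1) contributes to the ranking term, which is the point: the Euclidean term alone does not care about order, while the hinge term penalizes exactly the pairs whose relative ranking is wrong.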

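The idea here is to reflect the attributes in the aesthetic score.
Fig. 3 b)
A separate classification layer predicts the attributes, and its outputs are concatenated with the aesthetics output.
- The attribute predictions from this layer are concatenated with the base model to predict the final aesthetic score. > However, in the figure, both values seem to pass through a sigmoid activation before being concatenated.
  - Perhaps to rescale them into [0, 1]??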
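A hypothetical sketch of the attribute fusion, following this note's own reading of Fig. 3 b) rather than a confirmed implementation: the base aesthetic output and the 11 attribute predictions each go through a sigmoid, are concatenated, and a single linear layer maps them to the final score. `attribute_adaptive_score`, the weights and the bias are illustrative assumptions.

```python
import math

NUM_ATTRIBUTES = 11  # the 11 AADB attributes listed above

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attribute_adaptive_score(base_output, attr_outputs, weights, bias=0.0):
    """Hypothetical fusion layer: sigmoid -> concatenate -> linear layer."""
    if len(attr_outputs) != NUM_ATTRIBUTES:
        raise ValueError("expected one output per attribute")
    # Squash every input into (0, 1), then concatenate base + attributes.
    fused = [sigmoid(base_output)] + [sigmoid(a) for a in attr_outputs]
    if len(weights) != len(fused):
        raise ValueError("need one weight per fused feature")
    return sum(w * f for w, f in zip(weights, fused)) + bias

weights = [1.0 / 12] * 12  # hypothetical, untrained weights
print(attribute_adaptive_score(0.0, [0.0] * NUM_ATTRIBUTES, weights))
```

The sigmoid puts all inputs on a common (0, 1) range, which matches the rescaling guessed at above; whether the paper also applies it to the base output is not confirmed here.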

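Combining content information. This concept is easiest to see in the figure below.
Fig. 3 c)
It looks like plain category classification. (I would need to look at the actual AADB data, but the images are probably grouped into categories such as landscapes or images with people?)
We fine-tune the top two layers of AlexNet with softmax loss to train a content specific branch to predict category labels
In short, the goal seems to be computing the final aesthetic score by combining whether the image is good or bad, each attribute (sharpness, lighting, etc.), and finally each content category.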
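The note does not spell out how the content branch is combined with the aesthetic score. One plausible sketch, and it is only an assumption here, keeps one aesthetic score per content category and weights them by the softmax category probabilities from the content branch. The category count, names and numbers below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over category logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def content_adaptive_score(category_logits, per_category_scores):
    """Hypothetical fusion: expected aesthetic score under p(category | image)."""
    if len(category_logits) != len(per_category_scores):
        raise ValueError("one aesthetic score per content category is required")
    probs = softmax(category_logits)
    return sum(p * s for p, s in zip(probs, per_category_scores))

# Two hypothetical categories (e.g. "landscape", "people") with their own scores.
print(content_adaptive_score([0.0, 0.0], [0.2, 0.8]))   # equal belief -> plain average
print(content_adaptive_score([8.0, -8.0], [0.2, 0.8]))  # confident -> close to 0.2
```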