293,008 high-definition (1360 x 1360 pixels) fashion images > dataset introduction
These images are provided as pairs with "item descriptions provided by professional stylists".
Each item is photographed from a variety of angles.
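This study reports two baseline results: ProGAN and StackGAN
ProGAN: high-resolution image generation
StackGAN: fashion images generated conditioned on a given text description
In essence, the paper releases a good fashion dataset, explains how good it is and what format and information it contains, shows that the two existing algorithms above work well on it, and presents those results as baselines for the dataset, so that other researchers can build on it in their own work.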
Contributions
Detailed statistics about the dataset.
Comparison with existing datasets.
Introduction to text-to-image generation > competition criteria & evaluation process
Results of high-resolution image generation with ProGAN
Results of text-to-image translation with StackGAN-v1 / StackGAN-v2
Our Fashion Dataset
The dataset consists of 293,008 images (260,480 images for training, 32,528 for validation, 32,528 for test), which is larger than other available datasets for the task of text to image translation.
We provide full HD images photographed under consistent studio conditions. There are no other datasets with comparable resolution and consistent photographing conditions.
All fashion items are photographed from 1 to 6 different angles depending on the category of the item. To our knowledge, this is the first dataset of this scale consisting of multiple angles of each item.
Each product belongs to a main category and a more fine-grained category (i.e., subcategory). There are 48 main categories and 121 fine-grained categories in the dataset. The name and density of each category are plotted in Figure 2. Table 3 presents the number of images by category and subcategory.
Each fashion item is paired with paragraph-length descriptive captions sourced from experts (professional designers). The distribution of the length of descriptions is presented in Figure 4.
For each item, we also provide metadata such as stylist-recommended matched items, the fashion season, designer and the brand. We also provide the distribution of colors extracted from the text description, presented in Figure 3.
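To make the per-item information listed above concrete, here is a minimal sketch of what one record could look like in code. The class name `FashionItem`, every field name, and the example values are hypothetical and chosen only for illustration; they are not the dataset's actual file format or keys.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FashionItem:
    """Hypothetical record for one item; field names are illustrative only."""

    item_id: str
    category: str        # one of the main categories
    subcategory: str     # one of the fine-grained categories
    description: str     # paragraph-length caption written by an expert
    image_paths: List[str] = field(default_factory=list)  # one path per angle
    season: Optional[str] = None
    designer: Optional[str] = None
    brand: Optional[str] = None
    matched_item_ids: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The notes state that each item is photographed from 1 to 6 angles.
        if not 1 <= len(self.image_paths) <= 6:
            raise ValueError("expected 1 to 6 views per item")


# Made-up example values, not taken from the dataset.
example = FashionItem(
    item_id="item-0001",
    category="example-category",
    subcategory="example-subcategory",
    description="A placeholder description of the item.",
    image_paths=["item-0001_view1.png", "item-0001_view2.png"],
)
```

Validating the number of views at construction time is just one possible design choice; a loader could equally well accept any count and filter later.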
We provide a framework that enables researchers to easily compare the performance of their models with an evaluation metric based on an Inception Score (Salimans et al., 2016). The underlying Inception model was trained on the training set for classifying the images into the categories presented in Figure 2.
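For the final challenge evaluation, a score on the test set is reportedly provided by the trained model, so the evaluation appears to be based on that score.
Since this kind of metric also has some problems, the authors additionally "provide a human evaluation as we outline below".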
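For reference, the standard Inception Score of Salimans et al. (2016) is exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the classifier's prediction for a generated image and p(y) is the average prediction over all generated images. The sketch below computes that formula in plain Python from a list of predicted class distributions. It assumes the probabilities have already been produced by some classifier, and it is only an illustration of the general formula, not the challenge's actual evaluation code; the function name and the `eps` parameter are my own.

```python
import math
from typing import Sequence


def inception_score(probs: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """Compute exp(mean_x KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: one row per generated image, each row a distribution over classes.
    eps:   small constant to avoid log(0); an implementation detail of this sketch.
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal class distribution p(y): the mean of the per-image predictions.
    marginal = [sum(row[c] for row in probs) / n for c in range(num_classes)]

    # Sum of KL(p(y|x) || p(y)) over all images.
    total_kl = 0.0
    for row in probs:
        total_kl += sum(
            p * (math.log(p + eps) - math.log(m + eps))
            for p, m in zip(row, marginal)
        )

    return math.exp(total_kl / n)


# Identical, uninformative predictions give the minimum score of about 1.
uniform_score = inception_score([[0.25, 0.25, 0.25, 0.25]] * 8)

# Confident predictions spread evenly over 4 classes give a score of about 4.
one_hot_score = inception_score(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
)
```

The score is high only when each image is classified confidently and the predictions cover many classes, which is why it rewards both quality and diversity but says nothing about whether an image matches its text description.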
https://arxiv.org/abs/1806.08317