RF5 / danbooru-pretrained

Pretrained pytorch models for the Danbooru2018 dataset

How are the tags encoded for the training? #2

Closed segalinc closed 4 years ago

segalinc commented 4 years ago

Hello, I am trying to replicate the training using PyTorch, but while preparing the data it is not clear to me how you format the label tags. Do you multi-hot encode them, so that each target is an array with ones at the positions of the tags present in the target image?

RF5 commented 4 years ago

Hi

Super sorry for the late reply. The data preparation is done in the training_notebooks/data_preparation.ipynb notebook, which walks the raw Danbooru2018 dataset's file structure and metadata to build the final CSV file used for training. Concretely, for each image it takes the metadata tags, keeps only those among the 6000 most frequent tags across the full dataset, and then adds one tag each for the age_rating and score attributes in that image's metadata.

This list of tags is then saved together with the image's file path, in the format [image path],[space-separated list of tags]. For example:

danbooru2018/original/0167/263167.jpg,age_rating_s 1girl solo long_hair brown_hair ribbon bangs meta_score_0 yellow_eyes japanese_clothes barefoot artist_request blunt_bangs hair_bun hime_cut eyes ankle_ribbon
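For anyone rebuilding this outside the notebook, here is a minimal sketch of the filtering and row-formatting steps described above. It is not the notebook's code: the helper names `top_tags` and `make_row` are hypothetical, and the rating letter and score bucket are assumed to have already been derived from the metadata (how the raw score maps to a `meta_score_*` tag is not covered here).

```python
from collections import Counter

def top_tags(all_tag_lists, n):
    """Return the set of the n most frequent tags across all images."""
    counts = Counter(tag for tags in all_tag_lists for tag in tags)
    return {tag for tag, _ in counts.most_common(n)}

def make_row(image_path, tags, rating, score_bucket, keep):
    """Format one CSV line: [image path],[space-separated tags].

    `rating` and `score_bucket` are assumed inputs (e.g. "s" and 0);
    the real notebook may derive and order these tags differently.
    """
    kept = [t for t in tags if t in keep]  # drop tags outside the top-n set
    labels = [f"age_rating_{rating}"] + kept + [f"meta_score_{score_bucket}"]
    return f"{image_path},{' '.join(labels)}"

# Toy usage with n=2 instead of 6000.
keep = top_tags([["1girl", "solo"], ["1girl", "rare_tag"], ["solo", "1girl"]], 2)
row = make_row("danbooru2018/original/0167/263167.jpg",
               ["1girl", "rare_tag", "solo"], "s", 0, keep)
print(row)
```

The order of tags within a row should not matter for training, since each tag ends up as an independent label.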

This process is repeated for every image in the Danbooru dataset, and each line generated as above is written to a final tag_labels_6000.csv file, which is then fed directly into fastai. Internally, I believe fastai does multi-hot encode them (a length-6000 zero vector with ones at the positions of the tags present for that image); however, the library's implementation changes quite often, so it is best to consult the docs.
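If you are replicating this in plain PyTorch rather than fastai, the encoding has to be done by hand. The following is a rough sketch of what such a multi-hot encoding could look like; it is not fastai's implementation, and `parse_line`, `build_vocab`, and `multi_hot` are hypothetical helper names. The vocabulary here is built from the rows themselves and sorted, which is an assumption; any fixed tag-to-index mapping would work as long as it is used consistently.

```python
def parse_line(line):
    """Split '[image path],[space-separated tags]' into (path, [tags])."""
    path, _, label_str = line.partition(",")
    return path, label_str.split()

def build_vocab(rows):
    """Map each distinct tag to a column index (sorted for determinism)."""
    tags = sorted({tag for _, labels in rows for tag in labels})
    return {tag: i for i, tag in enumerate(tags)}

def multi_hot(labels, vocab):
    """Return a zero vector with 1.0 at the index of every tag present."""
    vec = [0.0] * len(vocab)
    for tag in labels:
        idx = vocab.get(tag)
        if idx is not None:  # ignore tags missing from the vocabulary
            vec[idx] = 1.0
    return vec

# Toy usage on two rows in the CSV format described above.
rows = [parse_line("a.jpg,age_rating_s 1girl solo"),
        parse_line("b.jpg,age_rating_q solo")]
vocab = build_vocab(rows)
target = multi_hot(rows[1][1], vocab)
print(vocab)
print(target)
```

A vector like `target` can be wrapped in a float tensor and paired with a per-label binary cross-entropy loss such as `torch.nn.BCEWithLogitsLoss`, which is a common choice for multi-label problems.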

So in short: no, they are not multi-hot encoded in the CSV file (although I believe fastai does eventually encode them internally for use in the loss function). Feel free to compare your output against the tag_labels_6000.csv file generated by my data preparation to double-check that they match.