KichangKim / DeepDanbooru

AI-based multi-label girl image classification system, implemented using TensorFlow.
MIT License

How to properly train it? #84

Open meanjoep92 opened 1 year ago

meanjoep92 commented 1 year ago

Hey folks, this looks really promising for helping me automate tagging a massive collection. I only plan to use it for optional tags from my database. However, I'm a complete novice with limited programming knowledge.

1. Am I structuring the images correctly?

I'm confused by the example on the wiki. In my example, I want it to identify whether or not someone is wearing a shirt or a hat. Would this be the correct way to do it?

HasClothesDataSet/
├── images/
│   ├── hat/
│   │   ├── 00000000000000000000000000000000.jpg
│   │   ├── ...
│   ├── nohat/
│   │   ├── ...
│   └── shirt/
│       ├── ...
└── my-dataset.sqlite

2. How do I set it up to label whether or not an image has something?

This is probably the biggest area of confusion for me. I know that if I want an image to match a tag, I just set its MD5 and set tag_string to the intended tag. But how exactly do I train it not to match that tag? (For example, to correctly identify that an image doesn't contain a shirt at all.)

I'm assuming you can simply add more than one tag (for example: RedShirt BlueShirt) as long as you set tag_count_general accordingly, right?

Would every single image need to have its MD5 in my-dataset.sqlite?

id | md5 | file_ext | tag_string | tag_count_general
-- | -- | -- | -- | --
1 | f4c902c9ae5a2a9d8f84868ad064e706 | jpg | Shirt | 1
2 | f4c902c9ae5a2a9d8f84868ad064e706 | jpg | | 0
3 | 28406bef86a21228683f140f3317d194 | jpg | NoHat | 1
4 | 28406bef86a21228683f140f3317d194 | jpg | RedShirt GreenShirt | 2
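For comparison, here is a minimal sketch of building such a database with Python's built-in sqlite3 module. It assumes the table is named `posts` (as described in the project README) with the columns from the table above. It stores exactly one row per image: multiple tags go space-separated into tag_string, tag_count_general is the number of tags, and an image with neither a shirt nor a hat simply gets an empty tag_string. The helper names, the tag names, and the third image name are hypothetical.

```python
import sqlite3

# Assumption: DeepDanbooru reads a table named "posts" with these columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id INTEGER PRIMARY KEY,
    md5 TEXT NOT NULL UNIQUE,  -- one row per image, so names must be unique
    file_ext TEXT NOT NULL,
    tag_string TEXT NOT NULL,
    tag_count_general INTEGER NOT NULL
)
"""


def add_image(conn: sqlite3.Connection, name: str, ext: str, tags: list[str]) -> None:
    """Insert one image; tags are space-separated, and the count is derived from them."""
    conn.execute(
        "INSERT INTO posts (md5, file_ext, tag_string, tag_count_general) "
        "VALUES (?, ?, ?, ?)",
        (name, ext, " ".join(tags), len(tags)),
    )


def build_example(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    add_image(conn, "f4c902c9ae5a2a9d8f84868ad064e706", "jpg", ["shirt"])
    add_image(conn, "28406bef86a21228683f140f3317d194", "jpg", ["red_shirt", "hat"])
    # Hypothetical image with no shirt and no hat: no "NoHat" tag, just no tags.
    add_image(conn, "0a1b2c3d4e5f60718293a4b5c6d7e8f9", "jpg", [])
    conn.commit()
    return conn
```

Deriving tag_count_general from the tag list, rather than typing it by hand, keeps the two columns from drifting apart.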

I'm terribly sorry if this is worded weirdly; I'm just confused. (If someone has an example project folder, I might be able to understand it better.)

Thank you

KichangKim commented 1 year ago

Hi.

  1. I updated README.md with a clearer explanation. The sub-folders in the images folder are just the first 2 characters of each image's filename (for example, 00000000000000000000000000000000.jpg goes in images/00/).

  2. Every image needs a unique filename (MD5 is a good default choice, but it does not have to be the actual MD5; any unique name is okay).
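Both points can be combined in a small helper. This is a minimal sketch, assuming you want to name each image by the MD5 of its file contents and copy it into images/<first 2 characters>/<name>.<ext>; the function names are hypothetical, and any other unique naming scheme would work just as well.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hex MD5 of the file contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_path(images_dir: Path, name: str, ext: str) -> Path:
    """Sub-folder is the first 2 characters of the filename."""
    return images_dir / name[:2] / f"{name}.{ext}"


def add_to_dataset(src: Path, images_dir: Path) -> tuple[str, str]:
    """Copy src into the layout and return (name, ext) for the database row."""
    name = md5_of_file(src)
    ext = src.suffix.lstrip(".").lower()
    dest = dataset_path(images_dir, name, ext)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return name, ext
```

The returned (name, ext) pair is what would go into the md5 and file_ext columns for that image.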

Also, I think the NoHat tag is unnecessary. Just give those images empty or shirt-related tags only. Then you can estimate whether an image has a hat by checking whether the score of the hat tag is larger than 0.5.
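The check described above is just a threshold on the per-tag score. A minimal sketch, assuming you already have the model's scores as a dict mapping tag name to score (how you obtain them is not shown here); the function names and the example scores are hypothetical.

```python
def has_tag(scores: dict[str, float], tag: str, threshold: float = 0.5) -> bool:
    """True if the score for `tag` is above the threshold.

    A tag missing from `scores` is treated as absent (score 0.0).
    """
    return scores.get(tag, 0.0) > threshold


def present_tags(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """All tags whose score is above the threshold, sorted by name."""
    return sorted(tag for tag, score in scores.items() if score > threshold)


# Hypothetical scores for one image.
scores = {"shirt": 0.93, "hat": 0.12}
print(has_tag(scores, "hat"))  # False: no hat
print(present_tags(scores))    # ['shirt']
```

Because "no hat" is simply a low hat score, there is no need for a separate NoHat tag in the training data.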

The default DeepDanbooru training settings skip images whose tag_count_general is less than 20, so you should change minimum_tag_count in project.json to 0.
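The edit can be made by hand in any text editor. As a sketch, the same change with Python's json module is shown below, together with a one-line version of the skip rule described above (rows with tag_count_general below minimum_tag_count are not used). The function names are hypothetical, and note that rewriting the file this way resets its indentation.

```python
import json
from pathlib import Path


def set_minimum_tag_count(project_dir: Path, value: int = 0) -> dict:
    """Set minimum_tag_count in <project_dir>/project.json, keeping other keys."""
    config_path = project_dir / "project.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["minimum_tag_count"] = value
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


def rows_used_for_training(tag_counts: list[int], minimum_tag_count: int) -> list[int]:
    """Rows with fewer tags than minimum_tag_count are skipped."""
    return [count for count in tag_counts if count >= minimum_tag_count]


# With the default of 20, a dataset of 0-2 tags per image is skipped entirely.
print(rows_used_for_training([1, 0, 2], 20))  # []
print(rows_used_for_training([1, 0, 2], 0))   # [1, 0, 2]
```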

Asmedeus998 commented 1 month ago

Hello, how do you save the evaluation results to SQLite?

I already followed the README and ran the following command: deepdanbooru evaluate [image_file_path or folder]... --project-path [your_project_folder] --allow-folder

but the SQLite table still has 0 rows.

I already tried changing minimum_tag_count to 0.

I also changed database_path to ~/DeepDanbooru/database/danbooru.sqlite.

Here is my database structure:

dataset
├── danbooru.sqlite
└── images
    ├── 0
    │   ├── danbooru_7235546_9fb231f273e0769e54013014d685f629.jpg
    └── 1
        ├── danbooru_7235546_9fb231f273e0769e54013014d685f629.jpg

KichangKim commented 1 month ago

@Asmedeus998 The database is only needed for training. To evaluate, you need a model generated by the train command (or you can download a pre-trained model from the Releases section of this repository).
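Since evaluation does not write anything into the training database, scores only end up in SQLite if you store them yourself. A minimal sketch, assuming you have already collected the results as a dict mapping image path to {tag: score} (collecting them from the evaluate command is not shown), and using a hypothetical `evaluations` table in a separate file so it never mixes with the training `posts` data.

```python
import sqlite3


def save_evaluations(
    db_path: str,
    results: dict[str, dict[str, float]],
    threshold: float = 0.5,
) -> int:
    """Store (image, tag, score) for scores above `threshold`; return rows written."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS evaluations ("
            " image_path TEXT NOT NULL,"
            " tag TEXT NOT NULL,"
            " score REAL NOT NULL,"
            " PRIMARY KEY (image_path, tag))"
        )
        rows = [
            (image, tag, score)
            for image, scores in results.items()
            for tag, score in scores.items()
            if score > threshold
        ]
        # Re-running on the same images overwrites the previous scores.
        conn.executemany("INSERT OR REPLACE INTO evaluations VALUES (?, ?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```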