KichangKim / DeepDanbooru

AI based multi-label girl image classification system, implemented by using TensorFlow.
MIT License

Model input and output? #92

Mek101 closed this issue 1 year ago

Mek101 commented 1 year ago

Hi, I'm a bit of a newbie at machine learning, but I'm trying to port the DeepDanbooru model to NCNN so that I can run it on AMD cards (via Vulkan) and build a tool around it.

However, I'm a bit stuck on getting the data in and out. Could you please explain how the input and output matrices are organized and how the images are transformed before being sent for evaluation? Or, if the model is derived from another one and shares its input/output matrix format, which model is it, and where can I find its documentation?

Thank you

KichangKim commented 1 year ago

You can see input/output dimensions from this code:

https://github.com/KichangKim/DeepDanbooru/blob/05eb3c39b0fae43e3caf39df801615fe79b27c2f/deepdanbooru/commands/train_project.py#L123

The input uses a simple height, width, channel (HWC) format (TensorFlow's default image layout). The current input/output dimensions are:

Input: 512x512x3
Output: 1 x (tag count of tags.txt)

Also, you should normalize the pixel values of the image from 0~255 to 0.0~1.0.
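The input layout and normalization described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the `preprocess` helper and the nearest-neighbour resize are placeholders of mine, not the project's code, and the project's own loader may resize or pad differently.

```python
import numpy as np

TARGET_SIZE = 512  # input height and width stated in this thread


def preprocess(image_u8: np.ndarray, size: int = TARGET_SIZE) -> np.ndarray:
    """Turn an HxWx3 uint8 image into a 1 x size x size x 3 float32 batch in [0, 1]."""
    if image_u8.ndim != 3 or image_u8.shape[2] != 3:
        raise ValueError("expected an HxWx3 (HWC) RGB image")
    height, width, _ = image_u8.shape
    # Nearest-neighbour resize: an assumed stand-in, not necessarily what the project uses.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = image_u8[rows][:, cols]
    # Scale pixel values from 0~255 to 0.0~1.0.
    scaled = resized.astype(np.float32) / 255.0
    # Add a leading batch axis, giving shape (1, size, size, 3).
    return scaled[np.newaxis, ...]


# Random pixels stand in for a decoded RGB image.
example = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
batch = preprocess(example)
print(batch.shape, batch.dtype)
```

When porting to another runtime, it is worth checking whether that runtime expects a different memory layout than HWC and transposing accordingly.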

Here is my dataset wrapper for feeding images into the model:

https://github.com/KichangKim/DeepDanbooru/blob/master/deepdanbooru/data/dataset_wrapper.py
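For the output side, a rough sketch of how the 1 x (tag count) vector might be turned back into tag names. It assumes that the i-th score belongs to the i-th line of tags.txt and that scores are in the 0.0~1.0 range; the `decode_tags` helper, the 0.5 threshold, and the sample tags are all hypothetical.

```python
import numpy as np


def decode_tags(scores, tags, threshold=0.5):
    """Pair each score with the tag at the same index and keep those at or above threshold."""
    flat = np.asarray(scores, dtype=np.float32).reshape(-1)
    if flat.shape[0] != len(tags):
        # A mismatch usually means the wrong tensor was read from the model.
        raise ValueError(f"got {flat.shape[0]} scores for {len(tags)} tags")
    picked = [(tag, float(score)) for tag, score in zip(tags, flat) if score >= threshold]
    # Highest score first.
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for the lines of tags.txt and one model output row.
tags = ["1girl", "solo", "long_hair", "smile"]
scores = np.array([[0.97, 0.88, 0.12, 0.51]], dtype=np.float32)
result = decode_tags(scores, tags)
print(result)
```

The length check is deliberate: an output whose element count does not match tags.txt is a quick sign that the data is being read from the wrong place.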

Mek101 commented 1 year ago

Thank you🙏

Mek101 commented 1 year ago

OK, a bit of an update: I managed to get the v4 model (deepdanbooru-v4-20200814-sgd-e30) to work; however, its precision seems to be way lower after the conversion.

I also tried to use the latest v3 model, but it output a 16x16 matrix instead of a 1x(tag count of tags.txt). It turned out I was using the wrong output blob.

Mek101 commented 1 year ago

The v3-20211112-sgd-e28 model seems to work OK on both the CPU and Vulkan, with output comparable to the original 👍
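One way to put a number on "comparable output" after a conversion is to run the same image through both models and compare the score vectors. A minimal sketch, assuming both outputs are already available as arrays; the `compare_outputs` helper and the sample values are made up for illustration and are not measurements from this thread.

```python
import numpy as np


def compare_outputs(reference, converted, top_k=5):
    """Return the largest absolute score difference and the top-k index overlap (0.0 to 1.0)."""
    ref = np.asarray(reference, dtype=np.float64).reshape(-1)
    conv = np.asarray(converted, dtype=np.float64).reshape(-1)
    if ref.shape != conv.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {conv.shape}")
    max_abs_diff = float(np.max(np.abs(ref - conv)))
    # Indices of the k highest scores in each vector.
    ref_top = set(np.argsort(ref)[-top_k:].tolist())
    conv_top = set(np.argsort(conv)[-top_k:].tolist())
    overlap = len(ref_top & conv_top) / top_k
    return max_abs_diff, overlap


# Made-up scores standing in for the original and converted model outputs.
reference = np.array([0.90, 0.10, 0.80, 0.05, 0.60, 0.30])
converted = np.array([0.88, 0.12, 0.79, 0.06, 0.61, 0.29])
max_abs_diff, overlap = compare_outputs(reference, converted, top_k=3)
print(max_abs_diff, overlap)
```

A small maximum difference together with a high top-k overlap suggests the conversion preserved the model's behaviour; a large drop in either is worth investigating before blaming the backend.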