KichangKim / DeepDanbooru

AI-based multi-label girl image classification system, implemented using TensorFlow.
MIT License

Help deploying locally #99

Closed MarisaCodes closed 6 months ago

MarisaCodes commented 1 year ago

Hi. I stumbled upon this project a year ago, but I only knew very basic programming back then, so I was never able to touch it. Later on I learned web development, both JavaScript and Python. I have done a bit of AI, but nothing advanced, just MNIST-type exercises and other basics. I have git cloned this repo several times, created a venv, and installed the required packages using pip, but I just don't know how to go from that to using the $ deepdanbooru evaluate CLI tool. I also don't want to use only the CLI; I want to run this model from another .py file as well, but I really don't know how to get started. Can anyone experienced with this please help?

I know the setup steps should be simple, but I am missing something for sure. I tried executing setup.py build, but I just got a build folder with code exactly the same as what I cloned.

Please help with the local setup; I will be immensely grateful!

PS: I am running Ubuntu Linux on WSL1 with Windows 10.
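For reference, the README's setup and evaluate usage look roughly like the lines below. The paths are placeholders, the exact flags may differ between versions (deepdanbooru evaluate --help lists them), and installing the package itself with pip is what puts the deepdanbooru command on your PATH, so no setup.py build step should be needed:

pip install .[tensorflow]   # run inside the cloned repo
deepdanbooru evaluate ./some_image.jpg --project-path ./path-to-downloaded-model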

fenfenyangyangmate commented 11 months ago

I saw a blog based on a modified version before, but I'm not sure if it's what you want. It may require you to use translation software and an old version of DeepDanbooru.

https://www.sitstars.com/archives/75/

MarisaCodes commented 6 months ago

I was out of my depth, but it's very simple:

import keras

# constants for ddv3
ddv3path = "./dd/model-resnet_custom_v3.h5"  # downloaded from releases
ddv3tagspath = "./dd/tags.txt"
ddv3chartagspath = "./dd/tags-character.txt"

# constants for ddv4
ddv4path = "./ddv4/model-resnet_custom_v4.h5"  # downloaded from releases
ddv4tagspath = "./ddv4/tags.txt"
ddv4chartagspath = "./ddv4/tags-character.txt"

dd: keras.Model = keras.models.load_model(ddv3path)

config = dd.get_config()  # returns pretty much all the information about the model
print(config["layers"][0]["config"]["batch_input_shape"])  # -> (None, 512, 512, 3)

print(dd.summary())  # ...

The only challenging part is preprocessing the image, but this repo already offers two functions for that in the images/ directory.
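In case it helps anyone else, here is a rough sketch of going from the loaded model to tag names. It assumes tags.txt from the release lists one tag per line in the same order as the model's outputs, and that image_tensor is an image already preprocessed to shape (1, 512, 512, 3) with values the model expects:

with open(ddv3tagspath, "r", encoding="utf-8") as f:
    tags = [line.strip() for line in f if line.strip()]  # one tag per line, in output order (assumption)

scores = dd.predict(image_tensor)[0]  # image_tensor: preprocessed image, shape (1, 512, 512, 3)

threshold = 0.5
hits = [(tag, float(score)) for tag, score in zip(tags, scores) if score >= threshold]
for tag, score in sorted(hits, key=lambda pair: pair[1], reverse=True):
    print(f"{tag}: {score:.3f}")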

I have more questions though...

> I saw a blog based on a modified version before, but I'm not sure if it's what you want. It may require you to use translation software and an old version of DeepDanbooru
>
> https://www.sitstars.com/archives/75/

What version of the DeepDanbooru release is used in the ONNX model?

And what version is used on the demo site linked in this repo? (http://dev.kanotype.net:8003/deepdanbooru/)

There is a discrepancy between the newest release and the site. I tested a Sugiura Ayano pic and a Kizuna AI pic, but they had weak scores for the character tags, while the demo website got them right with a threshold of >0.5. I beg for more up-to-date releases, as my 4 GB potato is not capable of training these models. I might look into Colab, but I don't know if I can train on a massive dataset there. Please keep the releases up to date if possible!

Edit: Turns out part of the discrepancy was me compiling the model. Using the images/ functions from this repo also doesn't work nicely with the results, so instead I did:

import cv2
import keras
import tensorflow as tf

def my_reshape(img: cv2.typing.MatLike):
    resized = tf.image.resize_with_pad(img, 512, 512)  # pad to 512x512, keeping aspect ratio
    resized = keras.utils.normalize(resized, 0)
    return tf.reshape(resized, (1, 512, 512, 3))  # add the batch dimension

which gives better prediction accuracy. I still have a small issue with the model not being up to date, though; I hope we can get newer releases. I am closing this issue as everything is resolved.

EDIT 2: Okay, in my experience TensorFlow's resize_with_pad function is a bit horrible sometimes (plotting the output with pyplot reveals the horror). So instead I came up with my own resize_to_square function using cv2; it also adds padding and preserves the aspect ratio:

def resize_to_square(img, size, pad_color=(0, 0, 0)):
    w, h = img.shape[1], img.shape[0]
    if w > h:
        # landscape: scale width to size, pad top/bottom
        aspect_ratio = h / w
        new_h = round(aspect_ratio * size)
        padding = size - new_h
        top, bottom = padding // 2, padding - padding // 2
        return cv2.copyMakeBorder(cv2.resize(img, (size, new_h)),
                                  top=top, bottom=bottom, left=0, right=0,
                                  value=pad_color, borderType=cv2.BORDER_CONSTANT)
    elif w < h:
        # portrait: scale height to size, pad left/right
        aspect_ratio = w / h
        new_w = round(aspect_ratio * size)
        padding = size - new_w
        left, right = padding // 2, padding - padding // 2
        return cv2.copyMakeBorder(cv2.resize(img, (new_w, size)),
                                  top=0, bottom=0, left=left, right=right,
                                  value=pad_color, borderType=cv2.BORDER_CONSTANT)

    # already square
    return cv2.resize(img, (size, size))

And now that Sugiura Ayano pic is actually evaluated correctly, and her name shows up in the tags with a >0.5 threshold.
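For completeness, here is a rough sketch of how I'd wire resize_to_square into the prediction code above. The file name is a placeholder, and the RGB conversion and [0, 1] scaling are my assumptions about what the model expects, so adjust the normalization step to whatever gives you the best results:

import cv2
import numpy as np

img = cv2.cvtColor(cv2.imread("some_image.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path; cv2 loads BGR

square = resize_to_square(img, 512)  # pad to a 512x512 square, preserving aspect ratio
image_tensor = np.expand_dims(square.astype(np.float32) / 255.0, axis=0)  # (1, 512, 512, 3), scaled to [0, 1]

scores = dd.predict(image_tensor)[0]  # map scores to tag names as in the earlier snippet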