KichangKim / DeepDanbooru

AI based multi-label girl image classification system, implemented by using TensorFlow.
MIT License

Web Version "Use Cropping" #27

Closed Superfloh closed 4 years ago

Superfloh commented 4 years ago

Hi,

just a short question, what model version does the web version use, and what do you do in case the "Use Cropping" option is enabled? That option works well with manga pages, so I'm interested in how it works.

KichangKim commented 4 years ago

Web demo uses latest release ( https://github.com/KichangKim/DeepDanbooru/releases/tag/v3-20200915-sgd-e30 ).

"Use Cropping" is simple, it splits image into multiple small parts with overlap and independently estimates for each parts. Then it combine all estimated tags with filtering (remove mis-estimated tags due to splitting)

Superfloh commented 4 years ago

Sadly I'm getting different results with the v3 model and the web version (not using the cropping option): usually very similar tags, but different scores. The "Use Cropping" idea is pretty interesting, would you mind releasing the code for that? ^^

Also on an unrelated sidenote, the requirements.txt still has tensorflow>=2.1.0.

Model v3 result: (screenshot attached)

Web result: (screenshot attached)

Original: (original image attached)

KichangKim commented 4 years ago

For historical reasons, the web demo and the training program use different image pre-processing steps, so they produce slightly different results. I do not have any plan to release the full web demo code yet, but here are the relevant parts of its image pre-processing and cropping code. You can use this to generate the same results as the web demo.

# image_utility.py
import math

import skimage.transform
import numpy as np
import tensorflow as tf

def calculate_image_scale(source_width, source_height, target_width, target_height):
    """
    Calculate scale for image resizing while preserving aspect ratio.
    """
    if source_width == target_width and source_height == target_height:
        return 1.0

    source_ratio = source_width / source_height
    target_ratio = target_width / target_height

    if target_ratio < source_ratio:
        scale = target_width / source_width
    else:
        scale = target_height / source_height

    return scale

def transform_and_pad_image(image, target_width, target_height, scale=None, rotation=None, shift=None, order=1, mode='edge'):
    """
    Transform image and pad by edge pixels.
    """
    image_width = image.shape[1]
    image_height = image.shape[0]
    image_array = image

    # centerize
    t = skimage.transform.AffineTransform(
        translation=(-image_width * 0.5, -image_height * 0.5))

    if scale:
        t += skimage.transform.AffineTransform(scale=(scale, scale))

    if rotation:
        radian = (rotation / 180.0) * math.pi
        t += skimage.transform.AffineTransform(rotation=radian)

    t += skimage.transform.AffineTransform(
        translation=(target_width * 0.5, target_height * 0.5))

    if shift:
        t += skimage.transform.AffineTransform(
            translation=(target_width * shift[0], target_height * shift[1]))

    warp_shape = (target_height, target_width)

    image_array = skimage.transform.warp(
        image_array, t.inverse, output_shape=warp_shape, order=order, mode=mode)

    return image_array

def crop_image(image, crop_box_ratio):
    width = image.shape[1]
    height = image.shape[0]

    (left_ratio, upper_ratio, right_ratio, lower_ratio) = crop_box_ratio

    width_start = int(width * left_ratio)
    width_end = int(width * right_ratio)
    height_start = int(height * upper_ratio)
    height_end = int(height * lower_ratio)

    return image[height_start:height_end, width_start:width_end, :]

def create_crop_box_ratio_list(ratio):
    return [
        (0, 0, ratio, ratio),
        (1 - ratio, 0, 1, ratio),
        (0, 1 - ratio, ratio, 1),
        (1 - ratio, 1 - ratio, 1, 1),
        ((1 - ratio) * 0.5,
         (1 - ratio) * 0.5, (1 + ratio) * 0.5, (1 + ratio) * 0.5)
    ]

def transform_image(image, width, height):
    source_height = image.shape[0]
    source_width = image.shape[1]

    scale = calculate_image_scale(source_width, source_height, width, height)
    image = transform_and_pad_image(image, width, height, scale=scale)

    return image / 255.0

def load_image(path):
    image_raw = tf.io.read_file(path)
    image = tf.io.decode_png(image_raw, channels=3)

    return image.numpy().astype(np.float32)

def resize_image(image, size):
    return tf.image.resize(image, size=size, method=tf.image.ResizeMethod.AREA, preserve_aspect_ratio=True).numpy()
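As a quick standalone sanity check of the scaling logic (calculate_image_scale is duplicated here unchanged so the snippet runs on its own): the scale is chosen from whichever dimension limits the fit, and transform_and_pad_image then edge-pads the scaled image up to the target size.

```python
def calculate_image_scale(source_width, source_height, target_width, target_height):
    # copy of the helper above, unchanged, for a self-contained check
    if source_width == target_width and source_height == target_height:
        return 1.0
    source_ratio = source_width / source_height
    target_ratio = target_width / target_height
    if target_ratio < source_ratio:
        return target_width / source_width
    return target_height / source_height

# a wide 1000x400 image fit into 512x512 is limited by its width:
print(calculate_image_scale(1000, 400, 512, 512))  # 0.512
# the scaled ~512x205 image is then edge-padded to 512x512
```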

The core method is transform_image(). And here is the cropping code:

y = model.predict(image_transformed)[0]

if crop == 'true':
    crop_box_ratio_list = image_utility.create_crop_box_ratio_list(0.6)

    for crop_box_ratio in crop_box_ratio_list:
        image_crop = image_utility.crop_image(image, crop_box_ratio)
        image_crop = image_utility.transform_image(
            image_crop, image_width, image_height)
        image_crop = image_crop.reshape(
            (1, image_crop.shape[0], image_crop.shape[1], image_crop.shape[2]))
        y_crop = model.predict(image_crop)[0]
        y_crop = np.multiply(
            y_crop, project_data['crop_exclude_tags_vector'])

        y = np.maximum(y, y_crop)
Superfloh commented 4 years ago

Thank you very much, loading and pre-processing the image with that code indeed gives the same result as on the webpage. For the cropping I'm missing the project_data['crop_exclude_tags_vector'], it doesn't exist in the project.json.

KichangKim commented 4 years ago

It is a simple mask vector (0 or 1): if a tag exists in exclude_tags, its value is 0; otherwise it is 1.

Here is my exclude_tags:

1boy
2boys
3boys
4boys
5boys
6+boys
1girl
2girls
3girls
4girls
5girls
6+girls
1koma
2koma
3koma
4koma
5koma
solo
solo_focus
text_focus
ass_focus
male_focus
out-of-frame_censoring
out_of_frame
feet_out_of_frame
head_out_of_frame
lower_body
upper_body
portrait
close-up
rating:safe
rating:questionable
rating:explicit
score:very_bad
score:bad
score:average
score:good
score:very_good
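For anyone reconstructing crop_exclude_tags_vector themselves: given the project's full ordered tag list and this exclude list, it is just a 0/1 vector in tag order. A small sketch (build_exclude_mask is my name, not part of the project):

```python
import numpy as np

def build_exclude_mask(all_tags, exclude_tags):
    """1 keeps a tag's cropped score, 0 zeroes it out."""
    excluded = set(exclude_tags)
    return np.array([0.0 if tag in excluded else 1.0 for tag in all_tags],
                    dtype=np.float32)

# toy tag list; in practice use the project's full ordered tag list
tags = ["1girl", "long_hair", "solo", "smile"]
mask = build_exclude_mask(tags, ["1girl", "solo"])
print(mask)  # [0. 1. 0. 1.]
```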
Superfloh commented 4 years ago

I made a vector out of the tags mentioned above and I'm getting the same result as the web version now.

In case someone else is interested in the cropping feature, here is my code:

            project_context, model, tags = dd.project.load_project(project_path)
            width = model.input_shape[2]
            height = model.input_shape[1]
            try:
                image = load_image(image_path)
                image_transformed = transform_image(image, width=width, height=height)
            except Exception as e:
                print("error loading the image:", e)
                continue

            image_shape = image_transformed.shape
            image_transformed = image_transformed.reshape((1, image_shape[0], image_shape[1], image_shape[2]))
            y = model.predict(image_transformed)[0]

            if crop == 'true':
                crop_box_ratio_list = create_crop_box_ratio_list(0.6)
                for crop_box_ratio in crop_box_ratio_list:
                    image_crop = crop_image(image, crop_box_ratio)

                    image_crop = transform_image(image_crop, width=width, height=height)
                    image_crop = image_crop.reshape(
                        (1, image_crop.shape[0], image_crop.shape[1], image_crop.shape[2]))
                    y_crop = model.predict(image_crop)[0]

                    exclude_tags = np.fromfile(project_path + "/exclude_tags.txt", dtype=int, sep='\n')
                    y_crop = np.multiply(y_crop, exclude_tags)
                    y = np.maximum(y, y_crop)

And here is the vector, exclude_tags.txt:

exclude_tags.txt

Thank you for your help.

rachmadaniHaryono commented 4 years ago
  1. why 0.6 ratio?
  2. how to choose tag to exclude?
KichangKim commented 4 years ago

@rachmadaniHaryono

  1. The four corners (top-left, top-right, bottom-left, bottom-right) with a 50% edge size plus a small 10% overlap = 60%.
  2. To exclude estimations that become incorrect due to cropping. For example, if the original image has 2 girls but a cropped part contains only one, it may return the 1girl tag, which should not be included. Number-related, size-related, and angle-related tags are candidates.
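That geometry can be checked numerically; this standalone sketch restates the five boxes produced by create_crop_box_ratio_list(0.6) and verifies the sizes and overlaps (pure Python, no dependencies):

```python
ratio = 0.6
boxes = [
    (0, 0, ratio, ratio),                # top-left
    (1 - ratio, 0, 1, ratio),            # top-right
    (0, 1 - ratio, ratio, 1),            # bottom-left
    (1 - ratio, 1 - ratio, 1, 1),        # bottom-right
    ((1 - ratio) / 2, (1 - ratio) / 2,   # center
     (1 + ratio) / 2, (1 + ratio) / 2),
]

# every box spans 60% of each side (50% edge + 10% extra)
for left, upper, right, lower in boxes:
    assert abs((right - left) - ratio) < 1e-9
    assert abs((lower - upper) - ratio) < 1e-9

# horizontally adjacent corner boxes overlap by 20% of the width
# (the 10% extra contributed by each side)
print(round(boxes[0][2] - boxes[1][0], 2))  # 0.2
```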