d8ahazard / sd_smartprocess

Smart Pre-processing extension for Stable Diffusion

Feature: cropping for aspect ratio buckets #19

Open Arilziem opened 1 year ago

Arilziem commented 1 year ago

It would be really cool if this extension made it possible to define a number of aspect ratio buckets that images are then cropped into (matching to the closest aspect).

Could make usage for gradient accumulation / batching more efficient.

I'm thinking a slider in increments of 2 that uses predefined aspect ratios, e.g. 1 being [1/1], 3 being [1/1, 3/2, 2/3], 5 being [1/1, 3/2, 2/3, 16/9, 9/16], 7 being [1/1, 3/2, 2/3, 16/9, 9/16, 2/1, 1/2], and beyond that start filling in aspects in between, I guess.

Maybe add a toggle for custom aspects or custom sizes.

Optionally move images into separate bucket directories?
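A minimal sketch of that slider-to-bucket mapping (the dictionary name, function name, and exact aspect lists are just illustrations of the idea above, not anything in the extension):

```python
from fractions import Fraction

# Hypothetical mapping from the slider value (odd steps) to the
# predefined aspect-ratio buckets described above.
ASPECTS_BY_LEVEL = {
    1: [Fraction(1, 1)],
    3: [Fraction(1, 1), Fraction(3, 2), Fraction(2, 3)],
    5: [Fraction(1, 1), Fraction(3, 2), Fraction(2, 3),
        Fraction(16, 9), Fraction(9, 16)],
    7: [Fraction(1, 1), Fraction(3, 2), Fraction(2, 3),
        Fraction(16, 9), Fraction(9, 16), Fraction(2, 1), Fraction(1, 2)],
}

def closest_bucket(width, height, level=5):
    """Return the predefined aspect ratio closest to the image's aspect."""
    src = width / height
    return min(ASPECTS_BY_LEVEL[level], key=lambda a: abs(float(a) - src))
```

Each image would then be cropped (or padded) to the returned aspect before resizing.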

eftSharptooth commented 1 year ago

I believe this works in smartprocess:

    # Candidate bucket resolutions: a square bucket plus taller/wider
    # variants in 64px steps off the base size.
    acceptable_res = [[max_size, max_size],
                      [max_size, max_size + 64], [max_size + 64, max_size],
                      [max_size, max_size + 128], [max_size + 128, max_size],
                      [max_size, max_size + 192], [max_size + 192, max_size]]
    if pad:
        if bucketing:
            src_ratio = img.width / img.height

            # Pick the bucket whose aspect ratio is closest to the source image.
            target_size = acceptable_res[0]
            for resolution in acceptable_res:
                if abs(resolution[0] / resolution[1] - src_ratio) < abs(target_size[0] / target_size[1] - src_ratio):
                    target_size = resolution

            # Scale to fit inside the bucket, preserving aspect ratio.
            # Compare against the bucket's own aspect (not 1) so non-square
            # buckets don't overflow and get cropped on paste.
            ratio = target_size[0] / target_size[1]
            src_w = target_size[0] if ratio < src_ratio else img.width * target_size[1] // img.height
            src_h = target_size[1] if ratio >= src_ratio else img.height * target_size[0] // img.width
            resized = images.resize_image(0, img, src_w, src_h)

            # Pad out to the full bucket size, centering the image.
            res = Image.new("RGB", (target_size[0], target_size[1]))
            res.paste(resized, box=(target_size[0] // 2 - src_w // 2, target_size[1] // 2 - src_h // 2))
            img = res
        else:
            # Square target: fit the longer side to max_size, then pad.
            src_ratio = img.width / img.height

            src_w = max_size if src_ratio > 1 else img.width * max_size // img.height
            src_h = max_size if src_ratio <= 1 else img.height * max_size // img.width

            resized = images.resize_image(0, img, src_w, src_h)
            res = Image.new("RGB", (max_size, max_size))
            res.paste(resized, box=(max_size // 2 - src_w // 2, max_size // 2 - src_h // 2))
            img = res

    # Resize again if the image still isn't at the right size.
    if bucketing:
        if img.width != max_size and img.height != max_size:
            img = images.resize_image(1, img, max_size, max_size)
    else:
        if img.width != max_size or img.height != max_size:
            img = images.resize_image(1, img, max_size, max_size)

Then add a bucketing var to preprocess() and a bucketing checkbox to the Gradio scripts/main.py. This will not move the images into separate directories. I haven't actually tested bucketing in this yet; I wrote it to convert my dataset.

eftSharptooth commented 1 year ago

This is the part to add bucketing to the beginning of smartprocess:

    def preprocess(rename, src, dst, pad, crop, max_size, txt_action, flip, caption,
                   caption_length, caption_clip, clip_use_v2, clip_append_flavor,
                   clip_max_flavors, clip_append_medium, clip_append_movement,
                   clip_append_artist, clip_append_trending, caption_wd14, wd14_min_score,
                   caption_deepbooru, booru_min_score, subject_class, subject, replace_class,
                   restore_faces, face_model, upscale, upscale_ratio, scaler, bucketing):

eftSharptooth commented 1 year ago

For the checkbox in Gradio, scripts/main.py:

    with gr.Tab("Cropping"):
        sp_size = gr.Slider(minimum=64, maximum=2048, step=64, label="Output Size", value=512)
        sp_pad = gr.Checkbox(label="Pad Images")
        sp_crop = gr.Checkbox(label='Crop Images')
        sp_flip = gr.Checkbox(label='Create flipped copies')
        sp_bucketing = gr.Checkbox(label='Aspect Ratio Bucketing')

eftSharptooth commented 1 year ago

This will only work with padding enabled.

d8ahazard commented 1 year ago

So, this extension has always been intended as a sort of companion to my Dreambooth extension.

One of the things that I've recently added to my Dreambooth extension is auto-handling of images larger than the "max" resolution. Part of that auto-handling is determining the closest bucket that the image will fit in using logic very similar to the stuff posted above.

Honestly, I was kind of considering removing the cropping bit entirely from the extension, as it's not really needed. Unless the goal is to remove a part of the image for training, you don't really need it for Dreambooth anymore.

Arilziem commented 1 year ago

My thought behind bucketing in preprocessing is to be able to optimize batching/gradient accumulation. Doing the cropping beforehand makes it easier to ensure that every bucket divides evenly by the batch size, which I think is especially useful when you are training multiple concepts, where single buckets may get pretty small (or overlap with other concepts, degrading gradient quality).
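That batching argument can be made concrete with a small sketch (the function name and inputs are illustrative, not part of the extension): count how many images land in each aspect bucket and how many would be left over at a given batch size.

```python
from collections import defaultdict

def bucket_sizes(image_dims, aspects, batch_size):
    """Group images by the closest aspect bucket and report, per bucket,
    (total images, remainder that doesn't fill a whole batch)."""
    buckets = defaultdict(int)
    for w, h in image_dims:
        src = w / h
        best = min(aspects, key=lambda a: abs(a - src))
        buckets[best] += 1
    return {a: (n, n % batch_size) for a, n in buckets.items()}
```

A nonzero remainder means that bucket wastes a partial batch (or needs padding/duplication), which is exactly what pre-cropping into controlled buckets helps avoid.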

eftSharptooth commented 1 year ago

Honestly, I hadn't even looked into cropping at all. I had been padding and getting good results, but I noticed that the further into training I got, the more the padding would show. This was just a way to test out aspect ratio bucketing, and I also want to see how it changes the results with less padding. It obviously still pads, but much less now that the images are closer to the bucket size. I think cropping could work very well once we figure out the limits of the bucketing; I only went so far out from max_size, but it could just chop off ever smaller bits and you'd fit more in. I actually did this to test the bucketing on your Dreambooth extension; I got it working today, so we will see how it turns out!

I am testing with a very large dataset I had originally padded with this extension, so doing the bucketing here made the most sense. I have also modified the extension to "crawl" the target directory, since I arrange the dataset in a folder structure, and I use text files that tell the extension which options to apply: which class to replace, which captioning to use, etc. As the dataset is nearing 100,000 images, this makes it much easier to just set it and forget it, as opposed to before, when I would do each subfolder individually and it took many, many hours.
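The crawl idea above could be sketched roughly like this (the `options.txt` filename and its key=value format are my assumptions, not the actual files described):

```python
import os

def crawl(src_root):
    """Walk the dataset tree; if a folder contains a per-folder options
    file, pair its settings with the images in that folder."""
    jobs = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        options = {}
        # Hypothetical per-folder config: one key=value per line.
        if "options.txt" in filenames:
            with open(os.path.join(dirpath, "options.txt")) as f:
                for line in f:
                    if "=" in line:
                        key, _, value = line.strip().partition("=")
                        options[key] = value
        images = [f for f in filenames
                  if f.lower().endswith((".png", ".jpg", ".jpeg", ".webp"))]
        if images:
            jobs.append((dirpath, sorted(images), options))
    return jobs
```

Each (folder, images, options) job would then be fed through preprocess() with the folder's own settings.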

eftSharptooth commented 1 year ago

Also, sorry about the code above, it is messy. I just wanted something quick to process into the buckets so I could test the Dreambooth extension!

eftSharptooth commented 1 year ago

Ah, also: the dataset structure allows for easy (sort of!) creation of the dataset needed to train your own caption package like DeepDanbooru. Basically, it lets you process the dataset with existing tools, correct captions as needed (helped by having things in subfolders and removing/replacing/adding tags via Python), then you can let it run and it will generate and populate the SQLite database and the list of unique tags needed to train your own classifier. I haven't tested the training yet, but it does create the data needed for it successfully.
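The SQLite-plus-unique-tags output could look something like this minimal sketch (the `posts` table schema here is a simplified assumption, not DeepDanbooru's exact layout):

```python
import sqlite3

def build_tag_db(db_path, captions):
    """Write image/tag pairs to a SQLite table and return the sorted
    list of unique tags. `captions` maps an image path to its
    comma-separated tag string."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS posts (file_path TEXT, tag_string TEXT)")
    unique_tags = set()
    for path, tag_string in captions.items():
        tags = [t.strip() for t in tag_string.split(",") if t.strip()]
        unique_tags.update(tags)
        conn.execute("INSERT INTO posts VALUES (?, ?)", (path, " ".join(tags)))
    conn.commit()
    conn.close()
    return sorted(unique_tags)
```

The returned tag list is what would be written out as the classifier's label vocabulary.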