d8ahazard / sd_dreambooth_extension


[Bug]: SDXL Branch tests, bucketing issues and Deprecated functions #1340

Closed Pychnight closed 10 months ago

Pychnight commented 1 year ago

Is there an existing issue for this?

What happened?

With the recent 1.6 update of AUTOMATIC1111 (A1111), a few issues have appeared. The first is shown below; the extension still works, so this is only a minor note.

D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\main.py:261: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  with gr.Row().style(equal_height=False):
CUDA SETUP: Loading binary D:\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\main.py:1051: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  db_gallery = gr.Gallery(
D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\main.py:1051: GradioDeprecationWarning: The 'grid' parameter will be deprecated. Please use 'columns' in the constructor instead.
  db_gallery = gr.Gallery(
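The warnings above ask for the `.style()` arguments to be moved into the component constructors, with `grid` renamed to `columns`. A minimal sketch of that migration follows. It is not the extension's actual code: the helper `style_to_constructor_kwargs`, the function `build_ui`, the gallery label, and the column count are all placeholders chosen for illustration, and it assumes a Gradio version whose `gr.Row` and `gr.Gallery` constructors accept `equal_height` and `columns`, as the warning text suggests.

```python
def style_to_constructor_kwargs(style_kwargs):
    """Rename deprecated .style() arguments to their constructor equivalents."""
    renamed = {"grid": "columns"}  # the warning says 'grid' becomes 'columns'
    return {renamed.get(key, key): value for key, value in style_kwargs.items()}


def build_ui():
    # Gradio is imported lazily so this sketch can be loaded without it installed.
    import gradio as gr

    with gr.Blocks() as demo:
        # Before: with gr.Row().style(equal_height=False):
        with gr.Row(equal_height=False):
            # Before (assumed shape): gr.Gallery(...).style(grid=4)
            # The label and the column count are placeholders.
            gr.Gallery(label="Output", **style_to_constructor_kwargs({"grid": 4}))
    return demo
```

Calling `build_ui()` inside the web UI's environment should build the layout without triggering the two deprecation warnings, assuming the installed Gradio matches the one that printed them.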

The other issue is that bucketing on large datasets with SDXL takes far too long (8+ hours) and also runs out of memory halfway through the bucketing process. I am also having issues training with the "constant" setting, so I'm using constant with warmup instead.

Kohya_ss's solution to this is to let you turn off bucketing completely. I know this is probably not the ideal solution, but the alternative would be rewriting the batching script to better support the differences between SDXL and SD 1.5/2.1 and to scale on computers with high-end CPUs.

As an option, though, it might be a good idea to let people turn off bucketing.
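To make the request concrete, here is a rough sketch of aspect-ratio bucketing with an off switch. It is not how the extension or kohya_ss implements it: the bucket list, the function name `assign_buckets`, and the `use_buckets` flag are hypothetical. It works only on (width, height) pairs, which can be read from image headers (for example with Pillow's `Image.open`, which does not decode pixel data until it is needed), so no full images have to be held in memory during this step.

```python
from collections import defaultdict

# Illustrative bucket resolutions (width, height); not taken from the extension.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]


def assign_buckets(image_sizes, use_buckets=True, default=(1024, 1024)):
    """Group image indices by target resolution.

    image_sizes: list of (width, height) tuples, one per image.
    Returns a dict mapping a target resolution to the list of image indices.
    """
    groups = defaultdict(list)
    for index, (width, height) in enumerate(image_sizes):
        if use_buckets:
            ratio = width / height
            # Pick the bucket whose aspect ratio is closest to the image's.
            target = min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))
        else:
            # Bucketing disabled: every image goes to one fixed resolution.
            target = default
        groups[target].append(index)
    return dict(groups)


sizes = [(2048, 2048), (1920, 1080), (1080, 1920)]
print(assign_buckets(sizes))
print(assign_buckets(sizes, use_buckets=False))
```

With `use_buckets=False` the grouping step is a single pass that does no per-image comparison, which is the kind of shortcut being asked for here; the trade-off is that non-square images get cropped or stretched to the default resolution.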

Steps to reproduce the problem

Create a very large dataset, use the SDXL branch, and try to train on the data.

The bucketing (prepare dataset) step and the latent caching take a very long time (8+ hours before the model even starts training), and the process runs out of memory halfway through bucketing on large datasets.

Commit and libraries

Initializing Dreambooth
Dreambooth revision: c43f76347a5aa82dcc61a0ba03fad3be981bd772
Successfully installed accelerate-0.22.0 bitsandbytes-0.35.4 dadaptation-3.1 diffusers-0.20.2 discord-webhook-1.1.0 fastapi-0.94.1 lion-pytorch-0.1.2 tensorboard-2.13.0 tqdm-4.65.0

[+] xformers version 0.0.21 installed.
[+] torch version 2.0.1+cu118 installed.
[+] torchvision version 0.15.2+cu118 installed.
[+] accelerate version 0.22.0 installed.
[+] diffusers version 0.20.2 installed.
[+] transformers version 4.30.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Command Line Arguments

--xformers

Console logs

No errors in this log

Additional information

No response

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

Saduff commented 11 months ago

How were you able to start training on the SDXL branch at all? Don't you get the KeyError mentioned in #1337?

Pychnight commented 11 months ago

Yeah, I did not encounter that error. There are some errors I can fix myself, but others are harder to pinpoint, like this bucketing issue that seems to affect SDXL models: it takes 10+ hours only to run out of memory halfway through (on a 4090). Hopefully they resolve the bucketing speed and memory issues. I have disabled bucketing for now to test whether training is in a working state. They have come a long way in the dev branch so far, and it's always interesting to see what they do and how they approach each issue.

@d8ahazard Out of curiosity: this extension seems to break with almost every major A1111 update. Have you thought about making a standalone version that doesn't depend on the A1111 repo? I know we have kohya_ss, but I really prefer how your training script works, and the fact that you use JSON for unlimited concepts instead of specialized folder names. It gives more control over the trigger words, the reg directory, and the instance data...

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days