AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Broken Dreambooth LoRA training #11312

Open levicki opened 1 year ago

levicki commented 1 year ago

Is there an existing issue for this?

What happened?

LoRA training in bf16 precision is impossible on NVIDIA cards due to a runtime error.

Please see the related issue in pytorch, which was resolved in a newer build (2.1.0a0+git22ca1a1).

Steps to reproduce the problem

N/A

What should have happened?

N/A

Commit where the problem happens

baf6946e06249c5af9851c60171692c44ef633e0

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Windows

What device are you running WebUI on?

Nvidia GPUs (RTX 20 above)

What browsers do you use to access the UI ?

Brave

Command Line Arguments

--api --xformers

List of extensions

(screenshot of installed extensions attached)

Console logs

RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
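For context, the error above comes from calling `torch.triu`/`torch.tril` on a bfloat16 CUDA tensor in torch builds before the fix. A minimal illustrative workaround (not the webui's actual fix) is to compute in float32 and cast back; the helper name here is hypothetical:

```python
import torch

def triu_bf16_safe(x: torch.Tensor, diagonal: int = 0) -> torch.Tensor:
    """Work around `"triu_tril_cuda_template" not implemented for 'BFloat16'`
    on affected torch builds by computing triu in float32 and casting back.
    Illustrative sketch only; the real fix is upgrading torch."""
    if x.is_cuda and x.dtype == torch.bfloat16:
        return torch.triu(x.float(), diagonal=diagonal).to(torch.bfloat16)
    return torch.triu(x, diagonal=diagonal)
```

On fixed builds (torch >= 2.1) the fallback branch is unnecessary and plain `torch.triu` works directly on bfloat16 CUDA tensors.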

Additional information

N/A

missionfloyd commented 1 year ago

resolved in newer build (2.1.0a0+git22ca1a1)

Torch 2.1 nightly can be installed by adding

set TORCH_COMMAND=pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu118

to webui-user.bat and adding --reinstall-torch to COMMANDLINE_ARGS (remove it afterward).
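Put together, webui-user.bat would look roughly like this (a sketch assuming the stock file; the --api --xformers flags are the reporter's existing arguments, and --reinstall-torch should be removed once torch has been reinstalled):

```bat
rem webui-user.bat — sketch combining the steps above
set TORCH_COMMAND=pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu118
set COMMANDLINE_ARGS=--api --xformers --reinstall-torch

call webui.bat
```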

You could also try kohya_ss for training.

levicki commented 1 year ago

@missionfloyd

Thanks for responding.

Torch 2.1 nightly can be installed by adding...

I know, but then I enter the maze of dependencies (for example, xformers 0.0.17 might not work with it, and there are probably other packages that will complain).

You could also try kohya_ss for training.

I have already considered it, but it is less convenient to maintain another environment for training and have to switch between the two.

I would prefer that the requirements for AUTOMATIC1111 were updated to use torch 2.1 (unless that breaks something else, of course).

Edit: Oh, and it would also be nice if you supported CUDA 12.1 pytorch (https://download.pytorch.org/whl/nightly/cu121) — perhaps as an option during install?

w-e-w commented 1 year ago

you should be able to just set the environment variable TORCH_INDEX_URL to the torch wheel index URL https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/59419bd64a1581caccaac04dceb66c1c069a2db1/modules/launch_utils.py#L228

set TORCH_INDEX_URL=https://download.pytorch.org/whl/nightly/cu118
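A minimal sketch of how that environment variable is consumed, with names based on the linked launch_utils.py (simplified for illustration, not the exact webui code):

```python
import os

def build_torch_command() -> str:
    """Sketch of the torch install command construction: TORCH_INDEX_URL
    overrides the wheel index, and TORCH_COMMAND overrides the whole command.
    Default URL here is an assumption mirroring the cu118 default."""
    torch_index_url = os.environ.get(
        "TORCH_INDEX_URL", "https://download.pytorch.org/whl/cu118"
    )
    return os.environ.get(
        "TORCH_COMMAND",
        f"pip install torch torchvision --extra-index-url {torch_index_url}",
    )
```

Setting TORCH_INDEX_URL to the nightly index therefore changes only where pip looks for wheels, while TORCH_COMMAND replaces the full install command.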