AbdBarho / stable-diffusion-webui-docker

Easy Docker setup for Stable Diffusion with user-friendly UI

xformers on WSL2+Ubuntu for AUTOMATIC1111 #128

Closed Deminisa closed 1 year ago

Deminisa commented 1 year ago

Is this something that is in the works for this repo? Alternatively, has anyone been able to get xformers up and running with Win 11 WSL2 + Ubuntu? I'm on the newest Docker and docker-compose v2.10.2.

Tried to replicate what's happening in AUTOMATIC1111's launch.py (https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/a004d1a855311b0d7ff2976a4e31b0247ad9d1f6/launch.py#L133) to install xformers via pip, and also tried adding --xformers to docker-compose here: https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/3b3c244c3128d3fcebb236a07bc601cb385e73bb/docker-compose.yml#L42

Have also tried:

Thanks!

AbdBarho commented 1 year ago

https://anaconda.org/xformers/xformers

Should be enough, haven't tried it though. I will see if I get to it before the weekend.

Deminisa commented 1 year ago

Thanks for the quick response 👍

Do you mean installing it with conda install -c "xformers/label/dev" xformers in cli? Since the miniconda image was replaced with python 3.10-slim a couple of days ago (https://github.com/AbdBarho/stable-diffusion-webui-docker/commit/5698c49653dcda6d02b23d4c049971c24318bf64), conda isn't available anymore in the container.

I will see if I get to it before the weekend.

Would be fantastic if you could have a look, for sure. It allegedly increases it/s performance, so it should be a great thing to have 🙂

ProducerMatt commented 1 year ago

I was able to get it installed and the web UI running, but not actually producing images. https://github.com/ProducerMatt/stable-diffusion-webui-docker/commit/c197f04a0c84cd3565651b82693d3c561e42e576

When asked to generate an image, it leaves a Python traceback with this error:

webui-docker-auto-1  | NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

I've never worked with Docker before, so my apologies if I've missed anything obvious. I'm trying to find decent tutorial material, but Google searches for Docker are drowning in low-quality SEO spam.

AbdBarho commented 1 year ago

@ProducerMatt eyy no worries man, xformers is still a very young library and needs more time to mature.

if I am not mistaken, the library needs to know which GPU you have while building to optimize the build for your architecture. can you try adding this

TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"

before the pip install?

this user seems to have the same problem: https://github.com/facebookresearch/xformers/issues/474
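As a concrete sketch of that suggestion (same values as above; the pip command itself is whatever install line you are already using):

```shell
# Sketch: export these in the same shell before the pip install, so the
# xformers build compiles kernels for each listed compute capability.
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"
export FORCE_CUDA=1   # build the CUDA kernels even if no GPU is visible during the build
echo "building xformers for: ${TORCH_CUDA_ARCH_LIST}"
```

Then run the pip install of xformers in that same shell.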

slix commented 1 year ago

https://anaconda.org/xformers/xformers

Should be enough, haven't tried it though. I will see if I get to it before the weekend.

The Dockerfile depends on Debian bullseye right now. And I think the prebuilt stuff in conda depends on a version of glibc that is one minor version above what's included with bullseye.

At least when I ran the conda install, it said that it couldn't resolve the environment in any way due to the glibc version being off.

ProducerMatt commented 1 year ago

The CUDA arch thing from @AbdBarho got me past that error, but I ran headfirst into this one predicted by @slix: "The current installed version of g++ (10.2.1) is greater than the maximum required version by CUDA 11.2 (10.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=10.0.0)" https://github.com/ProducerMatt/stable-diffusion-webui-docker/commit/9c9ba5f5dee6f3032b608a6ff1beee332cdac6b9

slix commented 1 year ago

I ran headfirst into this one predicted by @slix: "The current installed version of g++ (10.2.1) is greater than the maximum required version by CUDA 11.2 (10.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=10.0.0)"

This is different from the error I ran into with conda install. My error was about glibc, not g++.

slix commented 1 year ago

After a lot of trial and error, I got a working build and usage of xformers in Docker. I'm able to generate images in the web UI without error.

In docker-compose.yml: add --xformers to CLI_ARGS.

In services/AUTOMATIC1111/Dockerfile, right before COPY . /docker:

RUN <<EOF
apt-get install software-properties-common gpg wget -y
# Warning: software-properties-common installed Python 3.9
#update-alternatives --install /usr/bin/python python /usr/bin/python3.10 100
add-apt-repository contrib
apt-key del 7fa2af80

wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb

apt-get update
apt-get install cuda-nvcc-11-8 cuda-libraries-dev-11-8 -y
EOF

RUN FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6" pip install git+https://github.com/facebookresearch/xformers.git@ba93c5012d00bd1b010514a7bc9bd938c1ad6149#egg=xformers

With all those entries in the arch list, the build takes me ~90 minutes. But the full list will work for most graphics cards and is probably bullet-proof.

Not optimized at all.

Using conda would probably be better. But I ran into that glibc error (due to Debian bullseye?). I'm guessing that using conda instead will mean changing the base image for services/AUTOMATIC1111/Dockerfile.
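A possible way to avoid the full list when building for a single machine (a sketch; it assumes a driver new enough that nvidia-smi supports the compute_cap query, and falls back to a fixed value otherwise):

```shell
# Detect this machine's GPU compute capability so TORCH_CUDA_ARCH_LIST can
# target one architecture instead of all eight, cutting build time.
ARCH="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)"
ARCH="${ARCH:-8.6}"   # fallback if the query is unsupported or no GPU is visible
echo "TORCH_CUDA_ARCH_LIST=${ARCH}"
```

Feed the printed value to the pip wheel/install command instead of the full list.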

DevilaN commented 1 year ago

Tried it, and I can confirm that it works with my GP107M [GeForce GTX 1050 Ti Mobile]. Startup gives a nice message: "webui-docker-auto-1 | Applying xformers cross attention optimization." The performance gain on my laptop is almost 100%: generation time is halved, probably due to less thermal throttling.

Seems that it needs a bit of reorganizing of the package installation, plus removing the cache and unnecessary packages after the xformers install, so the image takes up less space (currently it is 11.1 GB).

40 minutes to download packages from nvidia contrib (download speed throttled by the remote host). 75 minutes spent on "Building wheel for xformers (setup.py): still running...". The compile step could probably be sped up by using multiple cores instead of a single thread; something like MAKEOPTS="-j $(nproc)" may be possible.

Any chance we incorporate this into the main branch of the repository soon?

Deminisa commented 1 year ago

Look at all the activity. Great stuff! ❤️

With some small changes to the code from @slix, the following worked on my end:

1) (I did this manually from the CLI in the auto container created by this repo)

apt-get install gpg wget -y
wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update
apt-get install cuda-nvcc-11-8 cuda-libraries-dev-11-8 -y
FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6" pip wheel --wheel-dir=/data git+https://github.com/facebookresearch/xformers.git@ba93c5012d00bd1b010514a7bc9bd938c1ad6149#egg=xformers

Some observations/notes

  • Removing software-properties-common took the Python 3.9 vs 3.10 question out of the equation and made everything else work as-is
  • I saw that wget was missing from the image. Unsure if gpg was already available, so that can possibly be removed as well
  • Made pip save the wheel in the /data folder as I'm unsure which changes would cause a rebuild and hoping this can possibly save a lot of time down the line as it took around 80-90 minutes to build. Kind of how the download container is a long running one-and-done setup process.

2) Perhaps not the most elegant solution, but I replaced https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/b60c78747488ec000868137db2a1e32d8d1a6e29/services/AUTOMATIC1111/Dockerfile#L77-L78 with

CMD /docker/mount.sh && \
  pip install '/data/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl' && \
  python3 -u ../../webui.py --listen --port 7860 --hide-ui-dir-config --ckpt-dir ${ROOT}/models/Stable-diffusion ${CLI_ARGS}

3) And finally added --xformers to the CLI args in docker-compose: https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/3b3c244c3128d3fcebb236a07bc601cb385e73bb/docker-compose.yml#L42

With xformers I'm seeing roughly 33% performance increase on a GTX 1080
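One extra sanity check (a sketch, not from the thread): before blaming --xformers when generation fails, confirm from inside the container that the wheel is actually importable:

```shell
# Prints True if an 'xformers' package is visible to the container's Python,
# without loading its CUDA kernels.
python3 -c "import importlib.util; print(importlib.util.find_spec('xformers') is not None)"
```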

ProducerMatt commented 1 year ago

I saw a message in the xformers build log that installing ninja-build would speed up the build process. I tried it and it appears to have cut 3/4ths of the xformers build time on a brand new container, which is so drastic I'm re-timing with & without ninja to make sure I didn't screw up somewhere. EDIT: not a fluke! :)
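For reference, xformers compiles its CUDA kernels through PyTorch's C++ extension machinery, which picks up ninja automatically when it is installed; it also reads a MAX_JOBS environment variable for the number of parallel compile jobs (an assumption about the exact version in use; worth verifying against your build log). A hedged Dockerfile sketch combining both with the wheel build from earlier in the thread:

```dockerfile
# Sketch: install ninja and allow parallel compile jobs for the xformers build.
# The MAX_JOBS value is illustrative; match it to your CPU/RAM budget.
RUN apt-get update && apt-get install -y ninja-build
RUN MAX_JOBS=8 FORCE_CUDA=1 \
    TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6" \
    pip wheel --wheel-dir=/data \
    "git+https://github.com/facebookresearch/xformers.git@ba93c5012d00bd1b010514a7bc9bd938c1ad6149#egg=xformers"
```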

slix commented 1 year ago

Some more possibilities for faster builds:

nvidia contrib

My steps for installing Nvidia CUDA packages are from: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#debian-installation

The "Remove Outdated Signing Key" step seems to be for security reasons.

AbdBarho commented 1 year ago

Thank you all

I am working on it, but my bad internet is not helping much (500kb/s)

You are welcome to contribute a solution!

AbdBarho commented 1 year ago

I managed to get it up & running, ~20% improvement on my 1060, from 25s per image to 20s. could you all please try it and report your results?

the branch is called xf #136 (don't ask about the commits)

eftSharptooth commented 1 year ago

The whl file appears to work. I ran into the issue you just corrected (the mount condition check on the directories), manually added the whl install, and that worked fine.

AbdBarho commented 1 year ago

@eftSharptooth I just fixed it in the latest master, I will rebase!

eftSharptooth commented 1 year ago

I have a training running now, but once it is finished I will also test on a linux system running docker, as opposed to the previous successful Windows 10 test.

eftSharptooth commented 1 year ago

The wheel does work on a linux host as well.

DevilaN commented 1 year ago

@AbdBarho : The newly added pip install commands do not use the --no-cache-dir option. Any particular reason behind this?

AbdBarho commented 1 year ago

@DevilaN yes! laziness.

davedavis commented 1 year ago

@AbdBarho Thanks so much for doing this. I was pulling my hair out trying to get xformers installed locally, but had your dockerized version working in minutes (on Ubuntu 22.04). I've been dying to try out SD2 and now I can.

Wanted to take your xformers work and use it locally (I've migrated to your container version now for SD though as it's working great). But out of curiosity, is this what you're doing?:

1) You built the wheel yourself (again, thank you!)
2) You're sticking it in the local cache
3) You're telling pip that any time xformers should be installed, to use your wheel instead?

Thanks so much for all your effort here.

AbdBarho commented 1 year ago

@davedavis pretty much what you said

I did some tweaks to the xformers build so that it targets all supported CUDA architectures, to accommodate the hardware variation among users of this repo. I uploaded the wheel to the GitHub release, and I basically download it from there in the container.

The wheel is only for Linux, Python 3.10, and torch 1.12.1. I already have everything set up, so in case you need wheels for a different config I can gladly build them for you.
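Those constraints are encoded in the wheel filename (xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl); a quick sketch to check whether an environment matches it:

```shell
# Print the interpreter/platform tags to compare against cp310 / linux_x86_64.
python3 -c "import sys, platform; print(f'cp{sys.version_info.major}{sys.version_info.minor}', platform.system().lower(), platform.machine())"
# The wheel was built against torch 1.12.1; print what is installed (if any).
python3 -c "import torch; print(torch.__version__)" 2>/dev/null || echo "torch not importable"
```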

Merry Christmas 🎄

bitshifter52 commented 1 year ago

Look at all the activity. Great stuff! ❤️

With some small changes of the code from @slix , the following worked on my end

  1. (I did this manually from the CLI in the auto container created by this repo)
apt-get install gpg wget -y
wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update
apt-get install cuda-nvcc-11-8 cuda-libraries-dev-11-8 -y
FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6" pip wheel --wheel-dir=/data git+https://github.com/facebookresearch/xformers.git@ba93c5012d00bd1b010514a7bc9bd938c1ad6149#egg=xformers

Some observations/notes

  • Removing software-properties-common took the Python 3.9 vs 3.10 question out of the equation and made everything else work as-is
  • I saw that wget was missing from the image. Unsure if gpg was already available, so that can possibly be removed as well
  • Made pip save the wheel in the /data folder as I'm unsure which changes would cause a rebuild and hoping this can possibly save a lot of time down the line as it took around 80-90 minutes to build. Kind of how the download container is a long running one-and-done setup process.

Perhaps not the most elegant solution, but I replaced

https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/b60c78747488ec000868137db2a1e32d8d1a6e29/services/AUTOMATIC1111/Dockerfile#L77-L78

with

CMD /docker/mount.sh && \
  pip install '/data/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl' && \
  python3 -u ../../webui.py --listen --port 7860 --hide-ui-dir-config --ckpt-dir ${ROOT}/models/Stable-diffusion ${CLI_ARGS}

And finally added --xformers to the CLI args in docker-compose:

https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/3b3c244c3128d3fcebb236a07bc601cb385e73bb/docker-compose.yml#L42

With xformers I'm seeing roughly 33% performance increase on a GTX 1080

This may be a little late in the game; however, I went through the steps you outlined in your post and ended up with the following errors:

sudo FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6" pip wheel --wheel-dir=/data git+https://github.com/facebookresearch/xformers.git@ba93c5012d00bd1b010514a7bc9bd938c1ad6149#egg=xformers
Collecting xformers
  Cloning https://github.com/facebookresearch/xformers.git (to revision ba93c5012d00bd1b010514a7bc9bd938c1ad6149) to /tmp/pip-wheel-fxnevjzg/xformers_8b996147fe1c4a798b766eaad551be79
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/xformers.git /tmp/pip-wheel-fxnevjzg/xformers_8b996147fe1c4a798b766eaad551be79
  Running command git rev-parse -q --verify 'sha^ba93c5012d00bd1b010514a7bc9bd938c1ad6149'
  Running command git fetch -q https://github.com/facebookresearch/xformers.git ba93c5012d00bd1b010514a7bc9bd938c1ad6149
  Running command git checkout -q ba93c5012d00bd1b010514a7bc9bd938c1ad6149
  Resolved https://github.com/facebookresearch/xformers.git to commit ba93c5012d00bd1b010514a7bc9bd938c1ad6149
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-wheel-fxnevjzg/xformers_8b996147fe1c4a798b766eaad551be79/setup.py", line 17, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I DO have torch installed via the AUTOMATIC1111 installation shell script. I don't know how to proceed from here. Thanks for your efforts.
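One possible cause, offered as an assumption rather than a confirmed diagnosis: pip builds source packages in an isolated environment by default, so setup.py cannot see a torch installed outside that environment. pip's --no-build-isolation flag makes the build use the already-installed packages instead:

```shell
# Sketch: the same wheel build as above, but with build isolation disabled
# so setup.py can import the already-installed torch.
FORCE_CUDA=1 TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6" \
  pip wheel --no-build-isolation --wheel-dir=/data \
  "git+https://github.com/facebookresearch/xformers.git@ba93c5012d00bd1b010514a7bc9bd938c1ad6149#egg=xformers"
```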

AbdBarho commented 1 year ago

@bitshifter52 what are your versions of nvcc, torch, and python?