TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth

Other gpu support #26

Open loretoparisi opened 2 years ago

loretoparisi commented 2 years ago

Hello, from the notebook I can see that precompiled xformers builds are available only for the T4, P100, and V100 GPUs:

from subprocess import getoutput

# Detect the assigned GPU from nvidia-smi output
s = getoutput('nvidia-smi')
if 'T4' in s:
  gpu = 'T4'
elif 'P100' in s:
  gpu = 'P100'
elif 'V100' in s:
  gpu = 'V100'

What about other GPUs like the K80? Thanks.
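For illustration, supporting another GPU in that check would be just one more branch; a minimal sketch, assuming a matching precompiled build existed for the K80:

from subprocess import getoutput

# Hypothetical: extend the detection above with a K80 branch (no K80 build
# actually ships at this point in the thread)
s = getoutput('nvidia-smi')
if 'T4' in s:
  gpu = 'T4'
elif 'P100' in s:
  gpu = 'P100'
elif 'V100' in s:
  gpu = 'V100'
elif 'K80' in s:
  gpu = 'K80'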

TheLastBen commented 2 years ago

If you have the K80 and want to add it to the supported list, run:

!pip install git+https://github.com/facebookresearch/xformers@51dd119#egg=xformers

After around 40 min, when the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers

Save the two files "_C_flashattention.so" and "_C.so", upload them to any host, and send me the link; I will integrate them into the Colab for K80 users.

The files might not show in the Colab file explorer, so you will have to rename them first:

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C.so /usr/local/lib/python3.7/dist-packages/xformers/C.py

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C_flashattention.so /usr/local/lib/python3.7/dist-packages/xformers/C_flashattention.py
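A minimal sketch of one way to get the renamed files out of the Colab VM, assuming Google Drive is used as the host (any file host works):

from google.colab import drive

# Mount Google Drive, then copy the renamed libraries into it for sharing
drive.mount('/content/drive')
!cp /usr/local/lib/python3.7/dist-packages/xformers/C.py /content/drive/MyDrive/
!cp /usr/local/lib/python3.7/dist-packages/xformers/C_flashattention.py /content/drive/MyDrive/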

misiek75 commented 2 years ago

Hi. I am using Colab Pro and very often I have an A100 GPU assigned. Can you add support for that GPU?

TheLastBen commented 2 years ago

In a few hours it will be added to the Colabs.

Aivean commented 2 years ago

Perhaps this is the right issue. Running on an A100 (Colab), I'm getting spammed in the output:

FATAL: this function is for sm80, but was built for sm600

TheLastBen commented 2 years ago

Did you make a clean run with an updated Colab from the repo?

Aivean commented 2 years ago

Did you make a clean run with an updated Colab from the repo?

Not sure, will try it now and report back.

... or perhaps not now, but when I get A100 again 🤷‍♂️

Aivean commented 2 years ago

@TheLastBen, actually, that doesn't seem to be an A100-specific issue.

On V100 I see:

FATAL: this function is for sm70, but was built for sm600

Running freshly opened: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb

Is it a regression?

Will try to reproduce on a clean AUTOMATIC1111 UI without the attention patch.

UPD: confirmed, a clean AUTOMATIC1111 works on V100; the issue is introduced by the patch.
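For anyone hitting this: the smNN numbers are CUDA compute capabilities (sm80 is the A100, sm70 the V100, sm60 the P100), so the errors mean the patched build's kernels were compiled for a different architecture than the assigned GPU. A quick check, sketched with PyTorch:

import torch

# Prints the compute capability of the current GPU, e.g. (8, 0) on an A100
# or (7, 0) on a V100, matching the sm80/sm70 in the errors above
print(torch.cuda.get_device_capability(0))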

TheLastBen commented 2 years ago

Try to reproduce the error with a T4 (free Colab).

TheLastBen commented 2 years ago

Also try the other Colabs to see if the same issue happens.

askiiart commented 2 years ago

Save the two files "_C_flashattention.so" and "_C.so", upload them to any host, and send me the link; I will integrate them into the Colab for K80 users.

I did this, but I'm using conda, so the directories don't match up. I found /home/ben/.conda/envs/my-env/lib/python3.10/site-packages/xformers, but there's no _C_flashattention.so in it, or anywhere else on the system.

I did find _C.so, though; it's here

TheLastBen commented 2 years ago

For the K80? 287 KB looks a bit small; it should be at least 19 MB, so maybe it didn't compile well. Try compiling it with Google Colab if you get the K80.
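A one-liner to check the size of the compiled extension before uploading it, with the path from the earlier instructions:

import os

# A healthy compiled _C.so is tens of MB; a few hundred KB suggests a failed build
print(os.path.getsize('/usr/local/lib/python3.7/dist-packages/xformers/_C.so') / 1e6, 'MB')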

askiiart commented 2 years ago

For the K80? 287 KB looks a bit small; it should be at least 19 MB, so maybe it didn't compile well. Try compiling it with Google Colab if you get the K80.

I'll let you know if I can get the files from a K80 Colab. Right now all my accounts either have a usage limit with no GPU or are getting T4s, though, so it might be a while.

askiiart commented 2 years ago

@TheLastBen I was able to install xformers with a bigger _C.so file (13.1 MB) by un-init-ing conda, and got this. I still don't have a _C_flashattention.so anywhere on my system, though.

TheLastBen commented 2 years ago

GPUs unsupported by flash attention don't produce a _C_flashattention.so after compiling, but they still benefit from a speed increase.

askiiart commented 2 years ago

GPUs unsupported by flash attention don't produce a _C_flashattention.so after compiling, but they still benefit from a speed increase.

Ok, thanks. Quick question: what is _C_flashattention.so for/what does it do?

TheLastBen commented 2 years ago

It's the compiled C/C++/CUDA code responsible for the xformers-specific operations (memory-efficient attention included), built for the underlying machine (Python version, CUDA version, ...).
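A quick way to sanity-check that a compiled build actually loads those ops; a sketch assuming a CUDA GPU is available and using the 3-D (batch, seq, dim) input layout of the 2022-era API pinned above (newer xformers releases expect a 4-D (batch, seq, heads, dim) layout):

import torch
import xformers.ops

# Smoke-test the memory-efficient attention op from the compiled extension
q = torch.randn(1, 16, 64, device='cuda', dtype=torch.float16)
out = xformers.ops.memory_efficient_attention(q, q, q)
print(out.shape)   # torch.Size([1, 16, 64])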

askiiart commented 2 years ago

@TheLastBen Would you be able to make the whl using that file and add it? I'd make the whl and do a new PR, but it's not working for me.
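For reference, one way such a wheel could be produced in Colab; a sketch using the same xformers pin as earlier in the thread (the output directory is arbitrary):

# Build a redistributable .whl instead of installing directly; the compiled
# wheel lands in /content/wheels and could then be shared or added to the repo
!pip wheel git+https://github.com/facebookresearch/xformers@51dd119#egg=xformers -w /content/wheels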

TheLastBen commented 2 years ago

Sorry, I completely forgot about it. I'll add it as soon as I'm done with the new DreamBooth method.

TheLastBen commented 2 years ago

@TheLastBen Would you be able to make the whl using that file and add it? I'd make the whl and do a new PR, but it's not working for me.

Just added; if you get the K80, try it in the A1111 Colab and let me know if the wheel works.