basujindal / stable-diffusion

Optimized Stable Diffusion modified to run on lower GPU VRAM

Adapts memory-efficient attention to large unet_bs #122

Closed ryudrigo closed 2 years ago

ryudrigo commented 2 years ago

And polishes memory-efficient attention in general.

Enables e.g. 1024px generation on 8 GB.

Inspired by comments by @Doggettx on #117

Currently, attn_step is set to 1. If you want more speed and less memory efficiency, you'd have to change that in ldm/modules/attention.py, line 153.

@basujindal I didn't want to change the CLI or gradio commands; if you'd like attn_step to be exposed as a parameter, I can modify the PR.
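
For readers skimming the diff, here is a minimal sketch of the general idea, not the PR's actual code: the function name, tensor shapes, and the choice of slicing axis below are assumptions for illustration only.

```python
# Minimal sketch of sliced ("memory-efficient") attention -- NOT the code in
# ldm/modules/attention.py. The function name, shapes, and the slicing axis are
# assumptions; the point is that attn_step controls how many slices are scored
# per pass, so the full attention matrix is never held in memory at once.
import torch

def sliced_attention(q, k, v, attn_step=1):
    # q, k, v: (batch * heads, tokens, head_dim)
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for i in range(0, q.shape[0], attn_step):
        s = slice(i, i + attn_step)
        # attention scores for this slice only: (attn_step, tokens, tokens)
        sim = torch.einsum('b i d, b j d -> b i j', q[s], k[s]) * scale
        out[s] = torch.einsum('b i j, b j d -> b i d', sim.softmax(dim=-1), v[s])
        del sim  # free the large intermediate before the next slice
    return out
```

Under these assumptions, attn_step=1 processes one slice per pass (lowest peak VRAM), while larger values do more work per matmul call, which is slightly faster but needs more memory.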

TheEnhas commented 2 years ago

Probably a good idea to leave it at 1, IMO, since this fork seems to be about getting the best memory efficiency at the cost of speed. That's great for low-VRAM GPUs, big generations, or just general use without worrying about OOM errors (I have an 8 GB 3060 Ti and mostly use this fork for the latter reason). And there's still turbo mode on top of that for only 1 GB more VRAM; I almost always use it, and it's much faster.

basujindal commented 2 years ago

Hi, thanks a lot for creating a pull request for these changes. Before I merge, can you please remove the changes to inpaint_gradio.py? If I am not wrong, it's the old inpaint file from before the changes in the last commit. Thanks!

ryudrigo commented 2 years ago

I didn't notice I was leaving the new inpainting out. Thanks! Just corrected it.

ryudrigo commented 2 years ago

Please don't merge just yet -- I need to uncomment the mask code and test it

rockerBOO commented 2 years ago

I have tested this with txt2img at 1024x1024 on an 8 GB 1080 and it works great.

ryudrigo commented 2 years ago

Still working on that masking code as part of a larger inpainting PR.

So far, the code I commented out is not used by the inpainting script, so it won't make a difference. But if anyone uses this in another repo, please look at the linked issue (#129) and at the code.

TingTingin commented 2 years ago

Do you adjust the attention steps up or down, and do they have to be an int or can they be a float?

ryudrigo commented 2 years ago

Do you adjust the attention steps up or down, and do they have to be an int or can they be a float?

I'm not sure I understood the question, but I'll answer as best I can. I introduced the parameter att_steps, which has to be an int. You can test larger values if you want, but from what I've seen there is not much reason to set it greater than 1; the delay is very small.
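
Purely for intuition (this continues the hypothetical sketch above and assumes the parameter ends up as the step of a range/slice loop over attention chunks), an integer is required simply because Python's range rejects float steps:

```python
# Hypothetical illustration of why the step has to be an int: if it is used as
# the step of a range() loop over attention slices, a float is rejected outright.
attn_step = 2
for start in range(0, 8, attn_step):   # processes slices [0:2], [2:4], [4:6], [6:8]
    print(f"processing attention slice {start}:{start + attn_step}")

# range(0, 8, 1.5) raises:
# TypeError: 'float' object cannot be interpreted as an integer
```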

TingTingin commented 2 years ago

Sorry for not being clear; I was referring to this. If it has to be an int, I guess it can only go up, so you already answered it:

Currently, attn_step is set to 1. If you want more speed and less memory efficiency, you'd have to change that in ldm/modules/attention.py, line 153.

ryudrigo commented 2 years ago

Oh, all right! I tested it more and found that the speed improvement is really small (less than 10%), so I'd just leave it at 1.

TingTingin commented 2 years ago

Yeah, on my system it didn't seem to show any significant change either.

remybonnav commented 1 year ago

I cannot find your modified attention.py. It seems that your stable-diffusion repo is offline.