yoinked-h opened this issue 1 year ago
note, most of this won't be possible until the next paper, since only simulated quantization is implemented so far: no real quantization, no speedup or optimizations, only a proof of concept
it will probably be either very easy or very hard to implement for SD 2.x
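For context, here is a minimal sketch of what "simulated quantization" means (my own illustration, not the q-diffusion code): weights are rounded to an INT8/INT4 grid but kept in FP32, so you can measure output quality without getting any speed or memory benefit.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # round to a signed n-bit grid, then immediately dequantize back to float
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax   # naive per-tensor scale; the paper calibrates much more carefully
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale               # still an FP32 tensor, hence "simulated" only

w = torch.randn(320, 320)
print((w - fake_quantize(w, n_bits=4)).abs().mean())  # quantization error at INT4
```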
Hi @yoinked-h, this is an interesting blog post about quantization for SD: https://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/ Nearly 2x faster. So I would like to ask whether Webui has any interest in supporting this feature?
Is there an existing issue for this?
What would your feature do?
a q-diffusion implementation would give a speedup (by using INT8/INT4, with good outputs too!) and shrink file sizes (see the rough size math below)
Q-Diffusion: paper, repo
tl;dr of the paper: quantizing diffusion models in a different way keeps them from losing much precision, even at INT4
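Rough file-size arithmetic (illustrative only, the parameter count is approximate): if the UNet weights dominate the checkpoint, the bit width roughly sets the on-disk size, so INT8/INT4 would shrink an FP16 checkpoint by about 2x/4x.

```python
unet_params = 860e6  # SD 1.x UNet, roughly 860M parameters (approximate)
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{unet_params * bits / 8 / 1e9:.2f} GB")
```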
Proposed workflow
either:
Additional information
as of now, the hard part of the implementation is speeding up inference; even the paper notes this in its conclusion. Detecting the model might also be hard, as it can be mixed in all sorts of ways.
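To illustrate the detection problem (the helper below is hypothetical; nothing like it exists in webui today): a mixed checkpoint could carry quantized and full-precision tensors side by side, so a loader would have to inspect every entry rather than assume a single dtype.

```python
import torch

def summarize_dtypes(state_dict: dict) -> dict:
    # count tensors per dtype so a loader can tell mixed-precision checkpoints apart
    counts: dict = {}
    for name, tensor in state_dict.items():
        counts[str(tensor.dtype)] = counts.get(str(tensor.dtype), 0) + 1
    return counts

sd = {"a.weight": torch.randn(4, 4, dtype=torch.float16),
      "b.weight": torch.randint(-8, 8, (4, 4), dtype=torch.int8)}
print(summarize_dtypes(sd))  # e.g. {'torch.float16': 1, 'torch.int8': 1}
```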