yoinked-h opened this issue 1 year ago
note, most of this won't be possible until the next paper, since only simulated quantization is implemented so far: no real quantization, no speedup or optimizations, only a proof of concept
it will probably be either very easy or very hard to implement for SD 2.x
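For context, here is a minimal sketch of what "simulated quantization" means (my own illustration, not the q-diffusion code): weights are rounded to an INT8/INT4 grid but kept in FP32, so you can measure output quality without getting any speed or memory benefit.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    # round to a signed n-bit grid, then immediately dequantize back to float
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax   # naive per-tensor scale; the paper calibrates much more carefully
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale               # still an FP32 tensor, hence "simulated" only

w = torch.randn(320, 320)
print((w - fake_quantize(w, n_bits=4)).abs().mean())  # quantization error at INT4
```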
Hi @yoinked-h, this is an interesting blog post about quantization for SD: https://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/ Nearly 2x faster. So I would like to ask whether Webui has any interest in supporting this feature?
Is there an existing issue for this?
What would your feature do?
a q-diffusion implementation would give a speedup (by using INT8/INT4, with good outputs too!) and shrink file sizes (see the rough size math below)
Q-Diffusion: paper, repo
tl;dr of the paper: quantizing diffusion models in a different way keeps them from losing much precision, even at INT4
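Rough file-size arithmetic (illustrative only, the parameter count is approximate): if the UNet weights dominate the checkpoint, the bit width roughly sets the on-disk size, so INT8/INT4 would shrink an FP16 checkpoint by about 2x/4x.

```python
unet_params = 860e6  # SD 1.x UNet, roughly 860M parameters (approximate)
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{unet_params * bits / 8 / 1e9:.2f} GB")
```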
Proposed workflow
either:
Additional information
as of now, the hard part of the implementation is speeding up inference; even the paper notes this in its conclusion. Detecting the model might also be hard, as it can be mixed in all sorts of ways.
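To illustrate the detection problem (the helper below is hypothetical; nothing like it exists in webui today): a mixed checkpoint could carry quantized and full-precision tensors side by side, so a loader would have to inspect every entry rather than assume a single dtype.

```python
import torch

def summarize_dtypes(state_dict: dict) -> dict:
    # count tensors per dtype so a loader can tell mixed-precision checkpoints apart
    counts: dict = {}
    for name, tensor in state_dict.items():
        counts[str(tensor.dtype)] = counts.get(str(tensor.dtype), 0) + 1
    return counts

sd = {"a.weight": torch.randn(4, 4, dtype=torch.float16),
      "b.weight": torch.randint(-8, 8, (4, 4), dtype=torch.int8)}
print(summarize_dtypes(sd))  # e.g. {'torch.float16': 1, 'torch.int8': 1}
```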