Xiuyu-Li / q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
https://xiuyuli.com/qdiffusion/
MIT License

Extremely high VRAM usage #7

Closed: easonoob closed this issue 6 months ago

easonoob commented 1 year ago

Hello, I just tried this q-diffusion implementation and it seems cool. However, VRAM usage is 20 GB on my RTX 3090 with 4-bit weight-only quantization and --n_samples 1. Here is my command:

python scripts/txt2img.py --prompt "elon musk wearing a suit" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 1 --outdir outputs --cali_ckpt models/sd_w4a8_ckpt.pth

And the result: [image: grid-0001]

Are there any ways to reduce VRAM usage to 4 to 5 GB? Thanks!
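For reference, peak VRAM can be confirmed with PyTorch's built-in memory statistics. The sketch below uses only standard torch.cuda calls; the actual sampling loop from scripts/txt2img.py is elided and stands in as a comment:

```python
import torch

# Reset the peak-memory counter before the run we want to measure.
torch.cuda.reset_peak_memory_stats()

# ... run the (quantized) txt2img sampling here ...

# Report the high-water mark of allocated GPU memory, in GiB.
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")
```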

Xiuyu-Li commented 1 year ago

Hi, as explained in #1, the current implementation uses simulated quantization: the weights are rounded to low-bit values but still stored and computed in full precision, so it does not reduce VRAM. That said, we are working on releasing an end-to-end quantization implementation. If you are only interested in quantizing the weights to save VRAM, I will also consider releasing a truly quantized weight-only checkpoint first.
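To make the simulated-quantization point concrete, here is a minimal sketch assuming simple per-tensor uniform affine quantization (illustrative only, not the repo's exact code). The weight is rounded onto a low-bit grid but immediately dequantized, so the tensor that lives on the GPU keeps its full-precision dtype and size:

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Round w onto a (2**n_bits)-level grid, then dequantize back to float,
    # mimicking low-bit arithmetic numerically without changing storage.
    qmax = 2 ** n_bits - 1
    scale = (w.max() - w.min()) / qmax           # per-tensor step size
    zero_point = torch.round(-w.min() / scale)   # maps w.min() to code 0
    q = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale              # same dtype and size as w

w = torch.randn(512, 512)
w_q = fake_quantize(w)
assert w_q.dtype == w.dtype and w_q.shape == w.shape  # still floating point
```

Because the stored tensor never leaves floating point, the memory footprint matches the unquantized model; only a truly quantized checkpoint with packed integer weights would shrink VRAM.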

wangjialinEcopia commented 1 year ago

Hi, great work! I wonder what specific tools or methods you are using to transition from simulated quantization to real deployment for the end-to-end quantization implementation?
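Not speaking for the authors, but one common ingredient of such a transition is storing the 4-bit codes packed two per uint8 byte and dequantizing on the fly at inference time. A sketch of the packing step in plain PyTorch (the helper names here are hypothetical, not the repo's API):

```python
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    # q: uint8 tensor of 4-bit codes (values 0..15), even element count.
    # Each output byte holds one code in its low nibble, one in its high nibble.
    q = q.flatten()
    return (q[0::2] | (q[1::2] << 4)).to(torch.uint8)

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    # Recover the interleaved 4-bit codes from the packed bytes.
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    return torch.stack([lo, hi], dim=1).flatten()

codes = torch.randint(0, 16, (8,), dtype=torch.uint8)
packed = pack_int4(codes)                 # half the bytes of `codes`
assert torch.equal(unpack_int4(packed), codes)
```

With weights stored this way (plus the usual per-channel scales and zero points), a 4-bit checkpoint occupies one eighth of the float32 footprint; the remaining work is a kernel that dequantizes inside the matmul rather than materializing float weights.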