Closed — easonoob closed this 6 months ago
Hi, as explained in #1, the current implementation uses simulated quantization. That said, we are working on releasing an end-to-end quantization implementation. If you are only interested in quantizing the weights to save VRAM, I will also consider releasing a weight-only, truly quantized checkpoint first.
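For context on why simulated quantization does not save VRAM: it rounds weights onto a low-bit grid but keeps them stored in full-precision floats. A minimal sketch of that quantize-dequantize round trip (the function name, per-tensor symmetric scaling, and NumPy usage here are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    """Simulated (fake) quantization: snap weights to an n_bits uniform grid,
    then dequantize back to float. Values are constrained to the low-bit grid,
    but storage stays full-precision, so memory use is unchanged."""
    qmax = 2 ** (n_bits - 1) - 1           # symmetric signed range, e.g. [-8, 7] for 4 bits
    scale = np.abs(w).max() / qmax         # per-tensor scale (an illustrative choice)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                       # dequantized: still a float array in memory

w = np.array([0.9, -0.35, 0.07, -0.7])
wq = fake_quantize(w, n_bits=4)
print(wq.dtype)  # a float dtype — no memory saving over the original weights
```

A truly quantized checkpoint, by contrast, would pack the integer codes (e.g. two 4-bit values per byte) and dequantize on the fly, which is what actually reduces VRAM.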
Hi, great work! What specific tools or methods are you using to move from simulated quantization to a real deployment in the end-to-end quantization implementation?
Hello, I just tried this q-diffusion implementation and it seems cool. However, VRAM usage is 20GB on my RTX 3090 with 4-bit weights-only quantization and `--n_samples 1`. Here is my command: … And the result: … Are there any ways to reduce VRAM usage to 4-5GB? Thanks!