A-suozhang / MixDQ

[ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
https://a-suozhang.xyz/mixdq.github.io/

Can your code use W4A8? #6

Closed: greasebig closed this issue 3 weeks ago

greasebig commented 3 weeks ago

In pipeline.py it says: "This function helps quantize the UNet in the SDXL Pipeline. Now we only support quantization with the setting W8A8."

greasebig commented 3 weeks ago

In your paper you say it can be used with W4A8.

Load weight bits:

```python
# with open(w_config, 'r') as input_file:
if w_bit == 8:
    mod_name_to_weight_width = w8_uniform_config
else:
    raise RuntimeError("we only support int8 quantization")
# filter 'model.' from all names
```
A-suozhang commented 3 weeks ago

I may need further clarification to fully grasp your question. Is the above code containing "we only support int8 quantization" from our Hugging Face demo?

The algorithm-level quantization simulation code (on GitHub) supports mixed precision (including W4A8); the system-level quantization code (on Hugging Face, including the CUDA kernel) only supports W8A8 for now. We are still working on the mixed-precision CUDA kernel implementation.
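
To make the distinction concrete: the simulation path only needs "fake" quantization, where tensors are rounded to a low-bit grid and immediately de-quantized, so any bit-width combination (including W4A8) can be evaluated without a dedicated kernel. Below is a generic sketch of this idea, not our actual implementation; the function and variable names are illustrative only.

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Round to a signed n_bits grid, then immediately de-quantize,
    # so all downstream ops still run in fp16/fp32 (no low-bit kernel needed).
    qmax = 2 ** (n_bits - 1) - 1                   # 7 for 4-bit, 127 for 8-bit
    scale = x.abs().max().clamp_min(1e-8) / qmax   # per-tensor symmetric scale
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

# Simulating W4A8 for one linear layer: 4-bit weights, 8-bit activations.
weight = torch.randn(128, 128)
activation = torch.randn(1, 128)
out = torch.nn.functional.linear(fake_quantize(activation, 8),
                                 fake_quantize(weight, 4))
```

A fused low-bit CUDA kernel, by contrast, has to be written and tuned per bit-width combination, which is why only W8A8 is available on the system side so far.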

greasebig commented 3 weeks ago

Yes, the code I pasted above is from https://huggingface.co/nics-efc/MixDQ/tree/main

greasebig commented 3 weeks ago

Thanks for your reply. I have another question: can your W8A8 be used in WebUI or ComfyUI?

A-suozhang commented 3 weeks ago

Currently, our code is built upon the Hugging Face diffusers package as a customized pipeline. If WebUI or ComfyUI can embed a diffusers pipeline (I know that ComfyUI supports some of the diffusers models), then our code can be used directly.
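
For illustration, loading a custom diffusers pipeline looks roughly like the sketch below; the base model id, custom-pipeline id, and call arguments are placeholders, so please refer to our Hugging Face page for the exact usage.

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder ids: swap in the actual base model and MixDQ pipeline repo.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",          # few-step SDXL base model (assumed)
    custom_pipeline="nics-efc/MixDQ",  # custom pipeline id (assumed)
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(prompt="a photo of a cat", num_inference_steps=1).images[0]
```

Whether this works inside WebUI or ComfyUI then depends on whether the frontend can wrap an arbitrary diffusers pipeline object rather than only its own model formats.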