Hi,
For the first question, we keep the first linear/conv layer (e.g., time_embed) together with another conv at full precision, as described in the experimental setting.
To answer the second question, the algorithm is implemented here:
https://github.com/ModelTC/TFMQ-DM/blob/1068e8aa0aa477a59a8afd01a99a906b0f88a8f2/quant/calibration.py#L112
In this version we employ a time-aware scale factor for the whole model, since this kind of min-max calibration helps accelerate the quantization process. Compared to applying it only to our Temporal Information Block while using LSQ for the other components, this incurs a negligible impact on performance.
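For readers following the thread, below is a minimal sketch of what timestep-aware min-max activation calibration can look like. The function name, the hook target (model.time_embed), the model(x, t) call signature, and the calibration-loader interface are all assumptions for illustration, not TFMQ-DM's actual API.

```python
# A minimal sketch (not the repo's exact code) of timestep-aware min-max
# activation calibration: one scale per diffusion timestep, computed from
# the max absolute activation observed at that timestep.
import torch

def calibrate_time_aware_scales(model, calib_loader, num_timesteps, n_bits=8):
    """Collect a per-timestep activation scale for one target layer.

    `model(x, t)` and the hook target `model.time_embed` are assumptions
    based on the discussion above, not the repo's actual interface.
    """
    qmax = 2 ** (n_bits - 1) - 1
    amax = torch.zeros(num_timesteps)   # running max(|activation|) per timestep
    current_t = {"t": 0}                # timestep of the batch being observed

    def hook(_module, _inp, out):
        t = current_t["t"]
        amax[t] = torch.max(amax[t], out.detach().abs().max().cpu())

    handle = model.time_embed.register_forward_hook(hook)
    with torch.no_grad():
        for x, t in calib_loader:       # loader assumed to yield (input, timestep)
            current_t["t"] = int(t)
            model(x, t)
    handle.remove()

    # Symmetric min-max scale: s_t = max|a_t| / qmax, one entry per timestep.
    return amax / qmax
```

The point is simply that min-max needs only forward passes over the calibration data, which is why it is faster than learning the scales with LSQ.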
Hi,
Thanks for your reply. With your hint, I have found the corresponding code for the timestep-specific scales.
However, as for checkpoint saving, I cannot figure out how to use the timestep-specific activation quantization. I noticed that this line is used to store the time-aware scales, but in the load_cali_model function I cannot find the corresponding loading code. Can you give me some hints about where these timestep-aware scale factors are used during inference?
Thx.
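For anyone hitting the same checkpoint question, the usual pattern is to store one activation scale per timestep in the calibration checkpoint and index it by the current timestep at inference. Below is a hedged sketch of that pattern; the checkpoint key ("act_scales_per_t") and the quantize_act helper are hypothetical, not the repo's actual load_cali_model code.

```python
# A hedged sketch of storing per-timestep activation scales in a checkpoint
# and looking them up at inference. Names here are illustrative only.
import torch

def save_time_aware_scales(scales_by_layer, path):
    # scales_by_layer: {layer_name: tensor of shape [num_timesteps]}
    torch.save({"act_scales_per_t": scales_by_layer}, path)

def load_time_aware_scales(path):
    return torch.load(path)["act_scales_per_t"]

def quantize_act(x, scales, t, n_bits=8):
    """Fake-quantize activation x with the scale calibrated for timestep t."""
    qmax = 2 ** (n_bits - 1) - 1
    s = scales[t]                   # per-timestep scale lookup at inference
    return torch.clamp(torch.round(x / s), -qmax - 1, qmax) * s
```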
Hi, thanks for this work! In the paper, the finite-set calibration uses a timestep-aware scale factor to quantize the activations in embedding_layers and time_embed. I tried to run the calibration of txt2img.py and have two questions.

1. It seems that the disable_out_quantizaion function sets use_aq of time_embed to False, so the time_embed layer is actually not activation-quantized. This confuses me, since the paper states that time_embed is also act-quantized.
2. Where can I find the code that implements the timestep-aware scale factor for activation quantization? I have searched the whole project but still cannot find the timestep-specific scales for act-quant.

May I get your help please? Thx.