Xiuyu-Li / q-diffusion

[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
https://xiuyuli.com/qdiffusion/
MIT License

Question about the inference process #16

Open JiaojiaoYe1994 opened 1 year ago

JiaojiaoYe1994 commented 1 year ago

Thank you for the great work! After reading the paper and reproducing the results, I have a question about the inference part.

Inference with a quantized model should only require the quantized weights, so why do we need to load the FP32 model first? Taking txt2img.py as an example: why do we first load the original FP32 checkpoint, i.e. sd-v1-4.ckpt, and then load the quantized checkpoint, i.e. sd_w8a8_ckpt.pth, to run inference?

The relevant code is at https://github.com/Xiuyu-Li/q-diffusion/blob/94fd0ecabc6e7545208c4809d84df091999ce4ad/scripts/txt2img.py#L311, which loads the full-precision model.
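For context, a minimal sketch of what this two-step loading pattern usually looks like in post-training-quantization codebases (the `QuantWrapper` class and the `delta` parameter below are illustrative stand-ins, not the repo's actual API): the FP32 checkpoint is loaded first to build the network, the network is then wrapped with quantization modules that add new parameters (e.g. calibrated step sizes), and only then can the quantized state dict, whose keys refer to the wrapped structure, be loaded on top.

```python
import torch
import torch.nn as nn

class QuantWrapper(nn.Module):
    """Toy stand-in for a quantized linear layer: holds the original FP32
    weight plus a quantization step size (`delta`) found during calibration."""
    def __init__(self, fp_layer: nn.Linear):
        super().__init__()
        self.weight = fp_layer.weight                   # reuse the FP32 tensor
        self.register_buffer("delta", torch.ones(1))    # filled by load_state_dict

    def forward(self, x):
        # fake-quantize the weights with the calibrated step size
        w_q = torch.round(self.weight / self.delta).clamp(-128, 127) * self.delta
        return x @ w_q.t()

# 1) build the FP32 model; in the real script this is where the FP32
#    checkpoint (e.g. sd-v1-4.ckpt) would be loaded to define the architecture
fp_model = nn.Linear(4, 4, bias=False)

# 2) wrap it with quantization modules, which introduces new state
#    (here: the `delta` buffer) that the FP32 checkpoint does not contain
q_model = QuantWrapper(fp_model)

# 3) only now can the quantized checkpoint be loaded: its keys match the
#    wrapped model, not the plain FP32 one
q_state = {"weight": torch.randn(4, 4), "delta": torch.tensor([0.05])}
q_model.load_state_dict(q_state)
```

Under this assumption, the FP32 load is needed to instantiate the module graph that the quantized state dict is keyed against; the quantized checkpoint then overwrites the weights and fills in the calibration parameters.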