Closed felixslu closed 6 months ago
1. Question: On my Nvidia GeForce RTX 3090, running your w4a8 SD v1.4 model (sd_w4a8_ckpt-001.pth) gives an inference speed of 6 it/s. However, running an fp16 SD v1.5 model with PyTorch 2.0.1 gives 20 it/s.
2. My running command:
python txt2img.py --prompt "elon musk wearing a suit" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 1 --outdir ./data/ --cali_ckpt ../sd_w4a8_ckpt-001.pth
3. My profile logs:
sd_w4a8_ckpt-001.pth
fp16 sd-v1.5 by pytorch2.0.1
Quantization mostly doesn't speed things up; it just reduces memory usage, imo.
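To expand on this: most PTQ research code uses simulated ("fake") quantization, where weights are rounded to the int4 grid but dequantized back to floating point before every matmul. The arithmetic still runs in fp32/fp16, plus extra quantize/dequantize overhead, so it can easily be slower than a plain fp16 run. A minimal NumPy sketch of the idea (function names are illustrative, not taken from the repo's code):

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    # Symmetric per-tensor quantization: snap weights onto a signed
    # int grid, then immediately dequantize back to float.
    qmax = 2 ** (n_bits - 1) - 1            # 7 for 4-bit
    scale = np.abs(w).max() / qmax
    w_int = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return (w_int * scale).astype(np.float32)  # float values used for compute

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
x = rng.standard_normal((1, 64)).astype(np.float32)

w_dq = fake_quantize(w)
y = x @ w_dq  # the matmul still executes in float32:
              # only the stored weights shrink, so there is no speedup
```

Getting a real speedup from w4a8 would need dedicated low-bit GPU kernels, which this kind of research checkpoint doesn't ship with.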
Could you share your fp16 inference code? I'd like to run some experiments based on it.