Open felixslu opened 11 months ago
32G V100 also gets OOM
40G A100 also OOMs. This is likely because of the fake quantization operations, which require their own intermediate tensors to be allocated. You can reduce "n_samples" to counter this; for example, n_samples=1 only needs 20GB.
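For context, fake quantization usually keeps the weights in floating point and simulates the low-bit grid on the fly, so every quantized layer materializes several temporaries the size of its input. A minimal sketch of the idea (the names delta, zero_point, and n_bits follow the usual uniform-quantization convention and are not necessarily this repo's exact API):

    import torch

    def fake_quant(x, delta, zero_point, n_bits=4):
        # Round onto the integer grid, clamp to the representable range,
        # then map back to float; each step allocates a temporary the size of x.
        x_int = torch.round(x / delta) + zero_point
        x_quant = torch.clamp(x_int, 0, 2 ** n_bits - 1)
        return (x_quant - zero_point) * delta  # same op as in the traceback below

These temporaries scale with the batch dimension, which is why lowering n_samples shrinks them proportionally.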
For me, it's not effective: even with n_samples=1 on a 32GB V100, it still leads to OOM. This is the launch script:

python scripts/txt2img.py --prompt "a puppet wearing a hat" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 1 --quant_act --act_bit 8 --sm_abit 16 --outdir ./data/ --cali_ckpt models/sd_w4a8.pth --resume
I also encountered this problem. May I ask how you finally solved it?
1. Questions
As we know, SD v1.5 has about 1 billion parameters, and its peak GPU memory is about 4GB at FP32 precision. So the memory of the INT4 checkpoint (sd_w4a8_ckpt.pth) should be about 4GB / 8 = 500MB. However, when I load and run your W4A8 quantized model, it consumes more than 24GB of GPU memory, and we finally get an OOM!
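A back-of-the-envelope version of that estimate, using the 859.52 M parameter count from the log below (weight storage only, ignoring activations and quantization buffers):

    params = 859.52e6               # DiffusionWrapper param count from the log below
    fp32_gb = params * 4 / 2**30    # 4 bytes per param -> ~3.2 GiB
    int4_gb = params * 0.5 / 2**30  # 4 bits (0.5 byte) per param -> ~0.4 GiB
    print(f"fp32: {fp32_gb:.2f} GiB, int4: {int4_gb:.2f} GiB")

So the gap to 24GB must come from runtime allocations, not from the stored weights themselves.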
2. My command:
python txt2img.py --prompt "a puppet wearing a hat" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 5 --quant_act --act_bit 8 --sm_abit 16 --outdir ./data/ --cali_ckpt ../sd_w4a8_ckpt-001.pth
3. Error logs:
07/31/2023 11:16:03 - INFO - root - Loading model from models/ldm/stable-diffusion-v1/model.ckpt
07/31/2023 11:16:04 - INFO - root - Global Step: 470000
07/31/2023 11:16:04 - INFO - torch.distributed.nn.jit.instantiator - Created a temporary directory at /tmp/tmpmwfx988m
07/31/2023 11:16:04 - INFO - torch.distributed.nn.jit.instantiator - Writing /tmp/tmpmwfx988m/_remote_module_non_scriptable.py
LatentDiffusion: Running in eps-prediction mode
07/31/2023 11:16:07 - INFO - ldm.util - DiffusionWrapper has 859.52 M params.
07/31/2023 11:16:07 - INFO - ldm.modules.diffusionmodules.model - making attention of type 'vanilla' with 512 in_channels
07/31/2023 11:16:07 - INFO - ldm.modules.diffusionmodules.model - Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
07/31/2023 11:16:07 - INFO - ldm.modules.diffusionmodules.model - making attention of type 'vanilla' with 512 in_channels
07/31/2023 11:16:12 - INFO - main - Not use gradient checkpointing for transformer blocks
Loading quantized model checkpoint
Initializing weight quantization parameters
07/31/2023 11:16:27 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:28 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:28 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:29 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:32 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:34 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:37 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:38 - INFO - qdiff.quant_layer - split at 640!
07/31/2023 11:16:39 - INFO - qdiff.quant_layer - split at 640!
07/31/2023 11:16:40 - INFO - qdiff.quant_layer - split at 640!
07/31/2023 11:16:41 - INFO - qdiff.quant_layer - split at 320!
07/31/2023 11:16:42 - INFO - qdiff.quant_layer - split at 320!
Initializing act quantization parameters
Traceback (most recent call last):
  File "txt2img.py", line 444, in <module>
    main()
  File "txt2img.py", line 340, in main
    resume_cali_model(qnn, opt.cali_ckpt, cali_data, opt.quant_act, "qdiff", cond=opt.cond)
  File "/home/xx/car/bigmodel/q-diffusion/qdiff/utils.py", line 86, in resume_cali_model
    _ = qnn(cali_xs.cuda(), cali_ts.cuda(), cali_cs.cuda())
...
...
File "/root/miniconda3/envs/qdiff/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, *kwargs)
File "/home/xx/car/bigmodel/q-diffusion/qdiff/adaptive_rounding.py", line 59, in forward
x_float_q = (x_quant - self.zero_point) self.delta
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.69 GiB total capacity; 23.21 GiB already allocated; 11.69 MiB free; 23.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
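As the allocator message itself suggests, setting max_split_size_mb can relieve fragmentation. One way is to prefix the launch command above, with n_samples reduced to 1 as discussed earlier (128 is just an example value; this does not shrink the actual footprint, so reducing n_samples remains the primary fix):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python txt2img.py --prompt "a puppet wearing a hat" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 1 --quant_act --act_bit 8 --sm_abit 16 --outdir ./data/ --cali_ckpt ../sd_w4a8_ckpt-001.pth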