SUDO-AI-3D / zero123plus

Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Apache License 2.0

CUDA out of memory. Tried to allocate error #82

Closed daggs1 closed 2 weeks ago

daggs1 commented 1 month ago

Greetings,

I'm trying to run the gradio_app demo as stated in the README, and I'm getting this error:

$ python gradio_app.py
/home/worker/zero123plus/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
text_encoder/model.safetensors not found
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00, 2.43it/s]
Traceback (most recent call last):
  File "/home/worker/zero123plus/gradio_app.py", line 204, in <module>
    fire.Fire(run_demo)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/worker/zero123plus/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/worker/zero123plus/gradio_app.py", line 137, in run_demo
    pipeline.to(f'cuda:{_GPU_ID}')
  File "/home/worker/zero123plus/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 727, in to
    module.to(torch_device, torch_dtype)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1878, in to
    return super().to(*args, **kwargs)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/worker/zero123plus/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU

Any idea what is wrong? I've run the setup as stated in the README file.

daggs1 commented 1 month ago

I understand now: your small example requires 5 GB of VRAM, and my GPU has only 4 GB. A shame. Is there any way to reduce memory consumption?

eliphatfs commented 1 month ago

There are now several techniques for reducing inference-time memory, including model offloading, autotune compilation, quantization, and possibly others. We do not have code for these ready yet, but you are welcome to contribute. Model offloading usually incurs no runtime overhead, compilation takes some time before inference starts, and quantization would need some extra code and more tuning.
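As a rough sketch of what the first option could look like using the generic diffusers helpers (this is not code we ship; it assumes the Zero123++ custom pipeline exposes the usual diffusers memory helpers, that accelerate is installed, and the model IDs match the README — and it is untested on a 4 GB card):

# Sketch only: use CPU offloading instead of pipeline.to(f'cuda:{_GPU_ID}')
# so each submodule is moved to the GPU only for its forward pass.
# Assumptions: the custom pipeline supports these diffusers helpers and
# accelerate is installed; not verified to fit in 4 GB of VRAM.
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.1",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,  # fp16 weights take roughly half the memory of fp32
)
# Model offloading: weights stay in CPU RAM and are streamed to the GPU on demand.
pipeline.enable_model_cpu_offload()
# Optional extra savings at some speed cost, if this pipeline supports it:
if hasattr(pipeline, "enable_attention_slicing"):
    pipeline.enable_attention_slicing()  # compute attention in smaller chunks
# pipeline.enable_sequential_cpu_offload()  # most aggressive alternative; much slower

With offloading enabled, the pipeline.to(f'cuda:{_GPU_ID}') call shown in the traceback above would be skipped entirely; even then, 4 GB may still not be enough for a full inference pass.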