Open jastranlove2020 opened 1 month ago
When using the example below
Try sequential CPU offload instead and lower the number of steps. My 6GB VRAM card works fine.
pipe.enable_sequential_cpu_offload()
https://huggingface.co/docs/diffusers/en/optimization/memory
import torch  # needed for torch.bfloat16
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_sequential_cpu_offload()  # sequential offload, per the advice above, to fit low VRAM

prompt = "a tiny astronaut hatching from an egg on the moon"
out = pipe(
    prompt=prompt,
    guidance_scale=1.5,
    height=768,
    width=1360,
    num_inference_steps=5,
).images[0]
out.save("image.png")
Model offloading requires Accelerate version 0.17.0 or higher. I had to do a lot of requirements research, and the video below helped a lot (see 13:59).
https://www.youtube.com/watch?v=EvdgI_JLVcQ
My pip freeze:
accelerate==0.33.0
certifi==2024.7.4
charset-normalizer==3.3.2
diffusers @ git+https://github.com/huggingface/diffusers.git@2d753b6fb53a24ffe4e833bd5c29036a36bf091d
filelock==3.15.4
fsspec==2024.6.1
huggingface-hub==0.24.5
idna==3.7
importlib_metadata==8.2.0
Jinja2==3.1.4
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
ollama==0.3.1
packaging==24.1
pillow==10.4.0
protobuf==5.27.3
psutil==6.0.0
PyYAML==6.0.2
regex==2024.7.24
requests==2.32.3
safetensors==0.4.4
sentencepiece==0.2.0
sympy==1.13.1
tokenizers==0.19.1
torch==2.2.0
tqdm==4.66.5
transformers==4.44.0
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
zipp==3.19.2
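As a quick sanity check against the Accelerate requirement mentioned above, a naive version comparison confirms that the installed accelerate==0.33.0 clears the 0.17.0 minimum. This is a minimal sketch that ignores pre-release suffixes; use packaging.version for anything serious:

```python
def meets_minimum(installed: str, minimum: str = "0.17.0") -> bool:
    # Naive numeric comparison of dotted versions; no handling of
    # pre-release tags like "0.17.0rc1" (use packaging.version for that).
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

print(meets_minimum("0.33.0"))  # version from the pip freeze above -> True
print(meets_minimum("0.16.0"))  # too old for model offloading -> False
```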
It might be an NVIDIA driver / CUDA version mismatch; you can try a different, compatible NVIDIA driver and CUDA version.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 46.00 MiB. GPU 0 has a total capacity of 11.99 GiB of which 0 bytes is free. Of the allocated memory 10.93 GiB is allocated by PyTorch, and 280.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
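The allocator hint in that error message must be set before PyTorch makes its first CUDA allocation — either exported in the shell (export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True) or at the very top of the script, as in this sketch:

```python
import os

# Must run before torch initializes CUDA; this overrides any value
# exported in the shell for the current process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This only reduces fragmentation of reserved-but-unallocated memory; it will not help if the model genuinely needs more VRAM than the card has.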
Even at a low resolution like 1024x580, I was unable to complete a single image.
The model is not well optimized to run locally.
It consumes too many resources and takes a long time even on a high-performance RTX card, and still fails.
I hope the model will be improved as soon as possible.