Open jastranlove2020 opened 1 month ago
When using the example below
Try sequential CPU offload instead and lower the number of steps. My 6GB VRAM card works fine.
pipe.enable_sequential_cpu_offload()
https://huggingface.co/docs/diffusers/en/optimization/memory
import torch  # needed for torch.bfloat16
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_sequential_cpu_offload()  # sequential offload, per the advice above, to fit low VRAM

prompt = "a tiny astronaut hatching from an egg on the moon"
out = pipe(
    prompt=prompt,
    guidance_scale=1.5,
    height=768,
    width=1360,
    num_inference_steps=5,
).images[0]
out.save("image.png")
Model offloading requires Accelerate version 0.17.0 or higher. I had to do a lot of requirements research, and the video below helped a lot (see 13:59).
https://www.youtube.com/watch?v=EvdgI_JLVcQ
My pip freeze:
accelerate==0.33.0
certifi==2024.7.4
charset-normalizer==3.3.2
diffusers @ git+https://github.com/huggingface/diffusers.git@2d753b6fb53a24ffe4e833bd5c29036a36bf091d
filelock==3.15.4
fsspec==2024.6.1
huggingface-hub==0.24.5
idna==3.7
importlib_metadata==8.2.0
Jinja2==3.1.4
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
ollama==0.3.1
packaging==24.1
pillow==10.4.0
protobuf==5.27.3
psutil==6.0.0
PyYAML==6.0.2
regex==2024.7.24
requests==2.32.3
safetensors==0.4.4
sentencepiece==0.2.0
sympy==1.13.1
tokenizers==0.19.1
torch==2.2.0
tqdm==4.66.5
transformers==4.44.0
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
zipp==3.19.2
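As a quick sanity check against the Accelerate requirement mentioned above, a naive version comparison confirms that the installed accelerate==0.33.0 clears the 0.17.0 minimum. This is a minimal sketch that ignores pre-release suffixes; use packaging.version for anything serious:

```python
def meets_minimum(installed: str, minimum: str = "0.17.0") -> bool:
    # Naive numeric comparison of dotted versions; no handling of
    # pre-release tags like "0.17.0rc1" (use packaging.version for that).
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

print(meets_minimum("0.33.0"))  # version from the pip freeze above -> True
print(meets_minimum("0.16.0"))  # too old for model offloading -> False
```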
It might be an NVIDIA driver / CUDA version mismatch; you can try a different, compatible NVIDIA driver and CUDA version.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 46.00 MiB. GPU 0 has a total capacity of 11.99 GiB of which 0 bytes is free. Of the allocated memory 10.93 GiB is allocated by PyTorch, and 280.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
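The allocator hint in that error message must be set before PyTorch makes its first CUDA allocation — either exported in the shell (export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True) or at the very top of the script, as in this sketch:

```python
import os

# Must run before torch initializes CUDA; this overrides any value
# exported in the shell for the current process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This only reduces fragmentation of reserved-but-unallocated memory; it will not help if the model genuinely needs more VRAM than the card has.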
Even at a low resolution like 1024x580, I was unable to complete a single image.
The model is not well optimized to run locally.
It consumes too many resources and takes a long time even on a high-performance RTX card, and still fails.
I hope the model will be improved as soon as possible.