ZichengDuan / TheChosenOne

Unofficial implementation of the paper "The Chosen One: Consistent Characters in Text-to-Image Diffusion Models"
https://arxiv.org/abs/2311.10093

Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.90 GiB already allocated; 14.75 MiB free; 14.14 GiB reserved in total by PyTorch #18

Open paratechnical opened 6 months ago

paratechnical commented 6 months ago

I keep getting out-of-memory exceptions no matter how I set PYTORCH_CUDA_ALLOC_CONF. This is the error:

      File "/opt/saturncloud/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
        return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 13.90 GiB already allocated; 14.75 MiB free; 14.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
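For reference, the max_split_size_mb setting named in the message is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable, which the CUDA caching allocator reads the first time the GPU is touched; a minimal sketch, where the 128 MiB value is only an illustrative choice, not a recommendation:

    import os

    # Must be set before the first CUDA allocation, i.e. before any
    # tensor or model is moved to the GPU.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch
    x = torch.zeros(1, device="cuda")  # first CUDA allocation happens here

The same can be done from the shell with export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching the script.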

JahnKhan commented 6 months ago

I got the same message. Could you fix this?

GraemeHarris commented 6 months ago

@paratechnical @JahnKhan I've had some success using the following: https://huggingface.co/docs/diffusers/optimization/memory#memoryefficient-attention. The pipeline is instantiated in the load_trained_pipeline function, where you should be able to try to reduce the memory usage as described in the Hugging Face article.

Because I was still low on VRAM, I went with the pipe.enable_sequential_cpu_offload() option, which is much slower but works :). I haven't tried model offloading yet, but it might be worth a shot to keep some speed.
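A rough sketch of the combination described above, assuming an SDXL-sized checkpoint (the prompt is only a placeholder); enable_model_cpu_offload() is the faster middle ground, enable_sequential_cpu_offload() the slowest but most VRAM-frugal:

    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,  # halves the weight footprint vs float32
    )

    # Memory-efficient attention (requires the xformers package).
    pipe.enable_xformers_memory_efficient_attention()

    # Pick one offload strategy; both manage device placement themselves,
    # so the pipeline should not be moved to CUDA manually.
    pipe.enable_model_cpu_offload()         # offloads whole sub-models, faster
    # pipe.enable_sequential_cpu_offload()  # offloads layer by layer, lowest VRAM

    image = pipe("a photo of an astronaut riding a horse").images[0]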

paratechnical commented 6 months ago

@GraemeHarris

    if model_path is not None:
        # TODO: long warning for lora
        pipe = DiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
        if load_lora:
            pipe.load_lora_weights(lora_path)
    else:
        pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
    pipe.to("cuda")
    pipe.enable_xformers_memory_efficient_attention()
    pipe.enable_sequential_cpu_offload()

I tried it like this and I have the same problem.
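One detail worth checking in the snippet above: the diffusers memory docs advise against moving the pipeline to CUDA before calling enable_sequential_cpu_offload(), since the offload hooks manage device placement themselves, and pipe.to("cuda") first loads the entire model onto the GPU, which can OOM on its own. A minimal reordering, keeping everything else from the snippet as-is:

    pipe = DiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    pipe.enable_xformers_memory_efficient_attention()
    # No pipe.to("cuda") here: sequential offload moves layers onto the GPU
    # one at a time as they are needed.
    pipe.enable_sequential_cpu_offload()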

What kind of GPU configuration are you using?