Fanghua-Yu / SUPIR

SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
http://supir.xpixel.group/

PLEASE HELP ME - OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU #113

Open codeolder opened 4 months ago

codeolder commented 4 months ago

[Untitled Code.txt]

```
BasicTransformerBlock is using checkpointing
Loaded model config from [options/SUPIR_v0.yaml]
Loaded state_dict from [/opt/data/private/AIGC_pretrain/SDXL_cache/sd_xl_base_1.0_0.9vae.safetensors]
Loaded state_dict from [/opt/data/private/AIGC_pretrain/SUPIR_cache/SUPIR-v0Q.ckpt]
Loading vision tower: openai/clip-vit-large-patch14-336
Loading checkpoint shards:  67%|███████████    | 2/3 [00:23<00:11, 11.70s/it]

Traceback (most recent call last):
  C:\Users\PC\Downloads\SUPIR\test.py:72
    llava_agent = LLavaAgent(LLAVA_MODEL_PATH, device=LLaVA_device, load8bit=args.load...
  C:\Users\PC\Downloads\SUPIR\llava\llava_agent.py:27 in __init__
    tokenizer, model, image_processor, context_len = load_pretrained_model(
        model_path, None, model_name, device=self.device, device_map=device_map,
        load_8bit=load_8bit, load_4bit=load_4bit)
  C:\Users\PC\Downloads\SUPIR\llava\model\builder.py:103 in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_us...
  C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py:2795 in from_pretrained
    ) = cls._load_pretrained_model(
  C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py:3123 in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_i...
  C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py:698 in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kw...
  C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py:149 in set_module_tensor_to_device
    new_value = value.to(device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB. GPU
```

FurkanGozukara commented 4 months ago

What GPU do you have? We have an auto installer, and a version that works with as little as 8 GB of VRAM (FP8 + tiled VAE + CPU offloading).
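For context on why tricks like these are needed: a rough back-of-envelope calculation (illustrative assumptions only, ignoring activations, the KV cache, and the SUPIR/SDXL models themselves) shows that the LLaVA language model's weights alone do not fit on an 8 GB card at fp16:

```python
def weight_vram_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed for model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

n = 7e9  # assumed ~7B parameters in the LLaVA language model
print(f"fp16: {weight_vram_gib(n, 2.0):.1f} GiB")  # 2 bytes per weight
print(f"int8: {weight_vram_gib(n, 1.0):.1f} GiB")  # 1 byte per weight
print(f"int4: {weight_vram_gib(n, 0.5):.1f} GiB")  # half a byte per weight
```

At fp16 the weights alone are roughly 13 GiB, which is why loading LLaVA OOMs on an 8 GB GPU; quantizing to 8-bit or 4-bit brings just the weights under that budget, and CPU offloading and a tiled VAE reduce the rest.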

codeolder commented 4 months ago

The GPU I use is an RTX 4060. Can you please provide that auto installer? I have installed this project many times and it has always failed.

FurkanGozukara commented 4 months ago

> The GPU I use is an RTX 4060. Can you please provide that auto installer? I have installed this project many times and it has always failed.

Here is our video:

It would work great on your GPU.

https://youtu.be/OYxVEvDf284

codeolder commented 4 months ago

I tried registering an account and following the link mentioned in the video. Can you send me the file, or does this cost money?

yuanzhi-zhu commented 3 months ago

@codeolder On a 40 GB A100, I made it work by setting load_4bit=True in test.py.
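For anyone following along, the change would go on the LLavaAgent call in test.py; judging from the constructor signature visible in the traceback, it forwards load_8bit/load_4bit to load_pretrained_model. A sketch (not verified on low-VRAM cards):

```python
# In SUPIR's test.py: ask LLaVA to load its weights in 4-bit instead of fp16.
llava_agent = LLavaAgent(LLAVA_MODEL_PATH, device=LLaVA_device, load_4bit=True)
```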

oatsvine commented 2 months ago

Is there a baseline config that works with 24 GB (using test.py or a sanely modified version)? Without reading the source or the paper, I don't have a sense of which adjustments would decrease memory usage.