Closed: chuxing closed this issue 11 months ago.
Could you please provide more information about your environment? We've tested the Hugging Face version of Emu2-Gen on an A800-80G GPU. In bfloat16 precision, it occupies 77GB of GPU memory and can run all the examples in README.md successfully.
EmuVisualGenerationPipeline inherits from diffusers.DiffusionPipeline, so you can use any multi-GPU technique supported by diffusers or accelerate.
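For example, CPU offload through accelerate; a minimal sketch, assuming the custom pipeline registers its components and offload order like a standard DiffusionPipeline (untested here), with `path` pointing at the local Emu2-Gen weights:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    path,                                 # local path to the Emu2-Gen weights
    custom_pipeline="pipeline_emu2_gen",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
)

# Instead of pipe.to("cuda"): keep weights in CPU RAM and move each
# submodule to the GPU only for its forward pass. Requires `accelerate`,
# and assumes the pipeline defines a model offload sequence.
pipe.enable_model_cpu_offload()

# Lowest-memory (and slowest) variant, offloading leaf modules one by one:
# pipe.enable_sequential_cpu_offload()
```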
My environment:

```
cuda:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

torch: Version: 2.1.2+cu118

nvidia:
GPU 0: NVIDIA A100 80GB PCIe
GPU 1: NVIDIA A100 80GB PCIe
GPU 2: NVIDIA A100 80GB PCIe
GPU 3: NVIDIA A100 80GB PCIe
```
The model loads successfully; after from_pretrained(emu2_gen) finishes, in bfloat16 precision it occupies 78GB of GPU memory.
Then I run the generation code:

1) Text-to-image: `pipe("impressionist painting of an astronaut in a jungle")` works fine.

2) Image editing:

```python
from PIL import Image

image = Image.open("tmp_file.png").convert('RGB').resize((256, 256))
prompt = [image, "wearing a red hat on the beach."]
ret = pipe(prompt)
```

This OOMs:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU 0 has a total capacty of 79.10 GiB of which 505.94 MiB is free. Process 2861798 has 78.60 GiB memory in use. Of the allocated memory 75.77 GiB is allocated by PyTorch, and 1.11 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
The tail of the traceback:

```
File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:198, in Upsample2D.forward(self, hidden_states, output_size, scale)
    196 # If the input is bfloat16, we cast back to bfloat16
    197 if dtype == torch.bfloat16:
--> 198     hidden_states = hidden_states.to(dtype)
    200 # TODO(Suraj, Patrick) - clean up after weight dicts are correctly renamed
    201 if self.use_conv:
```
Update:

```bash
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32
source ~/.bashrc
```

Capping the maximum size of a single GPU memory allocation solves the problem; the pipeline now runs normally.
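The same allocator setting can also be applied from inside the script; a minimal sketch, with the only requirement being that the variable is set before torch initializes CUDA (setting it before the torch import is safest):

```python
import os

# Cap the allocator's split size to reduce fragmentation. This must run
# before the CUDA caching allocator starts, so set it before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:32"

import torch  # noqa: E402
```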
Great! I'll close this issue. Feel free to reopen it if you encounter any other problems.
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    path,
    custom_pipeline="pipeline_emu2_gen",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
    low_cpu_mem_usage=True,
)
pipe.to("cuda")
print(pipe)

prompt = "impressionist painting of an astronaut in a jungle"
ret = pipe(prompt)

# image as in the editing example above
prompt = [image, "wearing a red hat on the head."]
ret = pipe(prompt)
```

OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB

A single A100 80G runs out of memory. Can the model be loaded on multiple GPUs, or with CPU offload?
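For the multi-GPU half of the question (CPU offload is sketched earlier in the thread), newer diffusers releases (0.28+) accept a pipeline-level device_map; a sketch under that assumption, unverified against this custom pipeline and likely newer than the diffusers version used here:

```python
import torch
from diffusers import DiffusionPipeline

# Shard the pipeline's submodules across all visible GPUs instead of
# placing everything on cuda:0. Pipeline-level device_map support is an
# assumption about the installed diffusers version (0.28+).
pipe = DiffusionPipeline.from_pretrained(
    path,                                 # same local Emu2-Gen path as above
    custom_pipeline="pipeline_emu2_gen",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="bf16",
    device_map="balanced",
)

# Do not call pipe.to("cuda") afterwards; placement is handled by the map.
```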