ByungKwanLee / MoAI

[ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance of numerous zero-shot vision language tasks.
MIT License

How much GPU RAM is needed for inference? #8

Open phuchm opened 8 months ago

phuchm commented 8 months ago

I tried to run demo.py on two NVIDIA GeForce RTX 4080 GPUs (16 GB x 2) but got the error message below:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 15.70 GiB of which 2.62 MiB is free. Including non-PyTorch memory, this process has 15.69 GiB memory in use. Of the allocated memory 15.42 GiB is allocated by PyTorch, and 4.74 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
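For reference, the fragmentation workaround the error message points to can be set from Python before PyTorch's first CUDA allocation. This is a generic PyTorch allocator option, not anything MoAI-specific, and it will not help if the model simply exceeds the card's capacity:

```python
import os

# Must be set before the first CUDA allocation, otherwise the
# caching allocator ignores the setting.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

x = torch.zeros(1, device="cuda")  # first allocation now uses expandable segments
```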

ByungKwanLee commented 8 months ago

In my experience, a GPU with 40 GB of VRAM is needed to run it: loading the four external CV models plus MoAI takes about 20 GB, and then propagation needs another ~10 GB.
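A quick way to verify these numbers on your own hardware is to read the allocator's peak statistics after loading and again after a forward pass. This is a generic PyTorch sketch; `model_setup()` and `run_inference()` are hypothetical stand-ins for MoAI's actual loading and propagation code:

```python
import torch

def report(stage: str) -> None:
    # Peak allocated memory since the last reset is what determines
    # whether a given GPU can fit the workload.
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"{stage}: peak {peak_gb:.2f} GiB allocated")

torch.cuda.reset_peak_memory_stats()
model = model_setup()          # hypothetical: load MoAI + 4 external CV models
report("after loading")        # ~20 GB per the numbers above

output = run_inference(model)  # hypothetical: one forward pass
report("after propagation")    # ~30 GB total per the numbers above
```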

phuchm commented 8 months ago

But as I understand it, your code already uses 4-bit loading; does that mean we would need at least 100 GB of VRAM without 4-bit loading?

ByungKwanLee commented 8 months ago

In my experience, roughly 30-40 GB was occupied without any compression.

For 4-bit inference, roughly 20-30 GB was occupied.

I think the memory reduction (from torch.cuda.empty_cache()) takes effect between the time all models are loaded and right before propagation.
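For context, the usual way to get 4-bit loading with Hugging Face transformers is through bitsandbytes. Below is a minimal sketch with a placeholder checkpoint id; MoAI's own loading code may wire this up differently:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute: weights are stored in 4 bits,
# so the resident model is roughly a quarter of its fp16 size.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "some/llm-checkpoint",  # placeholder, not MoAI's actual checkpoint
    quantization_config=bnb_config,
)

# Release cached-but-unused allocator blocks after all models are
# loaded, right before propagation, as described above.
torch.cuda.empty_cache()
```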

phuchm commented 8 months ago

@ByungKwanLee thank you so much for your explanation!

Jizhongpeng commented 8 months ago

How do you define the device_map for inference on multiple GPUs? Thanks.

ByungKwanLee commented 8 months ago

You would need to add extra code using PyTorch DDP or Hugging Face Accelerate!
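As a concrete starting point, transformers can shard a checkpoint across all visible GPUs via Accelerate's device_map support. This is a sketch assuming the model loads through from_pretrained with a placeholder checkpoint id; MoAI's extra CV modules would still need to be placed manually or wrapped with DDP:

```python
from transformers import AutoModelForCausalLM

# "auto" lets Accelerate split layers across the visible GPUs
# (e.g., two 16 GB cards) based on each device's free memory.
model = AutoModelForCausalLM.from_pretrained(
    "some/llm-checkpoint",                 # placeholder, not MoAI's actual checkpoint
    device_map="auto",
    max_memory={0: "14GiB", 1: "14GiB"},   # optional per-GPU caps, leaving headroom
)
```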