TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Using multiple GPUs for inference #43

Open yunbinmo opened 2 months ago

yunbinmo commented 2 months ago

Hi,

I am trying to run inference with the llama2+13b model on 4x RTX 3090s (24 GB each). With the sample inference code, only one GPU is used, which causes an out-of-memory error. Any suggestions? (I have tried `accelerate`, but it didn't work; I suspect I was using it incorrectly.)
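For reference, here is a minimal sketch of what I am aiming for, assuming the `from prismatic import load` entry point from the README and using Accelerate's big-model utilities to shard the model across GPUs. The model id is just the checkpoint mentioned above, and the per-GPU memory cap is a placeholder:

```python
# Sketch only: shard a 13B VLM across 4x 24GB GPUs with Accelerate's
# big-model utilities instead of replicating the full model per process.
import torch
from accelerate import dispatch_model, infer_auto_device_map
from prismatic import load  # entry point as shown in the README (assumption)

vlm = load("llama2+13b")      # checkpoint from the question; adjust as needed
vlm.to(dtype=torch.bfloat16)

# Build a device map that caps each GPU at ~22 GiB, leaving headroom for
# activations; `no_split_module_classes` may be needed so a single
# transformer block is never split across two devices.
device_map = infer_auto_device_map(
    vlm,
    max_memory={i: "22GiB" for i in range(torch.cuda.device_count())},
)
vlm = dispatch_model(vlm, device_map=device_map)

# With this layout, a single process drives generation while the layers
# live on different GPUs, so no `accelerate launch` is required.
```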

Thanks a lot!

yunbinmo commented 1 month ago

Hi, I am aware that you also have a vlm-evaluation repo, but it seems to support a fixed set of datasets, whereas I want to evaluate on my own datasets. Could you advise how to do that on multiple GPUs using the scripts given in the README?

I have tried `accelerate config` followed by `accelerate launch --num_processes=4 infer.py`, and I also ran `export CUDA_VISIBLE_DEVICES=0,1,2,3`, but the launch fails.
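For completeness, the exact commands (standard Accelerate CLI, nothing repo-specific) and the resulting error were:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3          # make all four 3090s visible
accelerate config                             # answered interactively (multi-GPU)
accelerate launch --num_processes=4 infer.py
```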

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Root Cause (first observed failure):
[0]:
  time      : 2024-07-13_21:21:29
  host      : xxx
  rank      : 3 (local_rank: 3)
  exitcode  : -9 (pid: xxx)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID xxx