Using multiple GPUs for inference

TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

MIT License

407 stars 171 forks source link

Hi I am aware that you have a vlm-evaluation repo as well, but it seems to have a fixed set of datasets while I want to evaluate on my own datasets, could you advise how to do that on multiple GPUs using the scripts that you have given in the README?

I have tried accelerate config followed by accelerate launch --num_processes=4 infer.py, I have also done export CUDA_VISIBLE_DEVICES=0,1,2,3 but get the following error:

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Root Cause (first observed failure):
[0]:
  time      : 2024-07-13_21:21:29
  host      : xxx
  rank      : 3 (local_rank: 3)
  exitcode  : -9 (pid: xxx)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID xxx

TRI-ML / prismatic-vlms

Using multiple GPUs for inference #43