LLaRA: Large Language and Robotics Assistant
Apache License 2.0

Encountering errors when running llara on multiple GPUs #7

Open
erjiaxiao commented 2 weeks ago

Hello @LostXine, when running LLaRA on multiple GPUs, I encountered the following error:

Exception has occurred: RuntimeError
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
  File "/home/lsf_storage/homes/ch/claude/LLaRA/train-llava/llava/model/language_model/llava_llama.py", line 92, in forward
    return super().forward(
  File "/home/ch/claude/LLaRA/eval/llara_adv_attack.py", line 235, in model_generation
    outputs = model(**inputs)
  File "/home/ch/claude/LLaRA/eval/llara_adv_attack.py", line 526, in query_bc
    ans, _ , i = model_generation(tokenizer, model, image_processor, image_list, prepared_prompt)
  File "/home/ch/claude/LLaRA/eval/llara_adv_attack.py", line 436, in eval_episode
    paresed_action, prepared_prompt, ans, image = gen_action(tokenizer, model, image_processor,
  File "/home/ch/claude/LLaRA/eval/llara_adv_attack.py", line 567, in <module>
    eval_episode(args, query_bc, parse_bc)
RuntimeError: CUDA error: device-side assert triggered

However, everything works fine when I run LLaRA on a single GPU. Are there any specific configurations required for multi-GPU usage?
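As the log itself suggests, device-side asserts are reported asynchronously, so the frame at the top of the traceback may not be the failing op. A minimal sketch of how to get a precise traceback and, as a temporary workaround, pin the process to a single GPU (the device index `0` is an example; both environment variables are standard CUDA/PyTorch ones and must be set before torch initializes CUDA, e.g. at the very top of `eval/llara_adv_attack.py`):

```python
import os

# Launch CUDA kernels synchronously so the failing op shows up at the
# top of the traceback instead of at a later, unrelated API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Temporary workaround: expose only one GPU to the process, matching the
# single-GPU setup that is known to work. "0" is an example index.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# `import torch` must come AFTER the environment variables above,
# otherwise they have no effect on CUDA initialization.
```

With `CUDA_LAUNCH_BLOCKING=1` the run is slower, but the reported stack frame should point at the actual failing kernel, which makes it much easier to tell a real multi-GPU placement bug from an environment mismatch.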

LostXine commented 2 weeks ago

Hi @erjiaxiao ,

I haven't tried running LLaRA on multiple GPUs for inference. The error log hints at a compatibility or hardware configuration issue, but I'm not 100% sure. Could you confirm you are using the same versions of the important packages (e.g. torch, CUDA, ...) as LLaVA? I would like to test multi-GPU inference as well, but unfortunately I'm traveling right now. I will try to get back to you before next weekend. Thank you for your understanding.
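A quick way to compare the environment against LLaVA's pinned dependencies is to dump the installed versions with the standard library; a small sketch (the package list is illustrative, adjust it to the versions LLaVA actually pins):

```python
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages):
    """Return a {package: installed version} map, marking missing packages."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report

if __name__ == "__main__":
    # Example package list; compare the output against LLaVA's requirements.
    for pkg, ver in report_versions(
        ["torch", "torchvision", "transformers", "accelerate"]
    ).items():
        print(f"{pkg}: {ver}")
```

Running this in both the working single-GPU environment and the failing multi-GPU one, and diffing the output, should quickly confirm or rule out a version mismatch.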

Best,

erjiaxiao commented 2 weeks ago

OK, thank you! I will take a look at the problem. Have a good trip!