Hello, author. When I execute run_crag_inference.sh, I get an out-of-memory error on a single GPU, but the original code doesn't seem to be set up for multi-GPU inference, right?
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 268.00 MiB. GPU
Hi! generator = LLM(model=args.generator_path, dtype="half") uses a single GPU in our code; you can add the tensor_parallel_size parameter for multi-GPU inference.
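For example, a minimal sketch of the change (assuming the vLLM LLM class shown above and a 4-GPU machine; the GPU count is an assumption, so set tensor_parallel_size to the number of cards you actually have):

```python
from vllm import LLM

# Shard the generator's weights across multiple GPUs via tensor parallelism,
# so the model no longer has to fit in a single card's memory.
generator = LLM(
    model=args.generator_path,  # same checkpoint path used by run_crag_inference.sh
    dtype="half",
    tensor_parallel_size=4,     # assumption: adjust to your available GPU count
)
```

Note that vLLM generally requires the model's number of attention heads to be divisible by tensor_parallel_size, so not every GPU count will work for every model.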