DeepGraphLearning / NBFNet

Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)
MIT License

Seems unable to utilize multiple GPUs #11

Open jerermyyoung opened 2 years ago

jerermyyoung commented 2 years ago

Hi there.

I tried running this code on one of my machines, which has four RTX 3090 GPUs (24GB of memory each):

python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3]

I did not change any other part of this repo. However, I encountered a CUDA error saying that more GPU memory is needed. Later I changed the command as follows:

python script/run.py -c config/inductive/wn18rr.yaml --gpus [0]

and ran it on a machine with a single A100 GPU with 40GB of memory. The code ran successfully, using roughly 32GB of GPU memory. I am really puzzled by this: why does the code not properly utilize the total 24GB × 4 = 96GB of GPU memory, and why does it still report a memory issue? Is there something wrong with my setup?

KiddoZhu commented 1 year ago

Hi! Sorry for the late reply.

In the multi-GPU setup, the effective batch size is proportional to the number of GPUs. That is, each GPU processes the same batch size (and thus uses the same GPU memory) as in the single-GPU case. Since our default hyperparameter configuration is tuned on 32GB V100 GPUs, it is possible that the configuration can't fit into 24GB of GPU memory. You may reduce the batch size to make it fit.
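To illustrate the point above, here is a minimal sketch of how data-parallel training scales the batch: the configured batch size is applied per process (per GPU), so per-device memory stays constant while the effective batch grows. The function name and the batch size of 64 are illustrative, not taken from the repo's config.

```python
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    """Hypothetical helper: in data-parallel launches (e.g. via
    torch.distributed.launch with --nproc_per_node), each process keeps
    the full configured batch, so the effective batch scales linearly
    with the number of GPUs while per-GPU memory use is unchanged."""
    return per_gpu_batch_size * num_gpus

# Single-GPU run: one process, configured batch size.
print(effective_batch_size(64, 1))  # 64

# Four-GPU run: each of the 4 GPUs still holds a full 64-sample batch
# in memory, so a config tuned for a 32GB card can OOM on 24GB cards.
print(effective_batch_size(64, 4))  # 256
```

This is why adding GPUs does not reduce per-GPU memory pressure: the memory needed on each card is set by the per-GPU batch size, which is what you would lower in the YAML config to fit 24GB cards.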