divelab / DIG

A library for graph deep learning research
https://diveintographs.readthedocs.io/
GNU General Public License v3.0

Training is slow and not using GPU #176

Open · davidfstein opened this issue 1 year ago

davidfstein commented 1 year ago

I'm attempting to run the GraphCL example with a custom dataset (n ≈ 150,000). I am passing device='cuda' and my GPU is available, but GPU utilization sits at 0% and the evaluation training loop is projected to take ~12 hours. Is there a way to increase GPU utilization, and do you expect the implementation to scale to datasets of this size?

ycremar commented 1 year ago

Hi @davidfstein ,

Thank you for letting us know about the issue. This is not our expected performance. Could you try setting log_interval to be equal to your total number of epochs and see if the GPU utilization increases? Also, could you confirm if the GPU memory is used?
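For reference, a quick generic PyTorch check (not DIG-specific) to confirm whether the GPU is visible and whether any memory is actually allocated on it:

```python
import torch

# Confirm CUDA is visible to PyTorch at all.
print(torch.cuda.is_available())        # should be True
print(torch.cuda.get_device_name(0))    # should name the expected GPU

# Once the model and data are (supposedly) on the GPU, check whether any
# memory has actually been allocated there; ~0 bytes suggests everything
# is still living on the CPU.
print(torch.cuda.memory_allocated(0))   # bytes currently allocated
print(torch.cuda.memory_reserved(0))    # bytes held by the caching allocator
```

Watching `nvidia-smi` while training runs gives the same signal: if the process shows no memory usage, the tensors never left the CPU.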

We will continue working on efficiency optimization.

davidfstein commented 1 year ago

Hi, I added a data.to(device) in the encoder training loop and now the models are using the GPU. I will go back and look into why the data isn't being moved to the GPU in the first place, and update here later.
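For anyone hitting the same symptom, here is a minimal sketch of the kind of fix described above, assuming a standard PyTorch Geometric training loop. The `Encoder` class, the MUTAG dataset, and the dummy loss below are toy stand-ins for illustration, not DIG's actual internals:

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

dataset = TUDataset(root='data', name='MUTAG')      # small stand-in dataset
loader = DataLoader(dataset, batch_size=32, shuffle=True)

class Encoder(torch.nn.Module):
    """Toy graph encoder: one GCN layer plus mean pooling."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.conv = GCNConv(in_dim, hidden)

    def forward(self, data):
        x = F.relu(self.conv(data.x, data.edge_index))
        return global_mean_pool(x, data.batch)

encoder = Encoder(dataset.num_features).to(device)  # model on the GPU
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.01)

for data in loader:
    data = data.to(device)   # the missing step: move each batch to the GPU
    optimizer.zero_grad()
    z = encoder(data)
    loss = z.pow(2).mean()   # dummy loss, only to drive a backward pass
    loss.backward()
    optimizer.step()
```

If the batch stays on the CPU while the model sits on the GPU, PyTorch either raises a device-mismatch error or, if some path silently falls back to CPU tensors, trains entirely on the CPU, which matches the 0% GPU utilization observed here.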