CHTC / templates-GPUs

Template job submissions using GPUs in CHTC
MIT License

Investigate timing in multi-GPU example #25

Open agitter opened 1 year ago

agitter commented 1 year ago

#24 adds a multi-GPU PyTorch example that demonstrates how to use Distributed Data Parallel training. However, in that example, training with multiple GPUs is no faster than training with one. See https://github.com/CHTC/templates-GPUs/pull/24#issuecomment-1249509118
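One way to quantify "does not speed up" is to compute the speedup and parallel efficiency of the multi-GPU run against the single-GPU baseline. This is a minimal sketch; the per-epoch times below are hypothetical placeholders, not measurements from the PR:

```python
def scaling_efficiency(time_1gpu: float, time_ngpu: float, n: int):
    """Speedup and parallel efficiency of an n-GPU run vs. a 1-GPU baseline.

    Ideal scaling gives speedup == n and efficiency == 1.0.
    """
    speedup = time_1gpu / time_ngpu
    return speedup, speedup / n

# Hypothetical per-epoch wall-clock times (seconds) illustrating the
# symptom reported here: 2 GPUs give essentially no speedup.
speedup, eff = scaling_efficiency(120.0, 118.0, 2)
# speedup is close to 1.0 and efficiency close to 0.5, i.e. the second
# GPU contributes almost nothing to throughput.
```

An efficiency well below 1.0 for a compute-bound workload usually points to the GPUs being starved (data loading, communication, or too little work per device).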

It would be worthwhile to monitor the training more closely, for instance by tracking GPU utilization, to understand why this is the case.
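GPU utilization can be sampled during training by polling `nvidia-smi` from a sidecar script. A minimal sketch (this helper is not part of the templates repo; it assumes `nvidia-smi` is on `PATH` and degrades gracefully when it is not):

```python
import shutil
import subprocess


def sample_gpu_utilization():
    """Return a list of per-GPU utilization percentages, one entry per GPU,
    or None if nvidia-smi is unavailable or fails (e.g. no GPU present)."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None
    return [int(line) for line in proc.stdout.splitlines() if line.strip()]
```

Calling this in a loop (or simply running `nvidia-smi dmon` alongside the job) would show whether all GPUs stay busy throughout training or sit idle between batches.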

agitter commented 1 year ago

Additional testing described in https://github.com/CHTC/templates-GPUs/pull/24#issuecomment-1272018718 shows that GPU utilization is high on two different types of GPUs. The lack of speedup could instead be related to the relatively small convolutional neural network model, which may not provide enough work per GPU to offset the communication overhead of distributed training.