Closed: wenting-zhao closed this issue 3 years ago
Hi, can you verify that nn.DataParallel and cuda() are actually taking effect, i.e., that lines 106-115 in main.py are executing? https://github.com/QData/LaMP/blob/master/main.py#L106-L115
Yes! I can verify those lines are executing: it prints "Using 5 GPUs!"
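For context, I assume the check on those lines follows the standard PyTorch multi-GPU pattern, roughly like this (a sketch with a placeholder model, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the LaMP model built earlier in main.py.
model = nn.Linear(512, 512)

# Standard PyTorch multi-GPU setup: wrap the model in DataParallel when more
# than one device is visible, then move it onto the GPU.
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs!")
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```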
I also ran nvidia-smi, and here is the result:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   28C    P0    42W / 250W |   1777MiB / 16160MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   28C    P0    48W / 250W |   1471MiB / 16160MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   28C    P0    41W / 250W |   1471MiB / 16160MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   29C    P0    38W / 250W |   1471MiB / 16160MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-PCIE...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   29C    P0    37W / 250W |   1399MiB / 16160MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6804      C   python                                      1765MiB |
|    1      6804      C   python                                      1459MiB |
|    2      6804      C   python                                      1459MiB |
|    3      6804      C   python                                      1459MiB |
|    4      6804      C   python                                      1387MiB |
+-----------------------------------------------------------------------------+
Hi there,
When I train with 5 GPUs, i.e.,
CUDA_VISIBLE_DEVICES=0,1,2,3,4 python main.py -dataset nuswide_vector -batch_size 32 -d_model 512 -d_inner_hid 512 -n_layers_enc 2 -n_layers_dec 2 -n_head 4 -epoch 50 -dropout 0.2 -dec_dropout 0.2 -lr 0.0002 -encoder 'mlp' -decoder 'graph' -label_mask 'prior'
I got "(Training) elapse: 11.401 min". However, when I train with 1 GPU, i.e.,
CUDA_VISIBLE_DEVICES=0 python main.py -dataset nuswide_vector -batch_size 32 -d_model 512 -d_inner_hid 512 -n_layers_enc 2 -n_layers_dec 2 -n_head 4 -epoch 50 -dropout 0.2 -dec_dropout 0.2 -lr 0.0002 -encoder 'mlp' -decoder 'graph' -label_mask 'prior'
I got "(Training) elapse: 1.766 min".
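For reference, here is a minimal standalone sketch of the kind of comparison I'm timing (a toy two-layer MLP with made-up sizes, not the actual LaMP model), in case it helps reproduce the gap:

```python
import time
import torch
import torch.nn as nn

def time_steps(model, label, steps=200, batch_size=32, d_model=512):
    # Time forward + backward over a fixed number of steps on random inputs.
    x = torch.randn(batch_size, d_model).cuda()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        model.zero_grad()
        model(x).sum().backward()
    torch.cuda.synchronize()
    print(f"{label}: {time.time() - start:.2f}s for {steps} steps")

# Toy two-layer MLP standing in for the encoder/decoder (made-up sizes).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).cuda()

time_steps(model, "1 GPU")

# With a batch of 32 split across 5 GPUs, DataParallel gives each card only
# ~6-7 samples per step, and it replicates the model and gathers outputs on
# every step, so the per-step overhead can dominate the actual compute.
if torch.cuda.device_count() > 1:
    time_steps(nn.DataParallel(model), "multi-GPU (DataParallel)")
```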
I was wondering what might be happening there. Have you run into something similar before? Thanks in advance!