Training or calling with multiple GPU

Hi @PengNi,

Couple questions about using more than one GPU to call modifications or train models. Do you know if it should increase the computing spead ? I try on multinode system with two Tesla V100, they are detected tensorflow but my support tells me only one of two seems to be used :

2020-10-13 14:18:46.789664: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-10-13 14:18:46.789716: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (1): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-10-13 14:18:46.793960: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-10-13 14:18:46.794266: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fc012ae9f0 executing computations on platform Host. Devices:
2020-10-13 14:18:46.794335: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-10-13 14:18:46.795411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:88:00.0
totalMemory: 15.75GiB freeMemory: 15.44GiB
2020-10-13 14:18:46.796375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: 
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:89:00.0
totalMemory: 15.75GiB freeMemory: 15.44GiB
2020-10-13 14:18:46.796490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2020-10-13 14:18:46.801670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-13 14:18:46.801793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 1 
2020-10-13 14:18:46.801846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N Y 
2020-10-13 14:18:46.801893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1:   Y N 
2020-10-13 14:18:46.803777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15023 MB memory) -> physical GPU (device: 0, name
: Tesla V100-SXM2-16GB, pci bus id: 0000:88:00.0, compute capability: 7.0)
2020-10-13 14:18:46.805440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15023 MB memory) -> physical GPU (device: 1, name
: Tesla V100-SXM2-16GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
WARNING:tensorflow:From /users/t20015/terzian/.conda/envs/deepsignalenv-GPU/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_man
agement) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-10-13 14:19:24.920540: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
call_mods process 31462 ending, proceed 676406 batches
finishing the write_process..
call_mods costs 122733.46 seconds..

Did you try using multiple GPU to call thousand of millions of features ? If yes can you share the time of computing for the number of feature?

Thank you,

Paul

bioinfomaticsCSU / deepsignal

Training or calling with multiple GPU #61