RAM is not what matters here; you need lots of GPU memory for this. Those numbers look about right for 12GB of GPU memory.
The K80 is actually 2 GPUs, with 12GB of GPU memory per GPU.
Thanks for your reply. I searched and found https://www.tensorflow.org/tutorials/using_gpu. There is some example code in the "Using multiple GPUs" section:
```python
import tensorflow as tf

# Creates a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    total = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
```
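(Note: this tutorial snippet assumes a machine with at least four GPUs. On an NC6 the K80's two GPUs appear as '/gpu:0' and '/gpu:1', so the device list would need adjusting, or `allow_soft_placement=True` set in the `ConfigProto` so TensorFlow can fall back to a device that exists.)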
Is there an automatic way to use all available GPUs?
Maybe there are two possible performance scenarios:
`gpus1 = ['/gpu:0', '/gpu:1', '/gpu:2']  # all GPUs have the same performance`
`gpus2 = ['/gpu:0', '/gpu:1', '/gpu:2']  # GPU performance: gpu0 > gpu1 > gpu2`
Then we could use gpus1 or gpus2 automatically, instead of hard-coding the device names ('/gpu:0', '/gpu:1', '/gpu:2').
I found an answer that refers to the code starting at line 170 of https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py. Is that the most automatic way to use all GPUs?
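One way to avoid hard-coding device names is to enumerate the local devices at runtime. A minimal sketch, assuming TensorFlow 1.x; the helper `get_available_gpus` is my own illustration, not something from the thread:

```python
from tensorflow.python.client import device_lib

def get_available_gpus():
    # List every local device and keep only the GPUs.
    local_devices = device_lib.list_local_devices()
    return [d.name for d in local_devices if d.device_type == 'GPU']

# On an NC6 this should report the K80's two GPUs,
# e.g. ['/device:GPU:0', '/device:GPU:1'] depending on the TF version.
print(get_available_gpus())
```

The returned names can then be passed straight to `tf.device(...)` instead of a hard-coded list.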
My scenario is processing pictures taken on mobile phones at their original resolution. I must be prepared for 4K pictures, so I need to get the best performance out of the NC series.
If we ignore image resolution, simply using `gpu_id = iterations % gpu_num` to assign work round-robin across GPUs would be the simplest way. But as I mentioned in the issue (testing log), if the image has a high resolution, OOM occurs in iteration 1.
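A minimal sketch of that round-robin idea, assuming TensorFlow 1.x (the constant op is a stand-in for the real per-image graph, which isn't shown here):

```python
import tensorflow as tf

gpu_num = 2  # e.g. the two GPUs inside a single K80
outputs = []
for iteration in range(4):
    gpu_id = iteration % gpu_num  # round-robin device choice: 0, 1, 0, 1, ...
    with tf.device('/gpu:%d' % gpu_id):
        # Stand-in for the real per-image computation.
        x = tf.constant(float(iteration))
        outputs.append(x * 2.0)

# allow_soft_placement lets TensorFlow fall back to another device
# if an op has no GPU kernel or the requested device does not exist.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
print(sess.run(outputs))
```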
Can you give me some advice?
Thank you very much!
It should be possible to split the model between two GPUs, though performance is probably not going to be great because you'll need data transfer between the parts on each forward and backward pass.
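A minimal sketch of that kind of split, assuming TensorFlow 1.x; the two dense layers and their sizes are made up for illustration:

```python
import tensorflow as tf

# First half of the model on GPU 0.
with tf.device('/gpu:0'):
    x = tf.random_normal([8, 1024])
    h = tf.layers.dense(x, 512, activation=tf.nn.relu)

# Second half on GPU 1; the activations in `h` (and their gradients)
# cross the PCIe bus on every forward and backward pass.
with tf.device('/gpu:1'):
    y = tf.layers.dense(h, 10)

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.global_variables_initializer())
print(sess.run(y).shape)  # (8, 10)
```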
It would be cool if you're interested in implementing and benchmarking multi-GPU support.
So there will be no improvement from 1 K80 to 2 K80s without code updates. -_-||| Thank you very much!
Yup - I don't have the bandwidth to work on this right now, unfortunately.
I'm new to deep learning. I just want to test some ideas, so I ran the code successfully on an Azure NC6 VM (the NC6 is like an Instamatic to me ^_^). But I got some odd logs.
Before the logs, here are the specs of the NC6 (GPU details: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/):
NC series: NVIDIA K80 GPU. Dual GPU, 4992 CUDA cores, 24 GB memory, double precision: 2.91 TFLOPS, single precision: 8.73 TFLOPS.
NC6: 6 cores + 56 GiB memory + 340 GiB disk + 1x K80. $0.90/hour.
Test image 1: 300x369, less than 1 second per iteration.
Test image 2: 2960x5258, OOM in iteration 1.
Then I downscaled it to 1480x2629: OOM again in iteration 1.
I downscaled it again, to 740x1315: it worked, at less than 3 seconds per iteration.
All of the above runs show the same line in the log: Total memory: 11.17GiB.
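A rough back-of-envelope check of those numbers (my own arithmetic, assuming activation memory grows roughly linearly with pixel count):

```python
# Pixel counts relative to the largest size that fit in 11.17 GiB here.
base = 740 * 1315
for w, h in [(300, 369), (740, 1315), (1480, 2629), (2960, 5258)]:
    print('%dx%d: %.1fx the pixels of 740x1315' % (w, h, (w * h) / float(base)))
# 1480x2629 has 4x, and 2960x5258 has 16x, the pixels of 740x1315,
# which is consistent with both of them running out of GPU memory.
```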
Thank you very much!