ttocs167 opened this issue 5 years ago
I have the same question. Did you find any solution?
Unfortunately not yet. The closest I have gotten is finding a Keras utility that only works on Keras models, tf.keras.utils.multi_gpu_model(). Sadly this doesn't work for the models from this repo, as they aren't built using Keras layers. It may be possible to create two graphs and force each one onto its own GPU, but I haven't had any luck yet.
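For reference, a minimal sketch of how that utility is used when you do have a Keras model (the model here is just an example, and it assumes two visible GPUs; `multi_gpu_model` splits each incoming batch across the GPUs and concatenates the results):

```python
import tensorflow as tf

# Works only because MobileNet is a tf.keras Model; the models in this
# repo are built from raw TF ops, so multi_gpu_model can't wrap them.
model = tf.keras.applications.MobileNet()
parallel_model = tf.keras.utils.multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')
# parallel_model.predict(batch) now runs half of each batch on each GPU.
```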
Please let me know if you get anywhere with this problem too.
I have resolved it:
```python
for i in range(n_splited):
    with tf.device('/%s:%d' % (types_dev[0], i)):
        with tf.name_scope('tower_%d' % i) as scope:
            print('TOWER %d' % i)
            # Pin the variables themselves to a single device so all
            # towers share one copy of the weights.
            with slim.arg_scope([slim.variable], device='/%s:0' % types_dev[-1]):
                t = tower_fn(**paras)
```
The function tower_fn is defined as follows:

```python
def tower_fn(**kwargs):
    """
    Model tower to be run on each GPU or CPU.
    :param kwargs: arguments forwarded to the model builder
    :return: net_outputs, init_fn (if it exists), model, loss (if it exists)
    """
    # AUTO_REUSE makes every tower share one set of variables instead of
    # failing with duplicate-variable errors on the second call.
    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
        outs_fn = on_single_device_build_model(**kwargs)
        return outs_fn
```
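To make the whole flow concrete, here is a self-contained sketch of how such towers combine for inference; `build_net`, the shapes, and the session setup are assumptions standing in for on_single_device_build_model and the repo's actual code:

```python
import numpy as np
import tensorflow as tf

def build_net(x):
    # Stand-in for the repo's model builder; replace with the real network.
    return tf.layers.dense(x, 10, name='logits')

n_gpus = 2
inputs = tf.placeholder(tf.float32, [None, 128])
splits = tf.split(inputs, n_gpus, axis=0)  # batch size must divide evenly

tower_outputs = []
for i in range(n_gpus):
    with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i):
        # Shared variables across towers, exactly as in tower_fn above.
        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
            tower_outputs.append(build_net(splits[i]))

predictions = tf.concat(tower_outputs, axis=0)

config = tf.ConfigProto(allow_soft_placement=True)  # fall back if a GPU is missing
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.zeros((8, 128), dtype=np.float32)
    print(sess.run(predictions, feed_dict={inputs: batch}).shape)  # (8, 10)
```

The key point is that reuse=tf.AUTO_REUSE gives every tower the same weights, so the graph holds one set of variables with n_gpus copies of the ops, and a single sess.run(predictions) executes the towers in parallel.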
Feature Request / Question
Is it possible to use the `predict.py` script with a batch of images split across two or more GPUs? I have modified the `predict.py` code to accept a batch of images, similar to training; that's the simple part, but now I want to know how to split this batch between multiple GPUs to speed up inference.

I know that you can nest code in `with tf.device()` blocks to force those tasks to run on specific devices; however, this only seems to take effect if the operations themselves (the ones that go into sess.run) are defined within those blocks. For prediction, however, the only operation defined is the network from the `model_builder.py` script, and this doesn't work because you get an error for defining two graphs with the same variable names. I was hoping to do something like so:
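(a minimal sketch of the kind of attempt described; `build_network`, `net_input`, and `batch_size` are placeholder names, not the repo's actual API)

```python
# Naive per-GPU placement: build one copy of the network on each device,
# each taking half of the input batch.
half = batch_size // 2
with tf.device('/gpu:0'):
    out0 = build_network(net_input[:half])
with tf.device('/gpu:1'):
    out1 = build_network(net_input[half:])
predictions = tf.concat([out0, out1], axis=0)
# In practice this either errors out with duplicate variable names or
# silently ignores the placement and runs everything on the first GPU.
```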
However, this has no effect at all and only the first GPU is used, as if the block wasn't there. I assumed this would work since the `train.py` file has the image augmentation block nested within a `with tf.device('/cpu:0')` block. Does that line actually take effect there?

I'm wondering if it's possible to do prediction on multiple GPUs without heavily modifying the base code with all sorts of `with tf.device` blocks. I'm aware you can use this to split specific parts of the network between devices, but I simply want a copy of the network predicting half of the batch on each GPU.