ttocs167 opened this issue 5 years ago
I have the same question. Did you find any solution?
Unfortunately not yet. The closest I have gotten is finding a Keras utility that only works on Keras models, tf.keras.utils.multi_gpu_model(). Sadly this doesn't work for the models from this repo, as they aren't built using Keras layers. It may be possible to create two graphs and force each one onto its own GPU, but I haven't had any luck yet.
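For reference, a minimal sketch of how that utility is used when you do have a Keras model (the model here is just an example, and it assumes two visible GPUs; `multi_gpu_model` splits each incoming batch across the GPUs and concatenates the results):

```python
import tensorflow as tf

# Works only because MobileNet is a tf.keras Model; the models in this
# repo are built from raw TF ops, so multi_gpu_model can't wrap them.
model = tf.keras.applications.MobileNet()
parallel_model = tf.keras.utils.multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='categorical_crossentropy')
# parallel_model.predict(batch) now runs half of each batch on each GPU.
```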
Please let me know if you get anywhere with this problem too.
I have resolved it:
```python
for i in range(n_splited):
    with tf.device('/%s:%d' % (types_dev[0], i)):
        with tf.name_scope('tower_%d' % i) as scope:
            print('TOWER %d' % i)
            # Pin the variables themselves to a single device so all
            # towers share one copy of the weights.
            with slim.arg_scope([slim.variable], device='/%s:0' % types_dev[-1]):
                t = tower_fn(**paras)
```
The function tower_fn is defined as follows:

```python
def tower_fn(**kwargs):
    """
    Model tower to be run on each GPU or CPU.
    :param kwargs: arguments forwarded to the model builder
    :return: net_outputs, init_fn (if it exists), model, loss (if it exists)
    """
    # AUTO_REUSE makes every tower share one set of variables instead of
    # failing with duplicate-variable errors on the second call.
    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
        outs_fn = on_single_device_build_model(**kwargs)
        return outs_fn
```
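To make the whole flow concrete, here is a self-contained sketch of how such towers combine for inference; `build_net`, the shapes, and the session setup are assumptions standing in for on_single_device_build_model and the repo's actual code:

```python
import numpy as np
import tensorflow as tf

def build_net(x):
    # Stand-in for the repo's model builder; replace with the real network.
    return tf.layers.dense(x, 10, name='logits')

n_gpus = 2
inputs = tf.placeholder(tf.float32, [None, 128])
splits = tf.split(inputs, n_gpus, axis=0)  # batch size must divide evenly

tower_outputs = []
for i in range(n_gpus):
    with tf.device('/gpu:%d' % i), tf.name_scope('tower_%d' % i):
        # Shared variables across towers, exactly as in tower_fn above.
        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
            tower_outputs.append(build_net(splits[i]))

predictions = tf.concat(tower_outputs, axis=0)

config = tf.ConfigProto(allow_soft_placement=True)  # fall back if a GPU is missing
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.zeros((8, 128), dtype=np.float32)
    print(sess.run(predictions, feed_dict={inputs: batch}).shape)  # (8, 10)
```

The key point is that reuse=tf.AUTO_REUSE gives every tower the same weights, so the graph holds one set of variables with n_gpus copies of the ops, and a single sess.run(predictions) executes the towers in parallel.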
Feature Request / Question
Is it possible to use the `predict.py` script with a batch of images split across two or more GPUs? I have modified the `predict.py` code to accept a batch of images, similar to training; that's the simple part, but now I want to know how to split this batch between multiple GPUs to speed up inference.

I know that you can nest code in `with tf.device()` blocks to force those tasks to run on specific devices; however, this only seems to take effect if the operations themselves (the ones that go into sess.run) are defined within those blocks. For prediction, however, the only operation defined is the network from the `model_builder.py` script, and this doesn't work because you get an error for defining two graphs with the same variable names. I was hoping to do something like so:
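(a minimal sketch of the kind of attempt described; `build_network`, `net_input`, and `batch_size` are placeholder names, not the repo's actual API)

```python
# Naive per-GPU placement: build one copy of the network on each device,
# each taking half of the input batch.
half = batch_size // 2
with tf.device('/gpu:0'):
    out0 = build_network(net_input[:half])
with tf.device('/gpu:1'):
    out1 = build_network(net_input[half:])
predictions = tf.concat([out0, out1], axis=0)
# In practice this either errors out with duplicate variable names or
# silently ignores the placement and runs everything on the first GPU.
```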
However, this has no effect at all and only the first GPU is used, as if the block wasn't there. I assumed this would work since the `train.py` file has the image augmentation block nested within a `with tf.device('/cpu:0')` block. Does that line actually take effect there?

I'm wondering if it's possible to do prediction on multiple GPUs without heavily modifying the base code with all sorts of `with tf.device` blocks. I'm aware you can use this to split specific parts of the network between devices, but I simply want a copy of the network predicting half of the batch on each GPU.