NifTK / NiftyNet

[unmaintained] An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy
http://niftynet.io
Apache License 2.0

Multi-GPU initialisation issue #461

Closed: danieltudosiu closed this issue 4 years ago

danieltudosiu commented 4 years ago

I am trying to train my network on multiple GPUs and have observed a peculiar behavior that I think is due to an initialization issue.

My network and application run just fine on a single GPU, but as soon as I use multiple GPUs I start getting NaNs at random points throughout the network and the loss function (checked with tf.check_numerics after each layer). From that I concluded that the gradients must be the issue, so I applied tf.clip_by_global_norm and observed the following output:

Learning_rate=9.999204849009402e-06, Global_norm=45138432.0, L1_reconstruction_loss=2676455.0, L2_reconstruction_loss=1177.85400390625, L1_Image_gradient_loss=0.0, L2_Image_gradient_loss=0.0, L2_6_VQ_loss=5526.724609375, L2_4_VQ_loss=2.0605289831494675e+25, L2_2_VQ_loss=0.01820339821279049, Total_loss=2.0605289831494675e+25, Global_norm_1=31815641333760.0

As can be seen, the global norms reported by the two GPUs are not even in the same ballpark. This leads me to believe that the copy of the network on the other GPU is most likely not being updated: I already use FixUp + He initialization, so my variance is small enough to train without BatchNorm, and, as I said, the network runs just fine on one GPU.
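For reference, this is roughly how I hook in the numeric checks and the clipping (a minimal TF 1.x sketch; the helper names and the optimiser variable are placeholders, not my actual NiftyNet application code):

```python
import tensorflow as tf

def checked(tensor, name):
    # Fail fast if a layer output contains NaN/Inf.
    # Used as e.g.: hidden = checked(conv_layer(inputs), 'conv_1')
    return tf.check_numerics(tensor, message='NaN/Inf in ' + name)

def clipped_train_op(loss, optimiser, clip_norm=1.0):
    grads_and_vars = optimiser.compute_gradients(loss)
    grads, variables = zip(*grads_and_vars)
    # Global norm before clipping; this is the Global_norm value in the log above.
    global_norm = tf.global_norm(grads)
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm, use_norm=global_norm)
    train_op = optimiser.apply_gradients(zip(clipped_grads, variables))
    return train_op, global_norm
```

Global_norm and Global_norm_1 in the log above are the norms reported for the two GPUs.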

Could you please help?
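For context, the kind of multi-GPU setup I have in mind is the usual tower pattern, where every GPU is supposed to reuse the same set of weights (a hypothetical sketch with made-up names, not NiftyNet's actual code):

```python
import tensorflow as tf

def build_towers(model_fn, inputs_per_gpu):
    # model_fn and inputs_per_gpu are placeholders for illustration only.
    tower_losses = []
    with tf.variable_scope('model', reuse=tf.AUTO_REUSE):
        for gpu_id, gpu_inputs in enumerate(inputs_per_gpu):
            with tf.device('/gpu:%d' % gpu_id), tf.name_scope('tower_%d' % gpu_id):
                tower_losses.append(model_fn(gpu_inputs))
    # If sharing works, the number of trainable variables equals that of a
    # single tower; a doubled count would mean each GPU got its own,
    # separately initialised copy of the network.
    print('trainable variables:', len(tf.trainable_variables()))
    return tower_losses
```

My question is essentially whether the towers in my run are actually sharing and updating the variables in this way.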