Training StyleGAN on multiple GPUs requires NCCL, which is not available on Windows.
StyleGAN reduces and updates the gradients across devices in a custom way that does not resemble the APIs exposed by TensorFlow.
This causes an error like:
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node TrainD/SumAcrossGPUs/NcclAllReduce (defined at D:\data\oliver-train-checkface\fflowhq\00005-sgan-flower-1gpu\src\dnnlib\tflib\optimizer.py:135) with these attrs: [reduction="sum", shared_name="c124", T=DT_FLOAT, num_devices=2]
No drop-in replacement has been found. The API for generic TensorFlow operations such as HierarchicalAllReduce, which is used with Keras as in https://github.com/tensorflow/tensorflow/issues/21470, is not compatible with the nccl_ops.py interface: https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/ops/nccl_ops.py

Perhaps even more surprising, other ops such as those in collective_ops.py (https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/ops/collective_ops.py) do not provide drop-in replacements either. These ops seem to have completely different use cases, as their tests make clear:
https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/ops/nccl_ops_test.py
https://github.com/tensorflow/tensorflow/blob/r1.14/tensorflow/python/ops/collective_ops_test.py

The line that needs to be updated or removed seems to be the following: https://github.com/check-face/checkface/blob/a88dab03b5803c8c020279bb1d5ab556fc1c3665/src/server/dnnlib/tflib/optimizer.py#L135
This is the point at which all of the device gradients are summed together before each device is updated. Higher-level APIs like HierarchicalAllReduce would handle this entire process, including updating each of the devices, but they are not well suited to this use case.

@olivercoad
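
To make the summation step described above concrete, here is a minimal sketch in plain Python of the contract that step appears to follow: one gradient goes in per device, and one copy of the element-wise sum comes back per device. It is not TensorFlow code and not a working replacement for NcclAllReduce. The function name `all_sum`, the `Gradient` alias, and the sample values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Stand-in for one flattened gradient tensor (assumption: a list of floats).
Gradient = List[float]


def all_sum(per_device_grads: Sequence[Gradient]) -> List[Gradient]:
    """Sum one gradient per device and hand every device a copy of the sum.

    Input: one gradient per device, all of the same length.
    Output: a list with one entry per device, each an independent copy
    of the element-wise sum over all devices.
    """
    if not per_device_grads:
        return []
    length = len(per_device_grads[0])
    if any(len(g) != length for g in per_device_grads):
        raise ValueError("all per-device gradients must have the same shape")
    # Element-wise sum across devices.
    total = [sum(values) for values in zip(*per_device_grads)]
    # Each device receives its own copy of the result.
    return [list(total) for _ in per_device_grads]


# Hypothetical gradients for a single variable on two GPUs.
grads = [[0.5, -1.0, 2.0], [1.5, 1.0, -2.0]]
summed = all_sum(grads)
```

Any candidate replacement for the line at optimizer.py#L135 would presumably need to keep this shape: the surrounding code expects one summed gradient per device, and still performs the per-device update itself.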