Training on multiple GPUs with tf.distribute.MirroredStrategy

yuri-qq commented 3 years ago

Hello,

first of all, thank you for this awesome project. I'm interested in training my own model. Unfortunately with the current GPU shortage I can't buy an appropriate GPU for a reasonable price. However, I have 2 GTX 1060 6GB and could possibly get hold of more. That got me wondering if it would be possible to train on mutliple GPUs. I read the documentation of tf.distribute.MirroredStrategy and tried to modify the code of the train_project function similar to how it was done in the example of the TensorFlow docs, but that resulted in 2/3 of the performance of just 1 GPU. I have to say I'm not at all familiar with neural networks. I get the basic concept, but I never worked with a library like TensorFlow. Researching the problem I found that for very densely connected neural networks a mirrored strategy might actually hurt performance, because of bandwidth limitations. The NVIDIA X Server Settings show me a peak PCIe bandwidth utilization of around 50%. The guide on the TensorFlow website claims the tf.distribute.Strategy API can be utilized "with minimal code changes", but I guess that depends and might not be that easy in some cases. When I start the training, the model is loaded into the memory of both GPUs and both GPUs are utilized, but I get a lot of these warnings:

``` 2021-04-22 04:40:35.020293: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorDataset/_2" op: "TensorDataset" input: "Placeholder/_0" input: "Placeholder/_1" attr { key: "Toutput_types" value { list { type: DT_FLOAT type: DT_FLOAT } } } attr { key: "output_shapes" value { list { shape { dim { size: 4 } dim { size: 128 } dim { size: 128 } dim { size: 3 } } shape { dim { size: 4 } dim { size: 8434 } } } } } ```

So my question is, do you have an idea why the training on multiple GPUs is so slow? Could it be a problem with how the neural network is architectured or did I do something wrong?

Thanks for your time and effort!

KichangKim commented 3 years ago

I have never been used distributed training, but here are something differences with TensorFlow's sample:

Main training loop uses Model.train_on_batch() instead of Model.fit()
Dataset is python-method based customized thing. So it cannot be optimized for multithreading/distributed environment.
For network structure, DeepDanbooru uses Resnet. So you may need to check Reset-style networks are suitable for distributed training.

yuri-qq commented 3 years ago

I see. I did some more research on ResNets and using TensorFlow and I think I'll try to come up with something on my own. Thanks!

KichangKim / DeepDanbooru

Training on multiple GPUs with tf.distribute.MirroredStrategy #33