@bzamecnik This keras_exp package is not production-ready. I posted it as a set of examples for others to copy and use. I used the "Unlicense" license so that anyone can copy it and do whatever they want with the code.
To make it work, just add the path to the parent directory of the keras_exp package to PYTHONPATH, i.e.:

```sh
export PYTHONPATH=/path-to-parent-dir-of-keras_exp:$PYTHONPATH
```

Then you can run the examples from any directory; you don't have to be in the directory where keras_exp resides. Eventually, if I have time to fix up the package, clean it up, add unit tests, etc., I will add a setup for pip installation. For now just use PYTHONPATH or symbolic links.
In order to run benchmarks with different implementations more easily, I tried to adapt both the kuza55 and avolkov1 make_parallel() implementations into a Python package that can be installed via pip. I also tried to make runnable examples using some bigger real-world models, such as inception3 and resnet50. The code can be found at: https://github.com/rossumai/keras-multi-gpu/blob/master/keras_tf_multigpu/examples/benchmark_inception3_resnet50.py. So far I'm running a lot of measurements and preparing a summary blog article.
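For reference, the core of the kuza55-style make_parallel() pattern looks roughly like this — a minimal sketch assuming the TensorFlow backend and Keras 2, not the exact code of either package:

```python
import tensorflow as tf
from keras.layers import Lambda, concatenate
from keras.models import Model


def make_parallel(model, gpu_count):
    def get_slice(data, idx, parts):
        # Slice this GPU's contiguous chunk out of the batch dimension.
        shape = tf.shape(data)
        size = tf.concat([shape[:1] // parts, shape[1:]], axis=0)
        stride = tf.concat([shape[:1] // parts, shape[1:] * 0], axis=0)
        return tf.slice(data, stride * idx, size)

    outputs_all = [[] for _ in model.outputs]

    # Replicate the model on each GPU, each replica seeing its own
    # 1/gpu_count slice of every input batch.
    for gpu_id in range(gpu_count):
        with tf.device('/gpu:%d' % gpu_id):
            inputs = [Lambda(get_slice,
                             arguments={'idx': gpu_id, 'parts': gpu_count})(x)
                      for x in model.inputs]
            outputs = model(inputs)
            if not isinstance(outputs, list):
                outputs = [outputs]
            for i, o in enumerate(outputs):
                outputs_all[i].append(o)

    # Merge the per-GPU outputs back into full batches on the CPU.
    with tf.device('/cpu:0'):
        merged = [concatenate(o, axis=0) for o in outputs_all]
    return Model(inputs=model.inputs, outputs=merged)
```

Each replica sees a 1/gpu_count slice of every batch, so the effective per-GPU batch size is batch_size / gpu_count; the merged model then trains exactly like an ordinary Keras model.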
Nice! I'll add a setup.py to the repo here so that it can be installed with pip in editable mode. Again, the code is too unstable and probably buggy to be pushed to PyPI. My intention is just to illustrate code patterns and idioms for multi-GPU and distributed runs.
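Something like this minimal sketch would do; the metadata values are placeholders:

```python
from setuptools import setup, find_packages

setup(
    name='keras_exp',
    version='0.0.1.dev0',  # placeholder dev version, signals not-for-PyPI status
    description='Experimental multi-GPU / distributed patterns for Keras',
    packages=find_packages(),
    install_requires=['keras'],
)
```

With that at the repo root, `pip install -e .` makes `import keras_exp` work from any directory, without PYTHONPATH tweaks or symlinks, and nothing needs to be published to PyPI.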
In regards to getting better speedups, perhaps using a TF pipeline directly to feed the Keras model would be faster. This is the idea behind mnist_tfrecord_mgpu.py. I wrote an example with Cifar10 that I'll post today as well. It's not actually using TFRecords, but a TF queues/train.batch pipeline to speed up data throughput to the GPUs. It's a bit faster, and hopefully for larger-scale problems the improvement will be more significant.
So, to optimize the speed of your benchmarks that use my mini-framework here, try combining a TF ingest pipeline with Keras multi-GPU model parallelization.
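Roughly, the queue-fed pattern looks like this — a minimal sketch, not the exact code of mnist_tfrecord_mgpu.py. The x_data/y_data arrays (shapes (N, 784) and (N, 10)) are assumed inputs, target_tensors needs Keras >= 2.0.7, and the exact fit() call for tensor-fed models varies a bit across Keras 2.x versions:

```python
import tensorflow as tf
from keras.layers import Input, Dense
from keras.models import Model
import keras.backend as K

# Producer side: enqueue single examples, dequeue shuffled mini-batches.
x_single, y_single = tf.train.slice_input_producer(
    [tf.constant(x_data), tf.constant(y_data)], shuffle=True)
x_batch, y_batch = tf.train.batch([x_single, y_single], batch_size=512,
                                  num_threads=4, capacity=4096)

# The model's input is the queue's output tensor; the target tensor is
# wired in through compile(), so fit() needs no numpy data at all.
inp = Input(tensor=x_batch)
out = Dense(10, activation='softmax')(inp)
model = Model(inp, out)
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              target_tensors=[y_batch])  # needs Keras >= 2.0.7

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=K.get_session(), coord=coord)
model.fit(epochs=1, steps_per_epoch=100)  # data arrives via the queue threads
coord.request_stop()
coord.join(threads)
```

The win is that the enqueueing threads overlap data preparation with GPU compute, instead of feed_dict stalling the GPUs between batches.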
OK, thank you :). It might be useful to have a setup.py for local development, even without publishing to PyPI.
Yes, so far it seems that make_parallel() is able to speed up some non-trivial models (inception3/resnet50) on synthetic ImageNet by up to 1.8x on 2 GPUs. The next missing step to try is exactly some kind of TF queue. I thought that TFRecord was required for the queue; if not, it would be better/simpler. There are at least three options to try (queue_runner, the Dataset API, StagingArea), so we'll see if any of them helps. I'll definitely check mnist_tfrecord_mgpu.py and the TF tutorials for that. But first, the priority now is to summarize the experiments and observations into a blog article or series. Thank you.
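As a sketch of one of those three options, the Dataset API version (tf.data, TF 1.4+) would look something like this, reusing the assumed x_data/y_data arrays from the queue sketch above:

```python
import tensorflow as tf

# Build an input pipeline that shuffles, batches, and prefetches.
dataset = (tf.data.Dataset.from_tensor_slices((x_data, y_data))
           .shuffle(buffer_size=10000)
           .repeat()
           .batch(512)
           .prefetch(2))  # keep a couple of batches ready while GPUs compute
x_batch, y_batch = dataset.make_one_shot_iterator().get_next()
# x_batch / y_batch can now replace the tf.train.batch tensors above.
```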
@avolkov1 Thanks for the cifar10_cnn_mgpu_tfqueue.py example. I tried to run/measure it today on Azure 2x M60 and on our 1x/2x/4x GTX 1070 machines, and with batch_size=512 it indeed scales well. Awesome work! Asynchronous feeding is exactly the piece I suspected was missing from the puzzle.
Next I'll try to measure this example more thoroughly and incorporate it into the upcoming blog article series. The next step we'd like to take is to make it easier to use, i.e. if possible encapsulate the queue in some class that you can use easily for single-GPU or multi-GPU training.
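Purely as a hypothetical sketch of what such an encapsulation could look like — the QueuedFeed name and API are invented here; nothing like it exists in either repo yet:

```python
import tensorflow as tf
import keras.backend as K


class QueuedFeed(object):
    """Hypothetical wrapper owning the batch tensors and queue threads."""

    def __init__(self, x_data, y_data, batch_size=512, num_threads=4):
        x, y = tf.train.slice_input_producer(
            [tf.constant(x_data), tf.constant(y_data)], shuffle=True)
        self.x_batch, self.y_batch = tf.train.batch(
            [x, y], batch_size=batch_size, num_threads=num_threads,
            capacity=8 * batch_size)
        self._coord = tf.train.Coordinator()
        self._threads = None

    def start(self):
        # Launch the enqueueing threads in Keras's session.
        self._threads = tf.train.start_queue_runners(
            sess=K.get_session(), coord=self._coord)

    def stop(self):
        self._coord.request_stop()
        self._coord.join(self._threads)
```

The idea would be: build the model on feed.x_batch (optionally wrapped with make_parallel()), compile with target_tensors=[feed.y_batch], call feed.start() before fit() and feed.stop() after, so the same object works unchanged for one or many GPUs.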
The examples do not work out of the box due to the inability to import the multi-GPU modules. PYTHONPATH contains the path to mnist_tfrecord_mgpu.py, but we also need to import keras_exp, which requires the current directory to be the root of this git repo. A workaround is to provide an explicit PYTHONPATH, as shown above. A proper approach would be to allow installing the package via pip, e.g. in develop mode, which means providing a setup.py file.