avolkov1 / keras_experiments

Experimental Keras libraries and examples.
The Unlicense

ImportError: No module named keras_exp.multigpu #5

Closed bzamecnik closed 7 years ago

bzamecnik commented 7 years ago

The examples do not work out of the box because the multi-GPU modules cannot be imported.

$ python examples/mnist/mnist_tfrecord_mgpu.py
Using TensorFlow backend.
Traceback (most recent call last):
  File "examples/mnist/mnist_tfrecord_mgpu.py", line 57, in <module>
    from keras_exp.multigpu import (
ImportError: No module named keras_exp.multigpu

The module search path only contains the directory of mnist_tfrecord_mgpu.py, but we need to import keras_exp, so the current directory would have to be the root of this git repo (and on the path).

A workaround is to provide explicit PYTHONPATH:

$ PYTHONPATH=. python examples/mnist/mnist_tfrecord_mgpu.py

A proper approach would be to allow installing the package via pip, e.g. in develop mode, which means providing a setup.py file.
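For illustration, a minimal setup.py along these lines would probably be enough for a develop-mode install (the version number and the excluded directories here are just placeholder assumptions):

from setuptools import setup, find_packages

setup(
    name='keras_exp',
    version='0.0.1',
    description='Experimental Keras libraries and examples.',
    packages=find_packages(exclude=['examples', 'examples.*']),
)

With that in place, running pip install -e . from the repo root would put keras_exp on the module search path of the whole environment, so the examples would run from any directory.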

avolkov1 commented 7 years ago

@bzamecnik This keras_exp package is not production ready. I posted it as examples for others to copy and use. I used the "Unlicense" license so that anyone can just copy and do whatever they want with the code.

To make it work, just add the path of the directory where the keras_exp package resides, i.e.

export PYTHONPATH=/path-to-parent-dir-of-keras_exp:$PYTHONPATH

Then you can run the examples from any directory; you don't have to be in the directory where keras_exp resides. Eventually, if I have time to fix up the package, clean it up, add unit tests, etc., I will add a setup for pip installation. For now just use PYTHONPATH or symbolic links.

bzamecnik commented 7 years ago

In order to benchmark different implementations more easily, I tried to adapt both the kuza55 and avolkov1 make_parallel() implementations into a Python package that can be installed via pip. I also tried to make runnable examples using some bigger real-world models, such as inception3 and resnet50. The code can be found at: https://github.com/rossumai/keras-multi-gpu/blob/master/keras_tf_multigpu/examples/benchmark_inception3_resnet50.py. So far I'm running a lot of measurements and preparing a summary blog article.
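For context, both implementations follow roughly the same data-parallel pattern; here is an illustrative sketch (not the exact code from either repo) for a single-input, single-output Keras model on the TF backend:

import tensorflow as tf
from keras.layers import Lambda, concatenate
from keras.models import Model

def make_parallel(model, gpu_count):
    # Give GPU i its share of the samples (assumes the batch size is divisible by gpu_count).
    def get_slice(x, i, n):
        batch = tf.shape(x)[0] // n
        return x[i * batch:(i + 1) * batch]

    towers = []
    for gpu_id in range(gpu_count):
        with tf.device('/gpu:%d' % gpu_id):
            sliced = Lambda(get_slice,
                            arguments={'i': gpu_id, 'n': gpu_count})(model.input)
            towers.append(model(sliced))

    # Concatenate the per-GPU outputs back into one batch on the CPU.
    with tf.device('/cpu:0'):
        merged = concatenate(towers, axis=0) if gpu_count > 1 else towers[0]
    return Model(inputs=model.input, outputs=merged)

Because the same model object is called on every slice, the weights are shared between the replicas and a single optimizer update covers the whole batch.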

avolkov1 commented 7 years ago

Nice! I'll add a setup.py to the repo here so that it can be installed with pip in editable mode. Again, the code is too unstable and probably too buggy to be pushed to PyPI. My intention is just to illustrate code patterns and idioms for multi-GPU and distributed runs. Regarding better speedups, perhaps using a TF pipeline directly to feed the Keras model networks would be faster. This is the idea behind mnist_tfrecord_mgpu.py. I wrote an example with Cifar10 that I'll post today as well. It's not actually using TFRecords, but a TF queues/train.batch pipeline to speed up data throughput to the GPUs. It's a bit faster, and hopefully for larger-scale problems the improvement should be more significant. So to optimize the speed of your benchmarks that use my mini-framework here, try to combine a TF ingest pipeline with the Keras multi-GPU model parallelization.
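To make the queue idea concrete, here is a minimal sketch of feeding a Keras model from a TF queue-based pipeline instead of feed_dict; the data and model are toy placeholders, and it assumes a Keras 2.x version that supports tensor inputs and target_tensors:

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Flatten, Dense
from keras.models import Model

# Toy in-memory data standing in for a real ingest pipeline.
x_data = np.random.rand(1024, 32, 32, 3).astype('float32')
y_data = np.eye(10, dtype='float32')[np.random.randint(0, 10, size=1024)]

# Queue-based pipeline on the CPU: yields batched tensors, no feed_dict needed.
with tf.device('/cpu:0'):
    img, lbl = tf.train.slice_input_producer([x_data, y_data], shuffle=True)
    image_batch, label_batch = tf.train.batch([img, lbl], batch_size=128)

x_in = Input(tensor=image_batch)  # the model reads straight from the queue
preds = Dense(10, activation='softmax')(Flatten()(x_in))
model = Model(inputs=x_in, outputs=preds)
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              target_tensors=[label_batch])  # labels also come from the queue

# Queue runners must be started before training on tensor inputs.
tf.train.start_queue_runners(sess=K.get_session())
model.fit(epochs=1, steps_per_epoch=8)  # no x/y arguments: data flows through the graph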

bzamecnik commented 7 years ago

OK, thank you :). It might be useful to have setup.py for local development, even without publishing to PyPI.

Yes, so far it seems that make_parallel() is able to speed up some non-trivial models (inception3/resnet50) on synthetic imagenet by up to 1.8x on 2 GPUs. The next missing step to try is exactly some kind of TF queue. I thought that TFRecord was required for the queue; if not, it would be better/simpler. There are at least three options to try (queue_runner, the Dataset API, StagingArea), so we'll see if any of them helps. I'll definitely check mnist_tfrecord_mgpu.py and the TF tutorials for that. But first, the priority now is to summarize the experiments and observations into a blog article or series. Thank you.
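If TFRecord is indeed not needed, feeding from in-memory tensors via the Dataset API might look roughly like this (an untested sketch with toy data and model, assuming a TF version where tf.data is available):

import numpy as np
import tensorflow as tf
from keras.layers import Input, Flatten, Dense
from keras.models import Model

x_data = np.random.rand(1024, 32, 32, 3).astype('float32')
y_data = np.eye(10, dtype='float32')[np.random.randint(0, 10, size=1024)]

# In-memory dataset -> shuffled, repeated, batched tensors; no TFRecord files involved.
dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data))
dataset = dataset.shuffle(1024).repeat().batch(128)
image_batch, label_batch = dataset.make_one_shot_iterator().get_next()

x_in = Input(tensor=image_batch)
preds = Dense(10, activation='softmax')(Flatten()(x_in))
model = Model(inputs=x_in, outputs=preds)
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              target_tensors=[label_batch])
model.fit(epochs=1, steps_per_epoch=8)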

bzamecnik commented 7 years ago

@avolkov1 Thanks for the cifar10_cnn_mgpu_tfqueue.py example. I tried to run and measure it today on Azure 2x M60 and our 1x/2x/4x GTX 1070, and with batch_size=512 it indeed scales well. Awesome work! Asynchronous feeding is exactly the piece I suspected was missing from the puzzle.

Next I'll try to measure this example better and incorporate it into the upcoming blog article series. The next step we'd like to take is to make it easier to use, i.e. if possible to encapsulate the queue in some class that can easily be used for single-GPU or multi-GPU training (a rough sketch of the idea is below).
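Purely as an illustration of what that encapsulation could look like (the class name and API below are hypothetical, nothing that exists in either repo yet), the rough idea is:

from keras.layers import Input
from keras.models import Model

class TensorFeedTrainer(object):
    """Hypothetical wrapper: a queue/Dataset-fed Keras model, optionally replicated over GPUs."""

    def __init__(self, build_model_fn, image_batch, label_batch, gpus=1):
        # build_model_fn maps an input tensor to an output tensor (user-supplied).
        x_in = Input(tensor=image_batch)
        model = Model(inputs=x_in, outputs=build_model_fn(x_in))
        if gpus > 1:
            # e.g. wrap with a make_parallel()-style replicator like the sketch earlier in this thread
            model = make_parallel(model, gpus)
        model.compile(optimizer='sgd', loss='categorical_crossentropy',
                      target_tensors=[label_batch])
        self.model = model

    def fit(self, steps_per_epoch, epochs=1):
        # Data is already wired into the graph, so only step counts are needed.
        return self.model.fit(epochs=epochs, steps_per_epoch=steps_per_epoch)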