facebookarchive / fb-caffe-exts

Some handy utility libraries and tools for the Caffe deep learning framework.

** =predictor/= A simple C++ library that wraps the common pattern of running a =caffe::Net= in multiple threads while sharing weights. It also provides a slightly more convenient usage API for the inference case.

#+BEGIN_SRC c++
#include "caffe/predictor/Predictor.h"

// In your setup phase
predictor_ = Predictor::paths(
    FLAGS_prototxt_path, FLAGS_weights_path, FLAGS_optimization);

// When calling in a worker thread
static thread_local caffe::Blob<float> input_blob;
input_blob.set_cpu_data(input_data); // avoid the copy.
const auto& output_blobs = predictor_->forward({&input_blob});
return output_blobs[FLAGS_output_layer_name];
#+END_SRC

Of note is =predictor/Optimize.{h,cpp}=, which optimizes memory usage by automatically reusing intermediate activations when it is safe to do so. This reduces the memory required for intermediate activations by around 50% for AlexNet-style models and by around 75% for GoogLeNet-style models.

We can plot each set of activations in the topological ordering of the network, with a unique color for each reused activation buffer and the height of each blob proportional to the size of the buffer.

For example, in an AlexNet-like model, the allocation looks like

#+ATTR_HTML: :height 300px

[[./doc/caffenet.png]]

A corresponding allocation for GoogLeNet looks like

#+ATTR_HTML: :height 300px

[[./doc/googlenet.png]]

The idea is essentially linear scan register allocation. We

- compute a set of live ranges for each =caffe::SyncedMemory= (due to sharing, this can't be done at the =caffe::Blob= level),
- compute a set of live intervals, and schedule each =caffe::SyncedMemory= in a non-overlapping fashion onto each live interval,
- allocate a canonical =caffe::SyncedMemory= buffer for each live interval, and
- update the blob internal pointers to point to the canonical buffer.
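To make the scheduling step concrete, here is a minimal sketch of greedy linear-scan buffer assignment, assuming the live ranges have already been computed in topological order. It illustrates the technique rather than reproducing the library's code; =LiveRange= and =assignBuffers= are hypothetical names.

#+BEGIN_SRC c++
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration types; not part of the library's API.
struct LiveRange {
  size_t firstUse; // first op (in topological order) touching the blob
  size_t lastUse;  // last op touching the blob
  size_t bytes;    // size of the activation
  int buffer = -1; // canonical buffer assigned below
};

// Greedy linear scan: walk ranges in order of first use, reusing any
// canonical buffer whose previous occupant's live range has expired.
// Returns the byte capacity of each canonical buffer.
std::vector<size_t> assignBuffers(std::vector<LiveRange>& ranges) {
  std::sort(ranges.begin(), ranges.end(),
            [](const LiveRange& a, const LiveRange& b) {
              return a.firstUse < b.firstUse;
            });
  std::vector<size_t> capacity;  // capacity of each canonical buffer
  std::vector<size_t> freeAfter; // op index after which a buffer is free
  for (auto& r : ranges) {
    for (size_t b = 0; b < freeAfter.size(); ++b) {
      if (freeAfter[b] < r.firstUse) { // no overlap, so reuse is safe
        r.buffer = static_cast<int>(b);
        capacity[b] = std::max(capacity[b], r.bytes); // grow if needed
        break;
      }
    }
    if (r.buffer == -1) { // every existing buffer is still live: allocate
      r.buffer = static_cast<int>(capacity.size());
      capacity.push_back(r.bytes);
      freeAfter.push_back(0);
    }
    freeAfter[r.buffer] = r.lastUse;
  }
  return capacity; // total usage is the sum of these capacities
}
#+END_SRC

Because a buffer is only reused once its previous occupant's last use has passed, no two simultaneously-live activations ever share a canonical buffer, which is what makes the rewrite safe.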

Depending on the model, the buffer reuse can also lead to some non-trivial performance improvements at inference time.

To enable this, pass =Predictor::Optimization::MEMORY= to the =Predictor= constructor.
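Concretely, in the setup snippet from earlier this amounts to something like the following (assuming, as that snippet suggests, that the optimization is the third argument to =Predictor::paths=):

#+BEGIN_SRC c++
// As in the setup phase above, but with memory optimization explicitly
// selected (assumed here to be the third argument to Predictor::paths).
predictor_ = Predictor::paths(
    FLAGS_prototxt_path,
    FLAGS_weights_path,
    Predictor::Optimization::MEMORY);
#+END_SRC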

=predictor/PooledPredictor{.h,cpp}= maintains a thread-pool with thread-local instances of =caffe::Net=. Calls to =PooledPredictor::forward()= are added to a =folly::MPMCQueue= and then dequeued by the thread-pool for processing. Calls to =forward()= are non-blocking and return a =folly::Future= that will be satisfied when the forward pass job finishes.

=PooledPredictor= also supports running multiple models over the same thread-pool. That is, if you load two models, each thread in the thread-pool will maintain two instances of =caffe::Net= (one for each model), and the =netId= param in =forward()= specifies the model to run. =PinnedPooledPredictor= is an abstraction over =PooledPredictor= that, when multiple models are loaded, pins the =forward()= calls to a specific model.

#+BEGIN_SRC c++
#include "caffe/predictor/PooledPredictor.h"

// In your setup phase
caffe::fb::PooledPredictor::Config config;
config.numThreads = 10;
config.optimization = caffe::fb::Predictor::Optimization::MEMORY;
config.protoWeightPaths_.emplace_back(FLAGS_prototxt_path, FLAGS_weights_path);
pooledPredictor = caffe::fb::PooledPredictor::makePredictor(config);

// When calling predictor
caffe::fb::PooledPredictor::OutputLayers output_blobs;
pooledPredictor->forward({&input_blob}, &output_blobs)
    .then([&] {
      const auto& output_blob = output_blobs[FLAGS_output_layer_name];
      // Do something with output_blob
    });
#+END_SRC
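The queue-and-futures mechanism described above is a common pattern; the sketch below shows it generically in isolation: a thread-pool drains a =folly::MPMCQueue= and fulfills a =folly::Promise= per job. It is not the library's implementation, and =SimplePooledRunner= and =Job= are hypothetical names.

#+BEGIN_SRC c++
#include <functional>
#include <memory>
#include <thread>
#include <vector>

#include <folly/MPMCQueue.h>
#include <folly/futures/Future.h>
#include <folly/futures/Promise.h>

// A queued job: the work to run, plus a promise fulfilled on completion.
struct Job {
  std::function<void()> work;
  folly::Promise<folly::Unit> done;
};

class SimplePooledRunner {
 public:
  explicit SimplePooledRunner(size_t numThreads) : queue_(1024) {
    for (size_t i = 0; i < numThreads; ++i) {
      threads_.emplace_back([this] {
        std::unique_ptr<Job> job;
        while (true) {
          queue_.blockingRead(job); // block until a job is available
          if (!job) return;         // nullptr is the shutdown signal
          job->work();              // e.g. a forward pass on a thread-local net
          job->done.setValue(folly::Unit{});
        }
      });
    }
  }

  // Non-blocking from the caller's perspective: the future is
  // satisfied once a pool thread has executed the work.
  folly::Future<folly::Unit> run(std::function<void()> work) {
    auto job = std::make_unique<Job>();
    job->work = std::move(work);
    auto future = job->done.getFuture();
    queue_.blockingWrite(std::move(job));
    return future;
  }

  ~SimplePooledRunner() {
    for (size_t i = 0; i < threads_.size(); ++i) {
      queue_.blockingWrite(nullptr); // one shutdown token per thread
    }
    for (auto& t : threads_) {
      t.join();
    }
  }

 private:
  folly::MPMCQueue<std::unique_ptr<Job>> queue_;
  std::vector<std::thread> threads_;
};
#+END_SRC

In =PooledPredictor=, the work item is a forward pass on the pool thread's thread-local =caffe::Net=, which is why enqueueing is cheap and =forward()= can return immediately.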

** =torch2caffe/= A library for converting pre-trained Torch models to the equivalent Caffe models.

=torch_layers.lua= describes the set of layers that we can automatically convert, and =test.lua= shows some examples of more complex models being converted end to end.

Examples include complex CNNs ([[http://arxiv.org/abs/1409.4842][GoogLeNet]], etc.), deep LSTMs (created in [[https://github.com/torch/nngraph][nngraph]]), and models with tricky parallel/split connectivity structures ([[http://arxiv.org/abs/1103.0398][Natural Language Processing (almost) from Scratch]]).

This can be invoked as

#+BEGIN_EXAMPLE

∴ th torch2caffe/torch2caffe.lua --help
  --input          (default "")    Input model file
  --preprocessing  (default "")    Preprocess the model
  --prototxt       (default "")    Output prototxt model file
  --caffemodel     (default "")    Output model weights file
  --format         (default "lua") Format: lua | luathrift
  --input-tensor   (default "")    (Optional) Predefined input tensor
  --verify         (default "")    (Optional) Verify existing
  <input_dims...>  (number)        Input dimensions (e.g. 10N x 3C x 227H x 227W)
#+END_EXAMPLE
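A typical conversion run (with hypothetical file names) might therefore look like:

#+BEGIN_EXAMPLE
∴ th torch2caffe/torch2caffe.lua \
    --input model.t7 \
    --prototxt model.prototxt \
    --caffemodel model.caffemodel \
    10 3 227 227
#+END_EXAMPLE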

This works by

- (optionally) preprocessing the model provided in =--input= (folding BatchNormalization layers into the preceding layer, etc.),
- walking the Torch module graph of the model provided in =--input=,
- converting it to the equivalent Caffe module graph,
- copying the weights into the Caffe model,
- running some test inputs (of size =input_dims...=) through both models and verifying the outputs are identical.

** =conversions/= A simple CLI tool for running some simple Caffe network transformations.

#+BEGIN_EXAMPLE
∴ python conversions.py vision --help
Usage: conversions.py vision [OPTIONS]

Options:
  --prototxt TEXT           [required]
  --caffemodel TEXT         [required]
  --output-prototxt TEXT    [required]
  --output-caffemodel TEXT  [required]
  --help                    Show this message and exit.
#+END_EXAMPLE

The main usage at the moment is automating the [[https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb][Net Surgery]] notebook.

** Building and Installing

As you might expect, this library depends on an up-to-date [[http://caffe.berkeleyvision.org/][BVLC Caffe]] installation. The additional dependencies are

- The C++ libraries require [[https://github.com/facebook/folly][folly]].
- The Python =conversions= library requires [[http://click.pocoo.org/5/][click]].

You can drop the C++ components into an existing Caffe installation. We'll update the repo with an example modification to an existing =Makefile.config= and a =CMake=-based solution.

** Contact

Feel free to open issues on this repo for requests/bugs, or contact [[mailto:tulloch@fb.com][Andrew Tulloch]] directly.