Is multi-GPU important?

On a single GPU, it is also faster. In my tests on a Titan X card, the Caffe version runs at 0.4 iter/s, while MXNet runs at 0.75 iter/s. With two and three GPUs, MXNet reaches 1.1 iter/s and 1.3 iter/s. The gain from adding more GPUs is limited because the CRF layer is implemented in CPU mode.
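
To make the scaling easier to read, the short sketch below turns the throughput figures quoted above into speedup and per-GPU efficiency. The function name `scaling` and the dictionary layout are only illustrative; the numbers are the ones reported here, not new measurements.

```python
# Throughput figures quoted above, in iterations per second.
caffe_single_gpu = 0.4
mxnet_throughput = {1: 0.75, 2: 1.1, 3: 1.3}  # number of GPUs -> iter/s


def scaling(throughput):
    """Return (num_gpus, speedup, efficiency) relative to the 1-GPU run."""
    baseline = throughput[1]
    rows = []
    for n in sorted(throughput):
        speedup = throughput[n] / baseline
        efficiency = speedup / n  # 1.0 would be perfect linear scaling
        rows.append((n, speedup, efficiency))
    return rows


rows = scaling(mxnet_throughput)
for n, speedup, efficiency in rows:
    print(f"{n} GPU(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# Single-GPU comparison between the two frameworks.
print(f"MXNet vs Caffe on one GPU: {mxnet_throughput[1] / caffe_single_gpu:.2f}x")
```

The falling efficiency with each added GPU is consistent with a fixed CPU-side cost (here, the CRF layer) that does not shrink as GPUs are added.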

Multi-GPU training also helps with the ResNet-50 backbone, which requires much more memory. Training it on a single GPU requires a graphics card with a very large amount of memory.
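
As a rough illustration of why more GPUs ease the memory pressure, the sketch below assumes plain data-parallel training, where every GPU holds a full copy of the weights and processes only its slice of the batch. `split_batch` is a hypothetical helper written for this example, not an MXNet function, and the batch size of 8 is an arbitrary choice.

```python
def split_batch(batch_size, num_gpus):
    """Divide a batch as evenly as possible across GPUs (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, extra = divmod(batch_size, num_gpus)
    # The first `extra` devices take one additional sample each.
    return [base + (1 if i < extra else 0) for i in range(num_gpus)]


batch_size = 8  # assumed value, for illustration only
for num_gpus in (1, 2, 3):
    per_gpu = split_batch(batch_size, num_gpus)
    print(f"{num_gpus} GPU(s): samples per GPU = {per_gpu}")
```

Under this assumption the activation memory on each card shrinks roughly in proportion to the samples it receives, while the weight memory per card stays the same.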