There is communication overhead between the GPU and CPU (memory). If your model is too tiny, it is not worth running on a GPU. If you try at least CIFAR-10 (or maybe even LeNet on MNIST) you will see a difference.
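For illustration, here is a minimal micro-benchmark sketch (not from the original reply) that times the host-to-device copy of one MNIST-sized batch against the matrix multiply behind a single fully connected layer, assuming MXNet's NDArray API and a GPU at mx.gpu(0):

import time
import mxnet as mx

a_cpu = mx.nd.ones((128, 784))      # one MNIST-sized mini-batch on the CPU
start = time.time()
a_gpu = a_cpu.copyto(mx.gpu(0))     # host -> device transfer (first GPU call also pays one-time CUDA init)
a_gpu.wait_to_read()                # MXNet is asynchronous; force completion before timing
print('copy to GPU:   %.4f s' % (time.time() - start))

w_gpu = mx.nd.ones((784, 128), ctx=mx.gpu(0))
start = time.time()
out = mx.nd.dot(a_gpu, w_gpu)       # the actual compute for one fully connected layer
out.wait_to_read()
print('matmul on GPU: %.4f s' % (time.time() - start))

For a network this small, the transfer and launch overhead can easily dominate the useful compute.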
@pluskid thanks for your reply.
I've performed another experiment training the same MLP network on the GPU, but this time formulating it using fully connected layers (mx.symbol.FullyConnected) instead of convolutional layers (mx.symbol.Convolution), as in:
import mxnet as mx

def get_mlp():
    """Multi-layer perceptron built from fully connected layers."""
    data = mx.symbol.Variable('data')
    # Two hidden layers (128 and 64 units) with ReLU activations, then a 10-way softmax.
    fc1 = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=128)
    act1 = mx.symbol.Activation(data=fc1, name='relu1', act_type='relu')
    fc2 = mx.symbol.FullyConnected(data=act1, name='fc2', num_hidden=64)
    act2 = mx.symbol.Activation(data=fc2, name='relu2', act_type='relu')
    fc3 = mx.symbol.FullyConnected(data=act2, name='fc3', num_hidden=10)
    mlp = mx.symbol.SoftmaxOutput(data=fc3, name='softmax')
    # Group the intermediate symbols so their outputs can be inspected if needed.
    group = mx.symbol.Group([data, fc1, act1, fc2, act2, fc3, mlp])
    return mlp, group
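For context, a symbol like this would typically be trained along these lines (a sketch only, assuming the mx.model.FeedForward API and placeholder MNIST file paths, not the exact script used in this thread):

# Hypothetical data iterator; the file paths are placeholders.
train_iter = mx.io.MNISTIter(image='train-images-idx3-ubyte',
                             label='train-labels-idx1-ubyte',
                             batch_size=128, flat=True)

mlp, _ = get_mlp()
model = mx.model.FeedForward(symbol=mlp, ctx=mx.gpu(0),
                             num_epoch=1, learning_rate=0.1)
model.fit(X=train_iter)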
This formulation gives much faster performance (a time cost of 0.8 s instead of 144 s):
2016-05-02 17:47:56,896 Node[0] Epoch[0] Resetting Data Iterator
2016-05-02 17:47:56,896 Node[0] Epoch[0] Train-accuracy=0.908437
2016-05-02 17:47:56,896 Node[0] Epoch[0] Train-top_k_accuracy_5=0.990485
2016-05-02 17:47:56,897 Node[0] Epoch[0] Train-top_k_accuracy_10=1.000000
2016-05-02 17:47:56,897 Node[0] Epoch[0] Train-top_k_accuracy_20=1.000000
2016-05-02 17:47:56,897 Node[0] Epoch[0] Time cost=0.817
2016-05-02 17:47:56,975 Node[0] Epoch[0] Validation-accuracy=0.958233
2016-05-02 17:47:56,975 Node[0] Epoch[0] Validation-top_k_accuracy_5=0.998998
2016-05-02 17:47:56,975 Node[0] Epoch[0] Validation-top_k_accuracy_10=1.000000
2016-05-02 17:47:56,975 Node[0] Epoch[0] Validation-top_k_accuracy_20=1.000000
So I'm curious to know why an MLP using 1x1 convolutions is way slower than the fully-connected variant when in fact they are both mathematically the same!
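To spell out that equivalence, here is a small NumPy check (an illustration, not from the original thread) that a 1x1 convolution over a 1x1 spatial map computes exactly the same thing as a fully connected layer:

import numpy as np

# 1x1 convolution: weights (out_ch, in_ch, 1, 1) applied to inputs (batch, in_ch, 1, 1).
batch, in_ch, out_ch = 4, 784, 128
x = np.random.rand(batch, in_ch, 1, 1)
w = np.random.rand(out_ch, in_ch, 1, 1)

conv_out = np.einsum('bihw,oihw->bo', x, w)                       # the "convolution"
fc_out = x.reshape(batch, in_ch) @ w.reshape(out_ch, in_ch).T     # the fully connected layer

print(np.allclose(conv_out, fc_out))                              # True: mathematically identical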
Two questions:
1. Why is the convolutional formulation so much slower, on the GPU in particular?
2. Am I right that the two formulations (get_mlp() and get_mlpcn()) are mathematically equivalent?
Many thanks!
They are mathematically equivalent, but mathematical equivalence does not imply equal computational cost. Similarly, solving Ax=b via Gaussian elimination and via computing inv(A)*b are mathematically equivalent, but one is much slower than the other. Hope this helps clarify the puzzle.
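To make the analogy concrete, here is a quick NumPy illustration (not from the original reply) of two mathematically equivalent ways to solve Ax=b with very different costs:

import time
import numpy as np

# Random well-conditioned system; the size is chosen just to make the timing visible.
n = 2000
A = np.random.rand(n, n) + n * np.eye(n)
b = np.random.rand(n)

start = time.time()
x1 = np.linalg.solve(A, b)           # Gaussian elimination (LU factorization)
print('solve: %.3f s' % (time.time() - start))

start = time.time()
x2 = np.linalg.inv(A).dot(b)         # explicit inverse, then a matrix-vector product
print('inv*b: %.3f s' % (time.time() - start))

print(np.allclose(x1, x2))           # same answer, different amount of work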
@pluskid I see what you mean, thanks. Hopefully I'll start to understand the implementation differences as I get more familiar with the code, and will be able to profile it to see exactly where the bottlenecks are in this experimental setup.
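For completeness, a minimal profiling sketch, assuming a later MXNet release that ships the mx.profiler module (the API may differ in the version used in this thread):

import mxnet as mx

# Configure the built-in profiler to record all operators to a trace file.
mx.profiler.set_config(profile_all=True, aggregate_stats=True, filename='profile.json')
mx.profiler.set_state('run')

# ... run a few training batches here, e.g. model.fit on a small iterator ...

mx.profiler.set_state('stop')
print(mx.profiler.dumps())   # per-operator timing summary; profile.json opens in chrome://tracing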
Hello,
I've set up a simple MLP network using convolutional layers (mx.symbol.Convolution) instead of fully connected layers (mx.symbol.FullyConnected), with a 1x1 convolution in place of each fully connected layer. I've run shape inference and verified that everything is correct.
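The original definition isn't reproduced here, but a rough sketch of such a formulation (assuming 1x1 kernels, the get_mlpcn() name mentioned earlier in the thread, and input batches reshaped to a 1x1 spatial map) might look like:

import mxnet as mx

def get_mlpcn():
    """Sketch only: MLP expressed with 1x1 convolutions (not the original definition)."""
    # Assumes the data iterator delivers MNIST batches shaped (batch, 784, 1, 1),
    # i.e. the 784 pixels are treated as channels over a 1x1 spatial map.
    data = mx.symbol.Variable('data')
    conv1 = mx.symbol.Convolution(data=data, name='conv1', kernel=(1, 1), num_filter=128)
    act1 = mx.symbol.Activation(data=conv1, name='relu1', act_type='relu')
    conv2 = mx.symbol.Convolution(data=act1, name='conv2', kernel=(1, 1), num_filter=64)
    act2 = mx.symbol.Activation(data=conv2, name='relu2', act_type='relu')
    conv3 = mx.symbol.Convolution(data=act2, name='conv3', kernel=(1, 1), num_filter=10)
    flat = mx.symbol.Flatten(data=conv3, name='flat')
    return mx.symbol.SoftmaxOutput(data=flat, name='softmax')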
The problem I have is that this takes much longer to train on the GPU (GTX 980 Ti) than on the CPU (i5-6500).
For one epoch of MNIST training, the GPU performance is:
Whereas for the CPU it is:
The time cost on the GPU is more than twice that of the CPU!
I'm wondering if anyone knows why this is happening, and how I can debug this to find where the bottlenecks are?
Many thanks!