awslabs / keras-apache-mxnet

[DEPRECATED] Amazon Deep Learning's Keras with Apache MXNet support
https://github.com/awslabs/keras-apache-mxnet/wiki

mxnet slower on MNIST with a GPU #128

Closed roboserg closed 6 years ago

roboserg commented 6 years ago

I did 5 runs each of the MNIST and CIFAR examples from this repo, comparing the MXNet backend against "normal" Keras with the TF backend. For both backends I used "channels_first". My results are the following:

MNIST:

MXNet: 1min 19s ± 1.17 s per loop (mean ± std. dev. of 5 runs, 1 loop each)
TF: 1min ± 1.84 s per loop (mean ± std. dev. of 5 runs, 1 loop each)
MXNet ~24% slower

CIFAR-10:

MXNet: 47 s ± 643 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)
TF: 56.8 s ± 527 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)
MXNet ~17% faster
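(Percentages are relative to the MXNet loop time: (79 − 60) / 79 ≈ 24% and (56.8 − 47) / 56.8 ≈ 17%.)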

This is odd, since MXNet is supposed to be 50%+ faster than TF. My specs: CPU i7 6700K, GPU GTX 1070, 16GB RAM, Keras and MXNet 2.1.6, Windows 10 64-bit.

Why is mxnet slower on MNIST?
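For reference, channels_first can be set either in ~/.keras/keras.json or at the top of the script; a minimal sketch:

from keras import backend as K

# make whichever backend is active train in NCHW layout
K.set_image_data_format('channels_first')
assert K.image_data_format() == 'channels_first'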

Code:

# build model
# (input_shape and num_classes come from the MNIST data preparation:
#  (1, 28, 28) in channels_first layout and 10 classes)
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

def mnist_model():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    # context=["gpu(0)"] is the keras-mxnet extension that pins training to GPU 0
    model.compile(
        loss=keras.losses.categorical_crossentropy,
        optimizer=keras.optimizers.Adadelta(),
        metrics=['accuracy'],
        context=["gpu(0)"])

    return model

%%timeit -n 1 -r 5
# one timed loop = 12 training epochs plus a final evaluation; repeated 5 times
model = mnist_model()
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=12,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
frankfliu commented 6 years ago

Hi @roboserg, thanks for submitting the issue. @sandeep-krishnamurthy, requesting this be labeled.

sandeep-krishnamurthy commented 6 years ago

Hi @roboserg

Thanks for testing it out. Yes, you are right: on a smaller dataset with only 1 GPU, the MXNet backend is slightly slower than, or on par with, the TF backend. This is also consistent with these benchmark results - https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark#cnn-benchmarks

I ran a small experiment with the MXNet and TF backends for MNIST_CNN on a P2.xlarge machine (1 NVIDIA K80 GPU). The MXNet backend takes around 9 seconds per epoch and the TF backend around 8 seconds per epoch (using the channels_first format for both backends). So the MXNet backend with 1 GPU should be slightly slower than or on par with TF, but not far off. Can you please confirm that you are using the GPU and that the image format is channels_first?
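For anyone who wants to reproduce the per-epoch numbers, a small timing callback works; a minimal sketch (the class name is just illustrative):

import time
import keras

class EpochTimer(keras.callbacks.Callback):
    """Print the wall-clock time of each epoch."""
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print('Epoch %d: %.1f s' % (epoch + 1, time.time() - self._start))

# usage: model.fit(x_train, y_train, epochs=12, callbacks=[EpochTimer()])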

You can see significant speed-ups with the MXNet backend on larger images and multiple GPUs, as shown in these benchmarks as well - https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark#cnn-benchmarks
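Multi-GPU training only changes the compile call: pass a list of contexts instead of one. A minimal sketch, assuming a machine with 4 GPUs:

# same model as above, but training is spread across 4 GPUs
model.compile(
    loss='categorical_crossentropy',
    optimizer='adadelta',
    metrics=['accuracy'],
    context=["gpu(0)", "gpu(1)", "gpu(2)", "gpu(3)"])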

roboserg commented 6 years ago

I updated/rewrote my first post and did 5 runs for CIFAR-10 (MXNet and TF each).

@sandeep-krishnamurthy

I can confirm I am using a GPU, as I monitor the GPU load during training. For MXNet I additionally force the GPU with context=["gpu(0)"] in model.compile().

Image format is channels_first in both cases: MNIST x_train shape (60000, 1, 28, 28), CIFAR-10 x_train shape (50000, 3, 32, 32).
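The arrays come straight from the keras.datasets loaders, reshaped to NCHW; a sketch for the MNIST case:

import keras
from keras.datasets import mnist

num_classes = 10

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# reshape to channels_first (NCHW) and scale to [0, 1]
x_train = x_train.reshape(x_train.shape[0], 1, 28, 28).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], 1, 28, 28).astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)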

I will try bigger images with a VGG net or Inception and report the results.

sandeep-krishnamurthy commented 6 years ago

@roboserg - Thank you.

You can also use the benchmark utility we provide to test the bigger ResNet network - https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark
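As a quick standalone alternative to the benchmark utility, something along these lines also times a bigger network; a sketch using keras.applications with synthetic data (assumes channels_first is the active image format):

import numpy as np
import keras
from keras.applications.resnet50 import ResNet50

# synthetic channels_first batch: 32 images of 3 x 224 x 224, 1000 classes
x = np.random.rand(32, 3, 224, 224).astype('float32')
y = keras.utils.to_categorical(np.random.randint(0, 1000, 32), 1000)

model = ResNet50(weights=None, input_shape=(3, 224, 224))
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'], context=["gpu(0)"])
model.fit(x, y, batch_size=32, epochs=2, verbose=1)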

kalyc commented 6 years ago

Thanks for diving into the issue, @roboserg. As mentioned by @sandeep-krishnamurthy, the performance results you see are consistent with the benchmarking reports.
Closing this issue for now; feel free to re-open it to report more performance stats.