awslabs / keras-apache-mxnet


Keras-MXNet RNN CPU performance is slower #99

Open sandeep-krishnamurthy opened 6 years ago

sandeep-krishnamurthy commented 6 years ago

Keras-MXNet RNN layers are slower on CPU. See early benchmark results here - https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/benchmark_result/RNN_result.md

This issue is likely due to:

  1. MXNet broadcast_add operator being slower on CPU - https://github.com/apache/incubator-mxnet/issues/8219
  2. MXNet dot operator being slower on CPU - https://github.com/apache/incubator-mxnet/issues/10881
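
For reference, a minimal micro-benchmark sketch that times the two suspected operators on CPU (shapes and iteration counts are illustrative, not taken from the linked benchmark):

```python
import time
import mxnet as mx

ctx = mx.cpu()
a = mx.nd.random.uniform(shape=(128, 512), ctx=ctx)
b = mx.nd.random.uniform(shape=(1, 512), ctx=ctx)    # broadcast over axis 0
w = mx.nd.random.uniform(shape=(512, 512), ctx=ctx)

def bench(fn, warmup=10, iters=100):
    for _ in range(warmup):
        fn()
    mx.nd.waitall()                  # flush MXNet's async engine
    start = time.time()
    for _ in range(iters):
        fn()
    mx.nd.waitall()                  # wait for all queued kernels to finish
    return (time.time() - start) / iters

print('broadcast_add: %.6f s/iter' % bench(lambda: mx.nd.broadcast_add(a, b)))
print('dot:           %.6f s/iter' % bench(lambda: mx.nd.dot(a, w)))
```
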
lupesko commented 6 years ago

@pengzhao-intel is this something you guys can help with?

pengzhao-intel commented 6 years ago

@lupesko Will look into the code. @sandeep-krishnamurthy Could you switch to MXNet 1.3 with the fused RNN cell, which includes our optimized versions of vanilla RNN/LSTM/GRU?

sandeep-krishnamurthy commented 6 years ago

@pengzhao-intel - Thank you. Unfortunately, the Keras engine implements these layers using low-level operators (broadcast_add, div, mul, split, zeros_like, dot, etc.) from the backends. For example, see the rnn operator implementation in the MXNet backend - https://github.com/awslabs/keras-apache-mxnet/blob/master/keras/backend/mxnet_backend.py#L2640 It mostly composes low-level operators to achieve the functionality.
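
For illustration, a minimal sketch (not the actual backend code; names and shapes are made up) of how a single simple-RNN step gets composed from these low-level operators:

```python
import mxnet as mx

# Illustrative only: one timestep of a simple RNN built from low-level ops,
# in the style the Keras engine dictates: h_t = tanh(x_t.W + h_{t-1}.U + b)
batch, input_dim, units = 32, 64, 128
x_t    = mx.nd.random.uniform(shape=(batch, input_dim))
h_prev = mx.nd.zeros((batch, units))
W = mx.nd.random.uniform(shape=(input_dim, units))
U = mx.nd.random.uniform(shape=(units, units))
b = mx.nd.zeros((1, units))

# Each timestep issues several small kernels (dot, add, broadcast_add, tanh)
# instead of one fused RNN kernel, so per-step overhead dominates on CPU.
h_t = mx.nd.tanh(mx.nd.broadcast_add(mx.nd.dot(x_t, W) + mx.nd.dot(h_prev, U), b))
```
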

Using high-level constructs like vRNN/LSTM/GRU from MXNet would require us to bypass the Keras engine implementation and short-circuit it to MXNet's high-level operators.

@roywei @kalyc

pengzhao-intel commented 6 years ago

@sandeep-krishnamurthy Yes, Keras builds the RNN from basic blocks like dot and tanh. But I think that's a legacy of Theano and TensorFlow, which don't have fused RNNs, so Keras has to build the cell by itself. Currently, MXNet (like PyTorch) includes a fused RNN for both CPU and GPU. I think setting up the fused RNN cell in Keras, like what Gluon does, would be a nice solution now. Admittedly, I know it's not easy :)
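
For reference, a minimal sketch of the fused path being suggested, using Gluon's LSTM layer (shapes are illustrative, and it assumes MXNet 1.3 with the fused CPU implementation mentioned above):

```python
import mxnet as mx
from mxnet import gluon

# The Gluon LSTM layer dispatches to a single fused RNN operator per call,
# rather than per-timestep dot/broadcast_add/tanh kernels.
seq_len, batch, input_dim, hidden = 50, 32, 64, 128
lstm = gluon.rnn.LSTM(hidden, num_layers=1)   # fused implementation
lstm.initialize(ctx=mx.cpu())

x = mx.nd.random.uniform(shape=(seq_len, batch, input_dim))  # TNC layout
out = lstm(x)       # zero initial states are created internally
print(out.shape)    # (seq_len, batch, hidden)
```

Short-circuiting the Keras RNN loop to a call like this is the kind of change being discussed.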