Open sandeep-krishnamurthy opened 6 years ago
@pengzhao-intel is this something you guys can help with?
@lupesko Will look into the code. @sandeep-krishnamurthy Could you switch to MXNet 1.3 with the fused RNN cell, which includes our optimized versions of vRNN/LSTM/GRU?
@pengzhao-intel - Thank you. Unfortunately, the Keras engine implements these layers using low-level backend operators (broadcast_add, div, mul, split, zeros_like, dot, etc.). For example, see the rnn operator implementation in the MXNet backend here - https://github.com/awslabs/keras-apache-mxnet/blob/master/keras/backend/mxnet_backend.py#L2640 It essentially composes low-level operators to achieve the layer's functionality.
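To illustrate the point (a minimal NumPy sketch, not the actual backend code - the function name, shapes, and weights here are hypothetical), a single vanilla-RNN step built only from primitives like dot, add, and tanh looks roughly like:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla-RNN step composed from primitive ops
    (dot, broadcast_add, tanh), mirroring how the Keras engine
    stitches backend operators together instead of calling a
    fused RNN kernel."""
    return np.tanh(np.dot(x_t, W_x) + np.dot(h_prev, W_h) + b)

# Illustrative shapes: batch=2, input_dim=4, hidden=3
rng = np.random.default_rng(0)
x_t = rng.standard_normal((2, 4))
h = np.zeros((2, 3))
W_x = rng.standard_normal((4, 3))
W_h = rng.standard_normal((3, 3))
b = np.zeros(3)

h_next = rnn_step(x_t, h, W_x, W_h, b)
print(h_next.shape)  # (2, 3)
```

The engine then loops this step over the time axis in Python/graph space, which is where the per-op overhead accumulates.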
Using high-level constructs like vRNN/LSTM/GRU from MXNet would require us to break the Keras engine implementation and short-circuit it to MXNet's high-level operators.
@roywei @kalyc
@sandeep-krishnamurthy Yes, Keras builds the RNN from basic blocks like dot and tanh. But I think that's a legacy of Theano and TensorFlow, which don't have a fused RNN, so Keras has to build the cell itself. Currently, MXNet (like PyTorch) includes a fused RNN for both CPU and GPU. I think wiring the fused RNN cell into Keras, as Gluon does, would be a nice solution now. Admittedly, it's not easy :)
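For context on why fusion matters (a hedged NumPy sketch - the names and shapes are illustrative, not MXNet's actual fused kernel), an unfused LSTM step launches several small ops per gate per timestep; a fused RNN kernel collapses all gates across the whole sequence into one optimized call:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Unfused LSTM step: one projection sliced into the four
    gates (i, f, g, o), then several elementwise ops. A fused
    RNN kernel would execute the entire sequence and all gates
    in a single optimized library call."""
    z = np.dot(x_t, W) + np.dot(h_prev, U) + b  # (batch, 4*hidden)
    hid = h_prev.shape[1]
    i = sigmoid(z[:, :hid])          # input gate
    f = sigmoid(z[:, hid:2 * hid])   # forget gate
    g = np.tanh(z[:, 2 * hid:3 * hid])  # candidate cell state
    o = sigmoid(z[:, 3 * hid:])      # output gate
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Illustrative shapes: batch=2, input_dim=4, hidden=3
rng = np.random.default_rng(1)
x = rng.standard_normal((2, 4))
h0 = np.zeros((2, 3))
c0 = np.zeros((2, 3))
W = rng.standard_normal((4, 12))   # 4 gates stacked
U = rng.standard_normal((3, 12))
b = np.zeros(12)

h1, c1 = lstm_step(x, h0, c0, W, U, b)
print(h1.shape, c1.shape)  # (2, 3) (2, 3)
```

Every one of those intermediate ops is a separate kernel launch and graph node in the unfused path, which is exactly the per-op overhead the fused CPU/GPU implementations avoid.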
Keras-MXNet is slower on CPU. See early benchmark results here - https://github.com/awslabs/keras-apache-mxnet/blob/master/benchmark/benchmark_result/RNN_result.md
This issue is likely due to: