sandeep-krishnamurthy closed this issue 4 years ago
I see that we are still using mshadow dot here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/dot-inl.h#L119 . @DickJC123 @zheng-da changed many operators to use linalg_gemm instead of mshadow::expr::dot. Can you provide more insight into whether there was a performance gain from this change on CPU?
Adding @piiswrong for comment.
@anirudh2290 Does mshadow::expr::dot call GEMM as well?
Yes, MKL GEMM will be much faster than the other implementations.
The pre-built version is not linked with the MKL library. We're working on statically linking MKL into the pre-built binaries.
Yes, AFAIK mshadow::expr::dot uses GEMM. dot_engine-inl.h has standalone implementations and also supports calling other BLAS implementations: https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L123 and https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L280
Thanks. So if we build from source with USE_BLAS=mkl, it will be faster. @sandeep-krishnamurthy could you give it a try? FYI, you can set MKL_VERBOSE=1 so that detailed information about each MKL GEMM call is printed at runtime. We can then do further analysis and optimization for the different GEMM sizes.
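As a rough sketch of the suggested workflow (the clone path, `nproc` parallelism, and the `my_lstm_benchmark.py` script name are illustrative assumptions, not from this thread; `USE_BLAS=mkl` and `MKL_VERBOSE=1` are the flags discussed above):

```shell
# Build MXNet from source against Intel MKL as the BLAS backend
# (assumes MKL is already installed and discoverable by the build).
git clone --recursive https://github.com/apache/incubator-mxnet
cd incubator-mxnet
make -j"$(nproc)" USE_BLAS=mkl

# Run the workload with MKL verbose mode enabled; MKL then logs each
# GEMM call (sizes, layouts, timing) so we can see what dot() dispatches to.
# my_lstm_benchmark.py is a placeholder for your own profiling script.
MKL_VERBOSE=1 python my_lstm_benchmark.py 2>&1 | grep -i gemm
```

The grep at the end simply filters the verbose log down to the GEMM lines, which is usually enough to see whether the LSTM's dot() calls are hitting MKL and at what matrix sizes.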
@pengzhao-intel @anirudh2290 - Thanks for your comments. Next step, I will try to build from source with USE_BLAS set to MKL and report back on any performance gains.
@anirudh2290 - To summarize your comment: are you saying mx.sym.dot() does not use the efficient MKL GEMM implementation?
@pengzhao-intel - If I do pip install mxnet-mkl, are you saying we don't get MKL linked? If I use mxnet-mkl on an AWS C5 instance with MKL-DNN, will it use MKL?
I'm afraid the mxnet-mkl package is built with USE_BLAS=openblas. You can build from source with USE_BLAS=mkl if you have the MKL library installed. Also, do you know how much of this computation time is consumed by the LSTM layer versus other non-RNN fully connected layers? We are working on a fused LSTM operator for MXNet on CPU; hopefully that will help you a lot.
We're updating the labels to better indicate MXNet Backend issues. @sandeep-krishnamurthy can you please update the label from "C++" to "Backend"? Thanks!
@pengzhao-intel is this something you guys can help with?
@lupesko Sure
Regarding dot: it is essentially a library operation (GEMM), so there is not much that can be optimized at the framework level. Simply switching to Intel MKL will achieve better performance.
We use the mx.sym.dot() operator heavily in Keras, and we observe that CPU performance is suspiciously slow. Profiling an RNN LSTM example shows that the dot() operator accounts for about 90% of the computation time. Is there a known performance implication of mx.sym.dot() on CPU?
We are using the mxnet-mkl (MKL-DNN) build; is the operator using GEMM under the hood?
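One quick way to check which BLAS the installed wheel is actually linked against (a sketch assuming a Linux install where the compiled library is named `libmxnet.so`; the exact filename and lookup path can vary by platform and MXNet version):

```shell
# Locate the compiled MXNet shared library inside the installed package,
# then inspect its dynamic dependencies for MKL vs OpenBLAS.
MXNET_LIB=$(python -c "import mxnet, os; print(os.path.join(os.path.dirname(mxnet.__file__), 'libmxnet.so'))")
ldd "$MXNET_LIB" | grep -Ei 'mkl|openblas'
```

If the output lists an OpenBLAS library rather than MKL ones, the wheel's dot()/GEMM calls are going through OpenBLAS regardless of the MKL-DNN support elsewhere in the build.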
@anirudh2290 @zheng-da - Any suggestions / comments?