apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

recall_model->exec->Forward costs the most time; how can I reduce it? #15450

Open gxkevin opened 5 years ago

gxkevin commented 5 years ago

I have a problem: when I run DNN prediction, I use SyncCopyFromCPU and Forward. batch_size and fea_num are both 40, and the default BLAS is OpenBLAS (I also tried Intel MKL, but it didn't help). The CPU is Broadwell, 58 logical cores in total.

I run 32-58 worker threads, each thread with only 1 OpenMP thread; I worry that opening too many OpenMP threads would decrease performance.

After testing, I found that prediction costs 13.9 ms in total: SyncCopyFromCPU costs 275 us, but Forward costs 11 ms. How can I reduce the Forward time?

    // Copy the input batch into the model's "data" NDArray.
    dnn_model->model_data["data"].SyncCopyFromCPU(batch_data.data(), batch_size * fea_num);
    mxnet::cpp::NDArray::WaitAll();  // block until the copy completes
    dnn_model->exec->Forward(false); // run inference (is_train = false)
    mxnet::cpp::NDArray::WaitAll();  // block until Forward finishes
gxkevin commented 5 years ago

56 logical cores

lanking520 commented 5 years ago

Hi @leleamol, could you please take a look?