I have a problem when I run DNN prediction: I call SyncCopyFromCPU and then Forward. Both batch_size and fea_num are 40. The default BLAS is OpenBLAS (I also tried Intel MKL, but it did not help). The CPU is Broadwell, with 58 logical cores in total.

I run 32-58 worker threads, each with only 1 OpenMP thread, because I worry that opening too many OpenMP threads would decrease performance.

In my tests, the whole predict takes 13.9 ms in total: SyncCopyFromCPU takes 275 us, but Forward takes 11 ms. How can I reduce the Forward time?