Open ElevenDreamer opened 7 years ago
Could you turn on GLOG_v=1 and run the Caffe2 binary again? It might be that caffe2's omp initialization code is setting omp threads back to 1:
https://github.com/caffe2/caffe2/blob/master/caffe2/core/init_omp.cc
@Yangqing Thanks for the reply, and sorry for not responding sooner. GLOG_v is already enabled. The default init log shows the following: env OMP_NUM_THREADS:
But the problem still exists.
Another thing we found while testing Caffe2: in the Python test code, when we change the --caffe2_omp_num_threads parameter passed to workspace.GlobalInit(), the convolution layer can use multiple cores, and its run time decreases when we set caffe2_omp_num_threads=4. However, the fully connected layer does not change. I am afraid the BLAS library cannot use multiple cores for matrix multiplication in Caffe2, although multicore may work in other parts of the Caffe2 code.
Thanks for your time.
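To compare what the environment requests with what the OpenMP runtime actually reports inside the process, a small diagnostic along these lines may help. This is only a sketch: `ThreadReport` and `collect_thread_report` are hypothetical names, not part of Caffe2 or Eigen, and the OpenMP value is only available when the file is compiled with -fopenmp.

```cpp
#include <cstdlib>
#include <string>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper, not part of Caffe2 or Eigen.
struct ThreadReport {
  std::string omp_num_threads_env;  // value of OMP_NUM_THREADS, or "(unset)"
  unsigned hardware_threads;        // 0 if the runtime cannot determine it
  int omp_max_threads;              // -1 when built without OpenMP support
};

ThreadReport collect_thread_report() {
  ThreadReport r;
  // What the environment asks for.
  const char* env = std::getenv("OMP_NUM_THREADS");
  r.omp_num_threads_env = env ? env : "(unset)";
  // What the hardware offers, as seen by the C++ runtime.
  r.hardware_threads = std::thread::hardware_concurrency();
#ifdef _OPENMP
  // What OpenMP would use for the next parallel region at this point.
  r.omp_max_threads = omp_get_max_threads();
#else
  r.omp_max_threads = -1;
#endif
  return r;
}
```

Collecting the report once at program start and once from inside the code path in question would show whether something in between lowers the OpenMP thread count.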
@ElevenDreamer Hi, did you solve this problem? I have the same problem too. If you solved it, could you give a hint, please? Thanks.
@dlwtojd26 Not yet. I tried several measures but could not solve it. I am waiting for an official reply. For now we are using TF and PyTorch instead.
@ElevenDreamer Hmm, sad news :< Thanks for your reply. If I figure out how to handle it, I'll reply.
Hi, I ran into a problem when using Caffe2 on a Pine64 (Ubuntu 16.04.1, 4-core Cortex-A53 CPU). In Caffe2, multiple cores are not used when Eigen does the matrix multiplication (the same happens with OpenBLAS). When compiling I added -fopenmp, and I also ran export OMP_NUM_THREADS=4. I use an AlexNet model for a prediction test. When I profiled the program, I found that the FC layers take most of the time. During the matrix multiplication the CPU uses only one core, even though we have 4 cores.
As a test, I ran the matrix multiplication using only Eigen. Code:
// Same shape as the FC layer: (1 x 9000) * (9000 x 4000)
MatrixXf m1 = MatrixXf::Random(1,9000);
MatrixXf m2 = MatrixXf::Random(9000,4000);
MatrixXf m3(1,4000);
// Number of threads Eigen reports it will use
int n = Eigen::nbThreads();
for (int i = 0; i < 10; ++i) {
    cout << "n thread=" << n << endl;
    m3 = m1 * m2;
}
In this standalone test, multiple cores are used, as shown here:
But when I use the same code in Caffe2, they are not. I added the code to the Gemm function in math_cpu.cc and used full_connected_op_test to test it, but only one core is used, as shown here:
Why can't the same code use multiple cores in Caffe2?
I cannot find any other way to resolve the problem. Can anyone give a hint, please? Thanks.
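For reference, the product in question (a 1 x K row times a K x N matrix) has independent output columns, so in principle it can be split across cores by column. The following is a minimal sketch of that idea, not how Caffe2, Eigen, or OpenBLAS implement it. `row_times_matrix` is a hypothetical name, row-major storage is an assumption, and the pragma only takes effect when compiling with -fopenmp (otherwise the loop runs serially).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a Caffe2 or Eigen function.
// Computes y = x * W, where x is 1 x K and W is K x N in row-major order.
std::vector<float> row_times_matrix(const std::vector<float>& x,
                                    const std::vector<float>& W,
                                    std::size_t K, std::size_t N) {
  assert(x.size() == K && W.size() == K * N);
  std::vector<float> y(N, 0.0f);
  // Each output column depends only on x and one column of W, so the
  // iterations are independent and can be divided among OpenMP threads.
  #pragma omp parallel for
  for (long j = 0; j < static_cast<long>(N); ++j) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < K; ++k) {
      acc += x[k] * W[k * N + static_cast<std::size_t>(j)];
    }
    y[static_cast<std::size_t>(j)] = acc;
  }
  return y;
}
```

This version is not tuned (the inner loop strides through W by N elements), so it is only meant to check whether OpenMP threads are actually spawned for this shape, for example by watching CPU usage while calling it with K=9000 and N=4000.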