Open XiaotaoChen opened 6 years ago
Great analysis. I think setting OMP_NUM_THREADS to the number of physical cores is a good choice, and you'd better bind all threads to physical cores by setting KMP_AFFINITY.
I'm afraid the code below is not portable and won't run on other operating systems.
int physical_cpus = std::stoi(exec_shell("cat /proc/cpuinfo |grep 'physical id'|sort -u|wc -l"));
//get the physical core count of each physical cpu
int cores = std::stoi(exec_shell("cat /proc/cpuinfo |grep 'cores'|sort -u|awk '{print $4}' "));
FYI, here is a previous discussion about it: https://github.com/apache/incubator-mxnet/issues/9545#issuecomment-361874591
Contributions are welcome :)
Yes, the code only works on Linux. I tested it on my Ubuntu machine. Thanks :) @TaoLv
Thanks, Xiaotao
Adding @cjolivier01 to comment, since he implemented the code that sets the thread number for MXNet.
One day maybe someone will write some portable code to determine the actual number of physical cores. On Linux I've seen code that does it by parsing all sorts of things out of the /proc directory, and it was a lot of code. Maybe Intel has an MKL routine that they could share?
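For the Linux-only case, the /proc parsing can be done without spawning a shell. The following is a minimal sketch, not MXNet code: count_physical_cores is a hypothetical helper that counts distinct (physical id, core id) pairs in /proc/cpuinfo-style text. It assumes the x86 field layout, where "physical id" appears before "core id" in each processor block, and it does nothing for other operating systems.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Strip leading and trailing spaces/tabs.
static std::string trim(const std::string& s) {
  const std::size_t b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const std::size_t e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Hypothetical helper (not part of MXNet): count distinct
// (physical id, core id) pairs in /proc/cpuinfo-style text.
// Returns 0 when those fields are absent, so the caller can fall back.
int count_physical_cores(std::istream& in) {
  std::set<std::pair<int, int>> cores;
  int physical_id = -1;
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // blank or malformed line
    const std::string key = trim(line.substr(0, colon));
    std::istringstream value(line.substr(colon + 1));
    int number = 0;
    if (key == "processor") {
      physical_id = -1;  // a new logical CPU block starts here
    } else if (key == "physical id" && (value >> number)) {
      physical_id = number;
    } else if (key == "core id" && (value >> number)) {
      cores.insert(std::make_pair(physical_id, number));
    }
  }
  return static_cast<int>(cores.size());
}
```

On Linux this could be called with a std::ifstream opened on /proc/cpuinfo. A return value of 0 means the fields were not found (some VMs and non-x86 kernels omit them), in which case falling back to the logical CPU count is one option.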
I think hyperthreading brings very little improvement in this kind of highly synchronized numeric computation. Also, PyTorch uses https://github.com/pytorch/cpuinfo to detect the CPU.
Here is my CPU info:
Clearly, each core has only one thread; hyperthreading is not enabled.
Then I ran benchmark_score.py. The results are as follows:
Here is the utilization of each core:
It shows that only 14 cores are used, by 14 threads. These 14 threads are created by MKL-DNN (I guess). The other threads, which use almost no CPU, are created by the engine and other components of MXNet.
Analysis
MXNet treats every machine as if hyperthreading were enabled. A CNN is a compute-intensive workload, so using hyperthreading can't further increase the computing power and only adds overhead. MXNet therefore creates only half as many threads as there are CPU cores reported by the OS; the OS then schedules each thread onto its own physical core, which avoids the extra cost.
And the suggestion ( https://zh.mxnet.io/blog/mkldnn ) to set OMP_NUM_THREADS=vCPUs/2 is likewise meant to avoid hyperthreading.
According to my CPU info, each core has only one thread, with no hyperthreading. Even if 28 threads are created, each one will run on its own physical core, so they won't compete for resources the way hyperthreads do. Using all the CPU cores therefore seems to improve efficiency.
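The arithmetic behind this can be sketched as follows. Both function names are hypothetical and only illustrate the reasoning: halving the logical CPU count is right when each core exposes two hardware threads, but it undercounts when each core exposes one, as on the 28-core machine above.

```cpp
// Hypothetical helpers for illustration only (not MXNet code).

// The "assume hyperthreading" heuristic: half of the logical CPUs.
int halved_thread_count(int logical_cpus) {
  const int half = logical_cpus / 2;
  return half > 0 ? half : 1;
}

// Thread count when the number of hardware threads per core is known.
int physical_core_count(int logical_cpus, int threads_per_core) {
  if (threads_per_core < 1) threads_per_core = 1;  // defensive default
  const int cores = logical_cpus / threads_per_core;
  return cores > 0 ? cores : 1;
}
```

With 28 logical CPUs and 1 thread per core, halved_thread_count(28) gives 14 while physical_core_count(28, 1) gives 28. With hyperthreading enabled on the same hardware (56 logical CPUs, 2 threads per core), both give 28.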
Solution
The engine component of mxnet-mkldnn (v1.2.0) includes functions in openmp.h and openmp.cc that set the number of OpenMP threads. For example, the constructor of OpenMP in openmp.cc:
According to the constructor, if the user doesn't set OMP_NUM_THREADS, MXNet sets omp_threads = omp_get_num_procs()/2. The function omp_get_num_procs returns the total number of CPU cores (logical processors) available to the program. So there are two solutions: (1) set the environment variable OMP_NUM_THREADS; (2) rewrite the code as below:
1. Set the environment variable OMP_NUM_THREADS
2. Rewrite the code
Whether or not hyperthreading is enabled, ensure that omp_thread_max_ is equal to the number of physical CPU cores, such as:
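A minimal sketch of that idea, under stated assumptions and not the actual MXNet patch: choose_omp_thread_max is a hypothetical function that honors OMP_NUM_THREADS when it is set to a positive integer and otherwise falls back to the physical core count. The threads_per_core argument is assumed to come from some separate CPU detection step.

```cpp
#include <cstdlib>

// Hypothetical helper (not the MXNet implementation).
// env_value: the value of OMP_NUM_THREADS, or nullptr if it is unset.
// logical_cpus: e.g. the value returned by omp_get_num_procs().
// threads_per_core: hardware threads per physical core (assumed known).
int choose_omp_thread_max(const char* env_value,
                          int logical_cpus,
                          int threads_per_core) {
  if (env_value != nullptr) {
    char* end = nullptr;
    const long requested = std::strtol(env_value, &end, 10);
    // Accept the user's setting only if it parsed to a positive number.
    if (end != env_value && requested > 0) {
      return static_cast<int>(requested);
    }
  }
  if (threads_per_core < 1) threads_per_core = 1;
  const int physical = logical_cpus / threads_per_core;
  return physical > 0 ? physical : 1;
}
```

In an OpenMP build, the result could be stored in omp_thread_max_ and passed to omp_set_num_threads, with std::getenv("OMP_NUM_THREADS") and omp_get_num_procs() supplying the first two arguments.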
Result