Closed by shihenw 8 years ago.
You can set OMP_NUM_THREADS to the number of threads you want to use (NNPACK doesn't use OpenMP, but we decided to use this environment variable for compatibility with other Caffe backends). However, keep in mind that NNPACK currently doesn't scale well beyond 8 threads.
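The convention above can be sketched as follows, assuming a backend reads OMP_NUM_THREADS once when it sets up its thread pool. threads_from_env and MAX_USEFUL_THREADS are hypothetical names, not caffe-nnpack's actual code, and the cap of 8 only mirrors the scaling limit mentioned in the reply.

```python
import os
from typing import Mapping, Optional

# Hypothetical cap mirroring the note that NNPACK does not scale well
# beyond 8 threads; not a constant taken from NNPACK or Caffe.
MAX_USEFUL_THREADS = 8


def threads_from_env(default: int = 1,
                     env: Optional[Mapping[str, str]] = None) -> int:
    """Return the thread count requested through OMP_NUM_THREADS.

    Falls back to `default` if the variable is unset, empty, not an
    integer, or less than 1, then clamps the result to MAX_USEFUL_THREADS.
    """
    env = os.environ if env is None else env
    raw = env.get("OMP_NUM_THREADS", "").strip()
    try:
        requested = int(raw)
    except ValueError:
        requested = default
    if requested < 1:
        requested = default
    return min(requested, MAX_USEFUL_THREADS)


if __name__ == "__main__":
    print(threads_from_env())
```

Passing env explicitly keeps the function easy to test; after export OMP_NUM_THREADS=4 in the shell, threads_from_env() returns 4.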
Hmm, this doesn't seem to work. I ran export OMP_NUM_THREADS=4 and then ran the program, but CPU% stayed the same. I also couldn't find OMP_NUM_THREADS anywhere in NNPACK or caffe-nnpack with grep.
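One way to check whether the setting has any effect is to count the OS threads of the running process. On Linux, /proc/<pid>/status contains a Threads: field (ps -o nlwp= -p <pid> reports the same number). The sketch below assumes Linux procfs, and parse_thread_count and thread_count are hypothetical helpers. A higher count only shows that worker threads exist, not that they are busy.

```python
import os
import re
from typing import Optional


def parse_thread_count(status_text: str) -> int:
    """Extract the value of the 'Threads:' line from /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s*(\d+)", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field in status text")
    return int(match.group(1))


def thread_count(pid: Optional[int] = None) -> int:
    """Number of OS threads in process `pid` (default: this process). Linux only."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as status:
        return parse_thread_count(status.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print(f"this Python process has {thread_count()} thread(s)")
```

Calling thread_count(pid) with the PID of the running Caffe process, once with OMP_NUM_THREADS=1 and once with OMP_NUM_THREADS=4, shows whether any extra threads were created.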
It is in this commit
It seems nnpack-pr changed this. The new logic is here: https://github.com/ajtulloch/caffe/pull/1/files#diff-606f9e31983814fd9cfbae12c21bba96R11
I tried using multiple threads by hard-coding the thread count in nnpack_pool.h, but using 2 threads is slower than using 1. Is this because I am using OpenBLAS instead of Intel MKL (as the variable name num_mkl_threads suggests)?
@shihenw This is not what I expected. The only explanation I can think of is that your convolutional layer is very small, so the latency of waking up threads is comparable to the multithreaded computation itself (when 1 thread is used, nnpack-pr computes directly on the caller thread).
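The wake-up-latency explanation can be made concrete with a deliberately simple cost model, an assumption for illustration rather than a measurement of NNPACK: a layer needs W units of perfectly divisible work, running on T > 1 threads adds a fixed wake-up overhead o, and a single thread runs inline with no overhead. Then T > 1 threads take W/T + o, so two threads beat one only when W/2 + o < W, that is, when W > 2o. modeled_time and modeled_speedup are hypothetical names.

```python
def modeled_time(work: float, threads: int, overhead: float) -> float:
    """Idealized runtime: divisible work plus a fixed wake-up cost.

    With one thread the work runs inline, so no overhead is charged.
    Units are arbitrary; this is an illustrative model, not a benchmark.
    """
    if threads <= 1:
        return work
    return work / threads + overhead


def modeled_speedup(work: float, threads: int, overhead: float) -> float:
    """Speedup of `threads` threads over the single-thread time."""
    return modeled_time(work, 1, overhead) / modeled_time(work, threads, overhead)


if __name__ == "__main__":
    overhead = 1.0  # hypothetical wake-up cost, same units as work
    for work in (0.5, 2.0, 10.0, 100.0):  # e.g. work grows with batch size
        print(f"work={work:6.1f}  speedup(2 threads)={modeled_speedup(work, 2, overhead):.2f}")
```

With o = 1, the two-thread speedup is 0.40 at W = 0.5, exactly 1.00 at W = 2, and about 1.96 at W = 100: small layers get slower, while heavier workloads approach the ideal 2x.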
It is certainly not related to the use of OpenBLAS vs. MKL.
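A common way to avoid paying the wake-up cost when only one thread is requested is to run the loop inline on the caller thread and use a pool only when more threads are requested. The sketch below shows that dispatch pattern with Python's concurrent.futures; parallel_for is a hypothetical helper, not nnpack-pr's code, and because of Python's GIL it illustrates the control flow rather than the speedup that native code would get.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_for(num_items: int, body: Callable[[int], None], num_threads: int) -> None:
    """Call body(i) for every i in range(num_items).

    num_threads <= 1: run sequentially on the caller thread (no workers woken).
    num_threads > 1: distribute the indices over a thread pool.
    """
    if num_threads <= 1:
        for i in range(num_items):
            body(i)
        return
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consuming the iterator re-raises any exception thrown by body().
        for _ in pool.map(body, range(num_items)):
            pass
```

A real backend would keep one pool alive across layers instead of creating it on every call as this sketch does for brevity; creating threads per call would add exactly the kind of overhead discussed above.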
Hi @Maratyszcza, I got some speedup with multiple threads, and the speedup becomes more pronounced as the workload gets heavier (larger batch size). Here's what I got when experimenting on my own network:
@Maratyszcza @shihenw How can I set the thread pool size in Caffe2?
Hi @Maratyszcza,
I am running VGG-19 with caffe-nnpack on a machine with two Intel Xeon E5-2660 v3 (Haswell, 2.6 GHz) CPUs, 20 cores in total. I was able to get results similar to the numbers shown in readme.md (timed with CPUTimer in caffe/util/benchmark):
The speedup is fantastic. However, according to top, CPU% never exceeded 100% during the run. If all cores were fully utilized, shouldn't I see a number significantly larger than 100%, like 2000% on this machine?
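By default, top on Linux reports a process's %CPU in "Irix mode", where 100% means one fully busy core, so 20 saturated cores would show as roughly 2000% (pressing I switches to Solaris mode, which divides by the number of CPUs). The same quantity can be computed as process CPU time divided by wall time. The helpers below are hypothetical and not part of Caffe: time.process_time() sums user and system time over all threads of the current process, so truly parallel native code would push the result above 100%, while a reading near 100% suggests the work is effectively running on one core at a time.

```python
import os
import time
from typing import Callable


def cpu_percent(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU usage as top shows it in Irix mode: 100% per fully busy core."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / wall_seconds


def measure_cpu_percent(fn: Callable[[], object]) -> float:
    """Run fn() once and return this process's CPU percentage during the call.

    time.process_time() counts user + system CPU time of all threads in the
    process, so multithreaded native work can push the result past 100%.
    """
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu_percent(cpu, wall)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"upper bound on this machine: {cpu_percent(cores, 1.0):.0f}%")
    busy = measure_cpu_percent(lambda: sum(range(10_000_000)))
    print(f"single-threaded Python loop: {busy:.0f}%")
```

Wrapping a forward pass in measure_cpu_percent gives a number directly comparable to what top shows for the same process.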