FeiYao-Edinburgh closed this issue 4 years ago.
If I choose to set OMP_NUM_THREADS to the maximum number of cores I have in ~/.bashrc, will it give me the quickest speed?
OMP_NUM_THREADS is set to the number of available cores by default (at least on the Ubuntu AMI). You can also set it explicitly, but it shouldn't affect performance unless you deliberately set a lower number. Keeping it empty is convenient as it will choose the number of threads according to your EC2 instance size :)
You can use this OpenMP Hello World to print the number of threads used.
the latter one did not double the speed of the former one.
This is expected as most code does not scale perfectly (Amdahl's law). See GEOS-Chem_scalability for more info.
Keeping it empty is convenient as it will choose the number of threads according to your EC2 instance size :)
Does it apply to local servers similarly? I found this confusing because I read "IMPORTANT! If you forget to define OMP_NUM_THREADS in your Unix environment and/or run scripts, then GEOS-Chem will only execute using one core. This can cause GEOS-Chem to execute much more slowly than intended." on this page. If I do set it, would you recommend, at least in theory, setting it to the maximum number of cores I have to achieve the best performance? If not, how should I define it when running scripts? The only way I can think of is something like make -j4 mpbuild, where mpbuild tells it to use multiple processors, but how many will it use?
This is expected as most code does not scale perfectly (Amdahl's law). See GEOS-Chem_scalability for more info.
Thanks. Good to know.
The only way I can think of is something like make -j4 mpbuild, where mpbuild tells it to use multiple processors, but how many will it use?
The number of OpenMP threads is determined at run time, not compile time. make mpbuild just adds the -fopenmp flag so that OpenMP is enabled.
Does it apply to local servers similarly?
The behavior might depend on the compiler. For example, from IBM XL compiler docs:
If you do not set the OMP_NUM_THREADS environment variable, the number of processors available is the default value to form a new team for the first encountered parallel construct.
The number of OpenMP threads is determined at run time, not compile time. make mpbuild just adds the -fopenmp flag so that OpenMP is enabled.
Thanks for your great explanation! This really makes sense.
The behavior might depend on the compiler.
Hmm... I must admit that this is beyond my knowledge. I use the Intel Fortran compiler (ifort), although the GNU Fortran compiler (gfortran) is also installed on my two machines, which have 40 and 32 cores, respectively (see below). Do you have any suggestions for the value of OMP_NUM_THREADS for each machine?
Machine 1 (40 cores):
CPU(s): 80
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2

Machine 2 (32 cores):
CPU(s): 64
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Do you have any suggestions for the value of OMP_NUM_THREADS for each machine?
You can use this test program, openmp_hello.c:
#include <omp.h>
#include <stdio.h>

int main(void) {
    int nthreads, tid;
    /* Each thread in the team gets its own copies of nthreads and tid */
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();       /* this thread's ID: 0 .. nthreads-1 */
        nthreads = omp_get_num_threads(); /* size of the current team */
        printf("Hello World from thread %d of %d\n", tid, nthreads);
    }
    return 0;
}
which will print how many threads are actually used on your machine:
$ icc -qopenmp -o openmp_hello.x openmp_hello.c # Intel compiler
$ # gcc -fopenmp -o openmp_hello.x openmp_hello.c # or GNU compiler
$ unset OMP_NUM_THREADS # use default value
$ ./openmp_hello.x # on a 4-core machine
Hello World from thread 0 of 4
Hello World from thread 2 of 4
Hello World from thread 1 of 4
Hello World from thread 3 of 4
$ export OMP_NUM_THREADS=1 # force one thread
$ ./openmp_hello.x
Hello World from thread 0 of 1
which will print how many threads are actually used on your machine
Thanks for your further reply. I ran the program you provided and found that the number of threads exactly equalled the number of CPU(s) listed above. Therefore, I only need to set OMP_NUM_THREADS to a number less than the number of CPU(s), but the greater the better? Frankly, I almost got lost among cores, threads, CPU(s), etc. I am sure that cores and CPU(s) are different things. However, I found "The OMP_NUM_THREADS environment variable sets the number of computational cores (aka threads)" on this page, which equates cores and threads. Since the number of threads is identical to the number of CPU(s), are these three exactly the same? Or is it just a coincidence on my machines?
I know I need more reading to understand these things, and I will do it later by myself. Based on the output of the code you provided, do you recommend specifying OMP_NUM_THREADS as the number of threads or CPU(s) that I have, or just not specifying it?
I almost got lost among cores, threads, CPU(s), etc.
Most of the time, "core" is a physical/hardware concept (an attribute of your machine), while "thread" is a software concept (determined by your program). The definition can vary with context; sometimes people talk about "hardware threads", but in general you can think of "threads" as just a software thing, representing how many tasks the program executes concurrently.
Therefore, I only need to set OMP_NUM_THREADS to a number less than the number of CPU(s), but the greater the better?
Most of the time you should set num_threads = num_cores, so that each software thread can run on exactly one hardware core. If num_threads < num_cores, some cores will be left unused. If num_threads > num_cores, the physical cores will be oversubscribed (which often slows down the program).
do you recommend specifying OMP_NUM_THREADS as the number of threads or CPU(s) that I have, or just not specifying it?
You can explicitly set it to the number of cores, if you are unsure about the default behavior. On the EC2 instance, this is not necessary.
Most of the time you should set num_threads = num_cores, so that each software thread can run on exactly one hardware core.
This is somewhat the answer that I am looking for! Nevertheless, I still have some confusion and would appreciate your further help. From the following server information, obtained with lscpu | grep -E '^Thread|^Core|^Socket|^CPU\(', it is clear that num_cores = 32x1 = 32 but num_threads = 2x32x1 = 64. I believe this is because the server uses hyper-threading. The openmp_hello.c test also showed that 64 threads are actually used when running the program. In this case, should I export OMP_NUM_THREADS=32 or export OMP_NUM_THREADS=64? I feel it should be the latter, going by the name OMP_NUM_THREADS.
CPU(s): 64
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
so that each software thread can run on exactly one hardware core.
This is ideal for the case num_cores = num_threads. For the hyper-threading case, in which num_threads is a multiple of num_cores, would it be better to export OMP_NUM_THREADS=num_threads? If so, export OMP_NUM_THREADS=num_threads is universal. If not, is it because several threads on the same core share some common resources, so that they cannot run simultaneously? I feel the former is the answer?
On the EC2 instance, this is not necessary.
Yes. AWS is great in that it removes a great many technical barriers. Nevertheless, I can only use it as an additional resource because of limited funding.
This is expected as most code does not scale perfectly (Amdahl's law). See GEOS-Chem_scalability for more info.
This might be a very tricky question. Since most code does not scale perfectly, it would be very hard to determine which type of EC2 instance to use for different simulations so as to minimize the total cost of each run.
Any further discussion?
For the hyper-threading case, in which num_threads is a multiple of num_cores, would it be better to export OMP_NUM_THREADS=num_threads?
In my tests, hyperthreading does speed up GEOS-Chem OpenMP a bit, by ~10%. So export OMP_NUM_THREADS=64 should be slightly faster than export OMP_NUM_THREADS=32 in your case. This might not be true for other code, though. See Disabling Intel Hyper-Threading Technology on Amazon Linux if you are interested in more details.
Hello,
I have gone through your wiki page on Setting Unix environment variables for GEOS-Chem. However, I found that the GEOSChem_env file does not specify OMP_NUM_THREADS. So I wonder: will make -j4 mpbuild make geos.mp use all the available cores automatically? If I choose to set OMP_NUM_THREADS to the maximum number of cores I have in ~/.bashrc, will it give me the quickest speed? Previously, I tried c5.9xlarge and c5.18xlarge, which have 18 and 36 cores, respectively. However, the latter one did not double the speed of the former one. I hope you can clarify these things for me. Many thanks in advance!
Yours faithfully,
Fei