TESSEorg / ttg

TTG: Template Task Graph C++ API

All PaRSEC threads binding to the same physical core #130

Open josephjohnjj opened 3 years ago

josephjohnjj commented 3 years ago

Hi,

When I run a TTG program, all the threads get bound to the same physical core. Things work better when I use

--bind-to none

Are there any performance problems if I use --bind-to none?

The program was compiled using the following modules: intel-mkl/2021.2.0 boost/1.71.0 openmpi/4.0.2 eigen/3.3.7 libunwind/1.2.1 intel-compiler/2021.2.0. I am using PaRSEC commit 15b871975fa596e1f2d5e4430c405d9e1b50e54d.

Regards, Joseph

devreal commented 3 years ago

Is the --bind-to none option passed to Open MPI or to PaRSEC? PaRSEC should ignore the existing MPI binding iirc and use all cores by default... Can you post your command and configuration?

josephjohnjj commented 3 years ago

The --bind-to none option was passed to mpirun. This is the PBS script I used initially; with it, all the threads were bound to the same physical core and the job timed out.

#!/bin/bash
#PBS -P kq12
#PBS -q normal
#PBS -l walltime=00:15:00
#PBS -l mem=192GB
#PBS -l jobfs=1GB
#PBS -l ncpus=96

module load openmpi/4.0.5

mpirun -np 2 --map-by node /home/659/jj8451/TTG/ttg/build/examples/uts-parsec  -b 2000 -q 0.124875 -m 8 -r 42

When I added --bind-to none, the run completed in 90 seconds.

#!/bin/bash
#PBS -P kq12
#PBS -q normal
#PBS -l walltime=00:15:00
#PBS -l mem=192GB
#PBS -l jobfs=1GB
#PBS -l ncpus=96

ulimit -c unlimited

module load openmpi/4.0.5
mpirun  -np 2 --map-by node --bind-to none /home/659/jj8451/TTG/ttg/build/examples/uts-parsec  -b 2000 -q 0.124875 -m 8 -r 42
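
To compare the two job scripts above, it can help to look at which CPUs a process is actually allowed to run on. The following is a minimal sketch, assuming a Linux node where `/proc/self/status` exposes a `Cpus_allowed_list` line; the helper names `count_cpus` and `allowed_cpus` are hypothetical and not part of TTG, PaRSEC, or Open MPI.

```shell
#!/bin/bash
# Count the CPUs named in a Linux CPU list such as "0-3,7"
# (the format used by Cpus_allowed_list in /proc/<pid>/status).
count_cpus() {
    local list="$1" total=0 part lo hi
    local IFS=','
    for part in $list; do
        if [[ "$part" == *-* ]]; then
            lo="${part%-*}"          # text before the dash
            hi="${part#*-}"          # text after the dash
            total=$(( total + hi - lo + 1 ))
        elif [[ -n "$part" ]]; then
            total=$(( total + 1 ))   # a single CPU number
        fi
    done
    echo "$total"
}

# Print the CPU list this process may run on (Linux only).
# awk is a child of the shell and inherits its affinity mask.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
}

if [[ -r /proc/self/status ]]; then
    cpus="$(allowed_cpus)"
    echo "allowed CPUs: ${cpus} ($(count_cpus "$cpus") total)"
fi
```

If a script like this is launched through `mpirun` in place of the application, a count of 1 would be consistent with the rank being confined to a single core, while a count equal to the node's core count would be consistent with `--bind-to none`. Open MPI's `--report-bindings` option to `mpirun` prints similar information without a helper script.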

I am running with one MPI process per node. PaRSEC was built normally without any additional features, and this external PaRSEC was used to build TTG.

Machine (189GB total)
  Package L#0 + L3 L#0 (36MB)
    Group0 L#0
      NUMANode L#0 (P#0 47GB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#7)
      L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#8)
      L2 L#6 (1024KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
      L2 L#7 (1024KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#13)
      L2 L#8 (1024KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#14)
      L2 L#9 (1024KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
      L2 L#10 (1024KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#19)
      L2 L#11 (1024KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#20)
      HostBridge
        PCI 00:11.5 (SATA)
        PCI 00:17.0 (SATA)
          Block(Disk) "sda"
        PCIBridge
          PCIBridge
            PCI 02:00.0 (VGA)
      HostBridge
        PCIBridge
          PCIBridge
            PCIBridge
              PCI 08:00.2 (Ethernet)
                Net "eno1"
    Group0 L#1
      NUMANode L#1 (P#1 47GB)
      L2 L#12 (1024KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#4)
      L2 L#13 (1024KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#5)
      L2 L#14 (1024KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#6)
      L2 L#15 (1024KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#9)
      L2 L#16 (1024KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#10)
      L2 L#17 (1024KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#11)
      L2 L#18 (1024KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#15)
      L2 L#19 (1024KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#16)
      L2 L#20 (1024KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#17)
      L2 L#21 (1024KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21)
      L2 L#22 (1024KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22)
      L2 L#23 (1024KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
      HostBridge
        PCIBridge
          PCI 58:00.0 (InfiniBand)
            Net "ib0"
            OpenFabrics "mlx5_0"
  Package L#1 + L3 L#1 (36MB)
    Group0 L#2
      NUMANode L#2 (P#2 47GB)
      L2 L#24 (1024KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 (P#24)
      L2 L#25 (1024KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 (P#25)
      L2 L#26 (1024KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 (P#26)
      L2 L#27 (1024KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27 + PU L#27 (P#27)
      L2 L#28 (1024KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#31)
      L2 L#29 (1024KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#32)
      L2 L#30 (1024KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#33)
      L2 L#31 (1024KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#37)
      L2 L#32 (1024KB) + L1d L#32 (32KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#38)
      L2 L#33 (1024KB) + L1d L#33 (32KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#39)
      L2 L#34 (1024KB) + L1d L#34 (32KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#43)
      L2 L#35 (1024KB) + L1d L#35 (32KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#44)
    Group0 L#3
      NUMANode L#3 (P#3 47GB)
      L2 L#36 (1024KB) + L1d L#36 (32KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#28)
      L2 L#37 (1024KB) + L1d L#37 (32KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#29)
      L2 L#38 (1024KB) + L1d L#38 (32KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#30)
      L2 L#39 (1024KB) + L1d L#39 (32KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#34)
      L2 L#40 (1024KB) + L1d L#40 (32KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#35)
      L2 L#41 (1024KB) + L1d L#41 (32KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#36)
      L2 L#42 (1024KB) + L1d L#42 (32KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#40)
      L2 L#43 (1024KB) + L1d L#43 (32KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#41)
      L2 L#44 (1024KB) + L1d L#44 (32KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#42)
      L2 L#45 (1024KB) + L1d L#45 (32KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#45)
      L2 L#46 (1024KB) + L1d L#46 (32KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#46)
      L2 L#47 (1024KB) + L1d L#47 (32KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#47)

devreal commented 3 years ago

Any chance your PaRSEC wasn't built with support for hwloc? According to the OMPI documentation, the default binding with np<=2 is core and if PaRSEC has no support for hwloc it won't enforce any binding itself.
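
The default mentioned in the reply above can be written down as a small decision rule. This is a sketch of the policy as the Open MPI 4.x `mpirun` man page describes it when no binding directive is given (bind to core for two or fewer processes, to socket for more, to none when oversubscribed); it is not Open MPI code. The function names, the `slots` parameter, and the thread count of 8 are illustrative assumptions, and the sketch does not model how `--map-by node` interacts with the default.

```shell
#!/bin/bash
# Default binding target when mpirun is given no binding directive.
# $1 = number of processes, $2 = available slots (hypothetical parameter).
ompi_default_binding() {
    local np="$1" slots="$2"
    if (( np > slots )); then
        echo "none"      # oversubscribed: no binding
    elif (( np <= 2 )); then
        echo "core"      # each rank confined to one core
    else
        echo "socket"    # each rank confined to one socket
    fi
}

# Threads sharing each allowed CPU (rounded up) when a rank's threads
# all stay inside the affinity mask inherited from mpirun.
# $1 = threads in the rank, $2 = CPUs in the mask.
threads_per_cpu() {
    local threads="$1" cpus="$2"
    echo $(( (threads + cpus - 1) / cpus ))
}

echo "np=2, 96 slots -> bind to $(ompi_default_binding 2 96)"
echo "8 threads, 1 allowed CPU -> $(threads_per_cpu 8 1) threads per CPU"
echo "8 threads, 48 allowed CPUs -> $(threads_per_cpu 8 48) thread per CPU"
```

Under these assumptions, a run with two ranks would confine each rank to one core unless the threads are rebound by the runtime, which matches the symptom reported at the top of this thread.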

josephjohnjj commented 3 years ago

PaRSEC was built with hwloc. Running ldd libparsec.so.3.0.0 gives the following:

    linux-vdso.so.1 (0x00007ffcf9587000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fca2d397000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fca2d18f000)
    libhwloc.so.15 => /lib64/libhwloc.so.15 (0x00007fca2cf3f000)
    libmpi.so.40 => /apps/openmpi/4.0.5/lib/libmpi.so.40 (0x00007fca2cc18000)
    libimf.so => /apps/intel-ct/2021.2.0/compiler/linux/compiler/lib/intel64/libimf.so (0x00007fca2c590000)
    libsvml.so => /apps/intel-ct/2021.2.0/compiler/linux/compiler/lib/intel64/libsvml.so (0x00007fca2aa93000)
    libirng.so => /apps/intel-ct/2021.2.0/compiler/linux/compiler/lib/intel64/libirng.so (0x00007fca2a729000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fca2a3a7000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fca2a18f000)
    libintlc.so.5 => /apps/intel-ct/2021.2.0/compiler/linux/compiler/lib/intel64/libintlc.so.5 (0x00007fca29f17000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fca29cf7000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fca29932000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fca2d843000)
    libopen-rte.so.40 => /apps/openmpi-mofed5.1-pbs2021.1/4.0.5/lib/libopen-rte.so.40 (0x00007fca2967c000)
    libopen-pal.so.40 => /apps/openmpi-mofed5.1-pbs2021.1/4.0.5/lib/libopen-pal.so.40 (0x00007fca29371000)
    libudev.so.1 => /lib64/libudev.so.1 (0x00007fca290db000)
    libpciaccess.so.0 => /lib64/libpciaccess.so.0 (0x00007fca28ed1000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007fca28ccd000)
    libz.so.1 => /lib64/libz.so.1 (0x00007fca28ab6000)
    libmount.so.1 => /lib64/libmount.so.1 (0x00007fca2885c000)
    libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fca28609000)
    libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fca28401000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fca281d7000)
    libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fca27f53000)

josephjohnjj commented 3 years ago

These are the warnings generated by PaRSEC:

W@00000 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00002 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00005 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00004 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00003 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00001 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00007 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00006 binding core #2000 not valid (must be between 0 and 47 (nb_core-1)
W@00000 Couldn't bind to cpuset 0x0
W@00007 Couldn't bind to cpuset 0x0
W@00005 Couldn't bind to cpuset 0x0
W@00006 Couldn't bind to cpuset 0x0
W@00003 Couldn't bind to cpuset 0x0
W@00002 Couldn't bind to cpuset 0x0
W@00004 Couldn't bind to cpuset 0x0
W@00001 Couldn't bind to cpuset 0x0
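
The first message reports a requested core index of 2000 being rejected against the range 0 to 47 (nb_core-1). The check it describes can be sketched as below. This is an illustration of a range test only, not PaRSEC's implementation, and `valid_core` is a hypothetical name. One observation, offered without a confirmed explanation: the rejected value 2000 is the same number as the `-b 2000` argument on the `mpirun` command lines earlier in this thread.

```shell
#!/bin/bash
# Succeed if core index $1 lies in 0 .. nb_core-1, where $2 = nb_core.
# Illustrative only; this is not PaRSEC's code.
valid_core() {
    local idx="$1" nb_core="$2"
    (( idx >= 0 && idx < nb_core ))
}

# 48 cores per node, as in the topology listing posted above.
for idx in 0 47 2000; do
    if valid_core "$idx" 48; then
        echo "core #$idx accepted"
    else
        echo "core #$idx not valid (must be between 0 and 47)"
    fi
done
```

When every requested index fails a check like this, no valid core remains to bind to, which would be consistent with the empty cpuset (0x0) in the second group of warnings.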