ICLDisco / parsec

PaRSEC is a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, GPU-accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores and GPU accelerators, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.

Thread binding / reporting #173

Open abouteiller opened 6 years ago

abouteiller commented 6 years ago

Original report by Thomas Herault (Bitbucket: herault, GitHub: therault).


Recent work on the BlueGene/Q system showed that the binding capability is limited.

The proposal is to extend the code by changing the type of the bindto parameter. Multiple approaches:

abouteiller commented 5 years ago

Original comment by Thomas Herault (Bitbucket: herault, GitHub: therault).


Will check if a new hwloc version would solve the problem.

abouteiller commented 5 years ago

Original comment by Thomas Herault (Bitbucket: herault, GitHub: therault).


Tried with hwloc cross-compiled to support BlueGene/Q (see https://www.open-mpi.org/projects/hwloc/doc/v1.11.6/a00305.php#faq_bgq): the problem persists. Investigating.

W@00000 Oversubscription on core 0 detected
W@00001 Oversubscription on core 0 detected
W@00001 Couldn't bind to cpuset  0x0000000c
W@00002 Oversubscription on core 0 detected
W@00003 Oversubscription on core 0 detected
W@00002 Couldn't bind to cpuset  0x00000030
W@00003 Couldn't bind to cpuset  0x000000c0
W@00001 parsec_hwloc: couldn't bind to cpuset 0x0000000c
W@00000 parsec_hwloc: couldn't bind to cpuset 0x00000003
W@00001 Core binding on node -1 failed
W@00000 Core binding on node -1 failed

abouteiller commented 2 years ago

BG/Q systems have been or are being decommissioned and modern systems have no problem using hwloc. Should we close?