STEllAR-GROUP / hpx

The C++ Standard Library for Parallelism and Concurrency
https://hpx.stellar-group.org
Boost Software License 1.0

--hpx:bind throws unexpected error #1370

Closed: hkaiser closed this issue 9 years ago

hkaiser commented 9 years ago

From hpx-users:

I’m trying to bind threads manually for the Xeon Phi on hermione. I am able to use --hpx:bind=balanced, compact, or scatter, but none of those lets me run with a specified number of cores and the same number of threads on each core (which would be a nice enhancement).
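The layout requested above (N cores, with the same number of worker threads on each) could be spelled out by hand with one explicit mapping per worker thread. The helper below is a hypothetical sketch, not part of HPX. It assumes, as the HPX manual describes for --hpx:bind, that several `thread:...=core:...pu:...` mappings may be joined with `;` in a single value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of HPX): builds an explicit --hpx:bind value
// that places `threads_per_core` worker threads on each of `cores` consecutive
// cores, starting at core `first_core`. One "thread:i=core:c.pu:p" mapping is
// emitted per worker thread; mappings are joined with ';' (assumed syntax).
std::string make_bind_spec(std::size_t cores, std::size_t threads_per_core,
                           std::size_t first_core = 0)
{
    std::string spec;
    std::size_t thread = 0;
    for (std::size_t c = 0; c != cores; ++c)
    {
        for (std::size_t p = 0; p != threads_per_core; ++p, ++thread)
        {
            if (!spec.empty())
                spec += ';';
            spec += "thread:" + std::to_string(thread) +
                "=core:" + std::to_string(first_core + c) +
                ".pu:" + std::to_string(p);
        }
    }
    return spec;
}
```

For example, `make_bind_spec(2, 2)` yields `thread:0=core:0.pu:0;thread:1=core:0.pu:1;thread:2=core:1.pu:0;thread:3=core:1.pu:1`, which has to be paired with `-t4` (cores times threads per core). Quote the value on the command line, because `;` is a shell command separator.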

So I tried binding manually, and the example in the manual fails:

$ ./hello_world -t4 --hpx:bind=thread:0-3=core:0-3.pu:0
{env}: 13 entries:
  HOME=/home/pagrubel
  LD_LIBRARY_PATH=/opt/intel/composer_xe_2013/lib/mic:/opt/hwloc/1.7-k10m-release/lib
  LOGNAME=pagrubel
  MAIL=/var/mail/pagrubel
  OLDPWD=/home/pagrubel
  PATH=/usr/bin:/bin:/usr/sbin:/sbin
  PWD=/home/pagrubel/build/hpx_buildmic/bin
  SHELL=/bin/sh
  SSH_CLIENT=172.31.1.254 41705 22
  SSH_CONNECTION=172.31.1.254 41705 172.31.1.1 22
  SSH_TTY=/dev/pts/0
  TERM=xterm-256color
  USER=pagrubel
hpx::init: std::exception caught: The number of OS threads requested (4) does not 
    match the number of threads to bind (3): HPX(bad_parameter)

hkaiser commented 9 years ago

I'm not able to reproduce this. For me (granted, not on a MIC) this command line, with --hpx:print-bind added, produces (as expected):

*******************************************************************************
locality: 0
   0: PU L#0(P#0), Core L#0, Socket L#0, Node L#0(P#0)
   1: PU L#2(P#2), Core L#1, Socket L#0, Node L#0(P#0)
   2: PU L#4(P#4), Core L#2, Socket L#0, Node L#0(P#0)
   3: PU L#6(P#6), Core L#3, Socket L#0, Node L#0(P#0)
hello world from OS-thread 2 on locality 0
hello world from OS-thread 0 on locality 0
hello world from OS-thread 1 on locality 0
hello world from OS-thread 3 on locality 0
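The PU numbers in this output follow a simple pattern. This machine has two PUs per core, so `core:N.pu:0` resolves to PU L#(2N). Here is a minimal sketch of that arithmetic, assuming a uniform topology in which hwloc numbers PUs consecutively within each core. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumes a uniform topology in which hwloc numbers PUs consecutively within
// each core (core 0 owns PU L#0..L#(n-1), core 1 owns the next n, and so on).
// Returns the hwloc logical PU index of PU `pu` on logical core `core`.
constexpr std::size_t logical_pu_index(std::size_t core, std::size_t pu,
                                       std::size_t pus_per_core)
{
    return core * pus_per_core + pu;
}
```

With four hardware threads per core, as on a Knights Corner Xeon Phi, the same `core:0-3.pu:0` spec would, under the same assumption, select PU L#0, L#4, L#8 and L#12.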

pagrubel commented 9 years ago

Yes, it worked for me on the Ivy Bridge node, which also had hyper-threading turned on, but not on the Xeon Phi.

sithhell commented 9 years ago

The bind specification only specifies 3 threads, but 4 worker threads are requested.

pagrubel commented 9 years ago

0-3 is four, and the exact same command works on other machines.

sithhell commented 9 years ago

On 03.02.2015 at 18:08, "Patricia Grubel" notifications@github.com wrote:

0-3 is four

You are of course right. Sorry for the noise...
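Ranges in a bind spec such as `0-3` are inclusive, so they name `last - first + 1` entries, four in this case. An exclusive computation (`last - first`) would give exactly the 3 reported in the error. That is only one conceivable way such a number could arise, and nothing in this thread establishes it as the cause. The check below is a hypothetical model of the comparison the error message describes, not HPX's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Number of entries in an inclusive range such as "0-3" in a bind spec.
std::size_t inclusive_count(std::size_t first, std::size_t last)
{
    if (last < first)
        throw std::invalid_argument("range end precedes range start");
    return last - first + 1;
}

// Hypothetical consistency check modeled on the error text in this issue:
// the number of OS threads (-t) must equal the number of threads to bind.
void check_thread_count(std::size_t os_threads, std::size_t threads_to_bind)
{
    if (os_threads != threads_to_bind)
        throw std::invalid_argument(
            "The number of OS threads requested (" + std::to_string(os_threads) +
            ") does not match the number of threads to bind (" +
            std::to_string(threads_to_bind) + ")");
}
```

With `-t4` and `thread:0-3`, `check_thread_count(4, inclusive_count(0, 3))` passes. Only a miscounted range would trigger the error seen on the Phi.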

sithhell commented 9 years ago

This is related to #1254.

hkaiser commented 9 years ago

This is related to #1254.

How so?

hkaiser commented 9 years ago

When building with HPX_MAX_CPU_COUNT=256, this is still not reproducible. AFAICT this is the only setting which could possibly cause the issue.
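A Knights Corner Xeon Phi exposes up to 61 cores with 4 hardware threads each, i.e. up to 244 PUs. A build-time CPU limit below that therefore cannot describe every PU on the card, which is why this setting is a natural suspect. The sketch below assumes an affinity mask held in a `std::bitset` whose width is fixed by such a limit. It is a hypothetical model and not HPX's actual mask type.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical model: an affinity mask whose width is fixed at compile time,
// in the spirit of a build-time HPX_MAX_CPU_COUNT setting. Returns whether
// the PU with index `pu_index` can be represented in such a mask.
template <std::size_t MaxCpuCount>
bool fits_in_mask(std::size_t pu_index)
{
    std::bitset<MaxCpuCount> mask;
    try
    {
        mask.set(pu_index);   // throws std::out_of_range if pu_index >= MaxCpuCount
        return true;
    }
    catch (std::out_of_range const&)
    {
        return false;
    }
}
```

With 256 slots, as in the test build above, index 243 (the last PU of a 61-core card) fits. With a 64-wide mask it does not.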

hkaiser commented 9 years ago

@pagrubel: do you have HWLOC enabled on the Phi? Does the --hpx:print-bind option produce any output before the error is raised?

pagrubel commented 9 years ago

Yes, I have hwloc enabled. There is no output from --hpx:print-bind before the error is raised.

hkaiser commented 9 years ago

@pagrubel: could you verify whether this breaks as well, please:

./hello_world -t4 --hpx:bind=thread:0-3=core:1-4.pu:0

I would like to make sure that the special handling of core 0 on the Phi does not get in the way here.
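The alternative spec keeps four worker threads but shifts them to cores 1-4, so core 0 is never used. The sketch below expands a mapping of this narrow form into one (thread, core, pu) triple per thread, pairing the i-th thread with the i-th core. That pairing matches the --hpx:print-bind output shown earlier for `core:0-3`. The names are hypothetical, and this is an illustration of the spec's meaning, not HPX's parser.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct assignment
{
    std::size_t thread, core, pu;
};

// Expands a mapping of the form thread:t0-t1=core:c0-c1.pu:p into one
// (thread, core, pu) triple per worker thread, pairing the i-th thread with
// the i-th core. Both ranges are inclusive and must have the same length.
std::vector<assignment> expand_mapping(std::size_t t0, std::size_t t1,
    std::size_t c0, std::size_t c1, std::size_t pu)
{
    if (t1 < t0 || c1 < c0 || t1 - t0 != c1 - c0)
        throw std::invalid_argument("thread and core ranges differ in length");
    std::vector<assignment> result;
    for (std::size_t i = 0; i <= t1 - t0; ++i)
        result.push_back({t0 + i, c0 + i, pu});
    return result;
}
```

`expand_mapping(0, 3, 1, 4, 0)` gives threads 0-3 on cores 1-4, PU 0 each. That is still four threads, so `-t4` remains consistent with the spec.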

pagrubel commented 9 years ago

Same error.