charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.

Incorrect values returned by CmiRankOf and CmiNodeOf on the comm thread #2828

Open nitbhat opened 4 years ago

nitbhat commented 4 years ago

Adding the following line at the end of ConverseCommonInit shows that CmiRankOf and CmiNodeOf return incorrect values on the comm thread.

Line to add: CmiPrintf("[PE:%d][Node:%d][Rank:%d] ConverseCommonInit CmiMyNodeSize()=%d, CmiNodeOf(CmiMyPe()) = %d, CmiRankOf(CmiMyPe()) =%d\n", CmiMyPe(), CmiMyNode(), CmiMyRank(), CmiMyNodeSize(), CmiNodeOf(CmiMyPe()), CmiRankOf(CmiMyPe()));
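An equivalent assertion-style check could be placed next to that CmiPrintf (a sketch; it assumes CmiAssert is usable at this point in ConverseCommonInit and that the intended invariant is CmiNodeOf(CmiMyPe()) == CmiMyNode() and CmiRankOf(CmiMyPe()) == CmiMyRank() on every thread, including comm threads):

/* Sketch: same check as the CmiPrintf above, expressed as assertions. */
CmiAssert(CmiNodeOf(CmiMyPe()) == CmiMyNode());  /* fails on comm threads given the output below */
CmiAssert(CmiRankOf(CmiMyPe()) == CmiMyRank());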

On running tests/charm++/simplearrayhello with an SMP build (e.g. mpi-smp), you can see:


nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test
../../../bin/charmc   hello.ci
../../../bin/charmc  -c hello.C
../../../bin/charmc  -language charm++ -o hello hello.o
../../../bin/testrun  ./hello +p4 10

Running on 4 processors:  ./hello 10
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 4  ./hello 10
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 4 processes, 1 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:3][Node:3][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 3, CmiRankOf(CmiMyPe()) =0
[PE:2][Node:2][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:7][Node:3][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 7, CmiRankOf(CmiMyPe()) =0
[PE:4][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 4, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
[PE:1][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 5, CmiRankOf(CmiMyPe()) =0
[PE:6][Node:2][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 6, CmiRankOf(CmiMyPe()) =0
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.128 seconds.

nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test TESTOPTS="++ppn 2"
../../../bin/testrun  ./hello +p4 10  ++ppn 2

Running on 2 processors:  ./hello 10 +ppn 2
charmrun>  /usr/bin/setarch x86_64 -R  mpirun -np 2  ./hello 10 +ppn 2
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:3][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =1
[PE:2][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =1
[PE:1][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =1
[PE:4][Node:0][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.087 seconds.
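For what it's worth, the wrong values above line up with the plain pe / nodesize and pe % nodesize formulas being applied to comm-thread PEs, which the log shows are numbered CmiNumPes() + node. Below is a small standalone sketch (plain C; naive_node_of and naive_rank_of are hypothetical helpers, not the actual Converse implementation) that reproduces the ++ppn 2 numbers, assuming the intended behavior is that CmiNodeOf(CmiMyPe()) == CmiMyNode() and CmiRankOf(CmiMyPe()) == CmiMyRank() on every thread:

/* Standalone sketch (not Converse code): applies the worker-PE formulas to the
 * comm-thread PE numbers seen in the ++ppn 2 run above. */
#include <stdio.h>

static int naive_node_of(int pe, int node_size) { return pe / node_size; }  /* hypothetical */
static int naive_rank_of(int pe, int node_size) { return pe % node_size; }  /* hypothetical */

int main(void) {
  const int num_nodes = 2, node_size = 2;     /* ++ppn 2 run: 2 processes x 2 worker PEs */
  const int num_pes = num_nodes * node_size;  /* 4 worker PEs total */
  for (int node = 0; node < num_nodes; ++node) {
    int comm_pe = num_pes + node;             /* comm-thread PE number seen in the log */
    printf("comm thread on node %d: PE=%d naive node=%d rank=%d (log says Node:%d Rank:%d)\n",
           node, comm_pe,
           naive_node_of(comm_pe, node_size), naive_rank_of(comm_pe, node_size),
           node, node_size);
  }
  return 0;
}

The naive values match the CmiNodeOf/CmiRankOf output above, while the bracketed Node/Rank fields show what the comm threads actually report via CmiMyNode() and CmiMyRank().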
rbuch commented 4 years ago

Do you only see this on MPI builds?

nitbhat commented 4 years ago

IIRC, I saw it on UCX builds on Frontera as well (that's why I tested it on my local machine and reproduced the incorrect values with mpi-smp on courage).