Open nitbhat opened 4 years ago
Adding the following line to the end of `ConverseCommonInit` shows that `CmiRankOf` and `CmiNodeOf` return incorrect values.

Line to add:

```c
CmiPrintf("[PE:%d][Node:%d][Rank:%d] ConverseCommonInit CmiMyNodeSize()=%d, CmiNodeOf(CmiMyPe()) = %d, CmiRankOf(CmiMyPe()) =%d\n", CmiMyPe(), CmiMyNode(), CmiMyRank(), CmiMyNodeSize(), CmiNodeOf(CmiMyPe()), CmiRankOf(CmiMyPe()));
```
On running `tests/charm++/simplearrayhello` with an SMP build (like `mpi-smp`), you can see:
```
nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test
../../../bin/charmc hello.ci
../../../bin/charmc -c hello.C
../../../bin/charmc -language charm++ -o hello hello.o
../../../bin/testrun ./hello +p4 10
Running on 4 processors: ./hello 10
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 4 ./hello 10
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 4 processes, 1 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:3][Node:3][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 3, CmiRankOf(CmiMyPe()) =0
[PE:2][Node:2][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:7][Node:3][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 7, CmiRankOf(CmiMyPe()) =0
[PE:4][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 4, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
[PE:1][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 5, CmiRankOf(CmiMyPe()) =0
[PE:6][Node:2][Rank:1] ConverseCommonInit CmiMyNodeSize()=1, CmiNodeOf(CmiMyPe()) = 6, CmiRankOf(CmiMyPe()) =0
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.128 seconds.
```
```
nbhat4@courage:/scratch/nitin/charm_2/tests/charm++/simplearrayhello$ make test TESTOPTS="++ppn 2"
../../../bin/testrun ./hello +p4 10 ++ppn 2
Running on 2 processors: ./hello 10 +ppn 2
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 2 ./hello 10 +ppn 2
Charm++> Running on MPI version: 3.0
Charm++> level of thread support used: MPI_THREAD_FUNNELED (desired: MPI_THREAD_FUNNELED)
Charm++> Running in SMP mode: 2 processes, 2 worker threads (PEs) + 1 comm threads per process, 4 PEs total
Charm++> The comm. thread both sends and receives messages
Converse/Charm++ Commit ID: v6.11.0-devel-211-g6704419f9
Isomalloc> Synchronized global address space.
[PE:0][Node:0][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =0
[PE:3][Node:1][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =1
[PE:2][Node:1][Rank:0] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 1, CmiRankOf(CmiMyPe()) =0
[PE:5][Node:1][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =1
[PE:1][Node:0][Rank:1] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 0, CmiRankOf(CmiMyPe()) =1
[PE:4][Node:0][Rank:2] ConverseCommonInit CmiMyNodeSize()=2, CmiNodeOf(CmiMyPe()) = 2, CmiRankOf(CmiMyPe()) =0
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.087 seconds.
```
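For comparison, under the usual block mapping of PEs to logical nodes (an assumption about the intended behaviour, not something stated in the output above), `CmiNodeOf` and `CmiRankOf` on a worker PE would simply invert pe = node * CmiMyNodeSize() + rank. A hypothetical standalone sketch of that expected mapping for the `++ppn 2` case:

```c
/* Hypothetical illustration of the expected PE -> (node, rank) mapping
 * for the ++ppn 2 run above (4 worker PEs, 2 per logical node).
 * Plain arithmetic only; this does not call into Converse. */
#include <stdio.h>

int main(void)
{
  const int nodesize = 2;              /* ++ppn 2 */
  for (int pe = 0; pe < 4; ++pe) {
    int node = pe / nodesize;          /* expected CmiNodeOf(pe) */
    int rank = pe % nodesize;          /* expected CmiRankOf(pe) */
    printf("PE %d -> node %d, rank %d\n", pe, node, rank);
  }
  return 0;                            /* prints 0/0, 0/1, 1/0, 1/1 */
}
```

In the output above, the worker-PE lines agree with this mapping; the mismatching lines are the ones whose printed Rank equals CmiMyNodeSize() (presumably the comm threads), where CmiNodeOf(CmiMyPe()) and CmiRankOf(CmiMyPe()) disagree with CmiMyNode() and CmiMyRank().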
Do you only see this on MPI builds?
I saw it on UCX builds as well, IIRC, on Frontera. (That's why I tested this on my local machine and saw the incorrect values with `mpi-smp` on `courage`.)