GlobalArrays / ga

Partitioned Global Address Space (PGAS) library for distributed arrays
http://hpc.pnl.gov/globalarrays/

The test testing/perf fails: Assertion failed: (nproc == 2), function main, file testing/perf.c, line 42. #312

Closed yurivict closed 1 year ago

yurivict commented 1 year ago
FAIL: testing/perf
Assertion failed: (nproc == 2), function main, file testing/perf.c, line 42.

Version: 5.8.2 clang-15 FreeBSD 13.2

jeffhammond commented 1 year ago

how did you run the test? did you run the test with 2 processes, as required (or 3, if using MPI-PR)?

yurivict commented 1 year ago

Tests are run with the make check command.

jeffhammond commented 1 year ago

what was MPIEXEC set to when you ran make check? can you provide the total output, including the log files?

yurivict commented 1 year ago

MPIEXEC=/usr/local/bin/mpiexec which is mpich-3.4.3

Output:

gmake  check-TESTS
gmake[5]: Entering directory '/usr/ports/devel/ga/work/ga-5.8.2/comex'
gmake[6]: Entering directory '/usr/ports/devel/ga/work/ga-5.8.2/comex'
FAIL: testing/perf
Assertion failed: (nproc == 2), function main, file testing/perf.c, line 42.
gmake[6]: *** [Makefile:1775: testing/perf.log] Abort trap

What log files can I provide?

jeffhammond commented 1 year ago
./comex/testing/perf.log
./comex/config.log
./armci/config.log
./config.log

I cannot reproduce on MacOS with ../configure MPICC=mpicc MPICXX=mpicxx MPIF77=mpifort --with-mpi-pr && make -j8 && make -j8 checkprogs && make check using MPICH 4.1.2. I know BSD is different, but not in ways that should matter here.

yurivict commented 1 year ago

comex/testing/perf.log file doesn't exist.

Attached: comex/config.log, armci/config.log, config.log

jeffhammond commented 1 year ago

okay, can you try running it manually with /usr/local/bin/mpiexec -n 2 ./comex/testing/perf.x or similar?

yurivict commented 1 year ago

There's no file ./comex/testing/perf.x.

But /usr/local/bin/mpiexec -n 2 ./comex/testing/perf runs through successfully.

edoapra commented 1 year ago

> There's no file ./comex/testing/perf.x.
>
> But /usr/local/bin/mpiexec -n 2 ./comex/testing/perf runs through successfully.

That happens because the test only runs with nproc=2, as the original error clearly states:
https://github.com/GlobalArrays/ga/issues/312#issuecomment-1651161830 Assertion failed: (nproc == 2), function main, file testing/perf.c, line 42.
https://github.com/GlobalArrays/ga/blob/56087b52459e311e49fc05d9708329a39b776549/comex/testing/perf.c#L42

In other words, you need to set MPIEXEC="mpiexec -np 2"
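
As a minimal sketch, the process count requested by a launcher string can be checked before running the tests. The `launcher_np` function and the `MPIEXEC_VALUE` variable are hypothetical names used for illustration and are not part of GA; the expected count of 2 follows the assertion above (3 would apply with MPI-PR, as noted earlier in the thread).

```shell
#!/bin/sh
# Hypothetical helper (not part of GA): print the process count that a
# launcher string such as "mpiexec -np 2" requests, or nothing if no
# -n/-np option is present.
launcher_np() {
    # Split the launcher string into words on purpose.
    set -- $1
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -n|-np)
                [ "$#" -ge 2 ] && printf '%s\n' "$2"
                return 0
                ;;
        esac
        shift
    done
}

# Example launcher value; adjust to the local MPI installation.
MPIEXEC_VALUE='/usr/local/bin/mpirun -np 2'
np=$(launcher_np "$MPIEXEC_VALUE")

if [ "$np" = "2" ]; then
    echo "launcher requests 2 processes"
else
    echo "launcher does not request exactly 2 processes" >&2
fi
```

A launcher with no `-n`/`-np` option, such as a bare `/usr/local/bin/mpiexec`, yields an empty result and takes the second branch.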

yurivict commented 1 year ago

@edoapra

The problem is that setting MPIEXEC="mpiexec -np 2" outside of the build or test (make check) commands doesn't change anything. This looks like a problem in the project's makefiles: MPIEXEC isn't used.

In fact, comex/Makefile already defines MPIEXEC = /usr/local/bin/mpirun -n %NP% but it somehow doesn't work.

I only call make check from the port, and it always fails.

edoapra commented 1 year ago

@yurivict I am assuming this is what you are getting

$ uname -a
FreeBSD freebsd 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
[edo@freebsd ~/ga-build]$ make check 
make  check-recursive
Making check in comex
make  check-am
make     testing/perf  testing/perf_amo testing/perf_contig  testing/perf_strided testing/shift  testing/test
`testing/perf' is up to date.
`testing/perf_amo' is up to date.
`testing/perf_contig' is up to date.
`testing/perf_strided' is up to date.
`testing/shift' is up to date.
`testing/test' is up to date.
make  check-TESTS
FAIL: testing/perf
^C*** Error code 130
*** Signal 2
*** Signal 2
*** Signal 2
*** Signal 2
*** Signal 2

[edo@freebsd ~/ga-build]$ tail comex/testing/perf.log 

Assertion failed: (nproc == 2), function main, file ../../ga/comex/testing/perf.c, line 42.
Assertion failed: (nproc == 2), function main, file ../../ga/comex/testing/perf.c, line 42.
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node freebsd exited on signal 6 (Abort trap).
--------------------------------------------------------------------------

edoapra commented 1 year ago

@yurivict Look at what happens when I specify make check MPIEXEC='/usr/local/bin/mpirun -np 2'

[edo@freebsd ~/ga-build]$ uname -a
FreeBSD freebsd 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
[edo@freebsd ~/ga-build]$ make check  MPIEXEC='/usr/local/bin/mpirun -np 2'
make  check-recursive
Making check in comex
make  check-am
make     testing/perf  testing/perf_amo testing/perf_contig  testing/perf_strided testing/shift  testing/test
`testing/perf' is up to date.
`testing/perf_amo' is up to date.
`testing/perf_contig' is up to date.
`testing/perf_strided' is up to date.
`testing/shift' is up to date.
`testing/test' is up to date.
make  check-TESTS
PASS: testing/perf
[edo@freebsd ~/ga-build]$ head comex/testing/perf.log 
PASS: testing/perf (exit: 0)
============================

msg size (bytes)     avg time (us)    avg b/w (MB/sec)
#PNNL comex Put Test
16      385.600000      0.041494
32      388.000000      0.082474
64      387.310000      0.165242
128     386.970000      0.330775
256     411.100000      0.622720

yurivict commented 1 year ago

@edoapra

I run from the port's directory.

All the same command line arguments are set there now, but the tests fail:

cd /usr/ports/devel/ga && make test

(to check out the ports tree: sudo git clone https://git.FreeBSD.org/ports.git /usr/ports)

edoapra commented 1 year ago

Could you post the actual log of the command cd /usr/ports/devel/ga && make test?

yurivict commented 1 year ago

log

edoapra commented 1 year ago

It does not look right to me (unless I am not reading the log correctly): MPIEXEC='/usr/local/bin/mpirun -np 2' needs to be an argument to gmake, not an environment variable.

gmake MPIEXEC='/usr/local/bin/mpirun -np 2'
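
The difference between the two ways of passing the variable can be reproduced with a throwaway makefile. This is a sketch under stated assumptions: the makefile and the `launcher-from-makefile` value are invented for the demonstration and have nothing to do with the GA build files, and `make` stands in for gmake on systems where GNU make is installed under that name. It relies only on the standard make rule that an assignment inside a makefile beats the environment, while a command-line variable beats the makefile.

```shell
#!/bin/sh
# Throwaway demo in a temporary directory; nothing here touches a GA build.
demo_dir=$(mktemp -d)

# A tiny makefile that assigns MPIEXEC itself, then prints it.
printf 'MPIEXEC = launcher-from-makefile\nshow:\n\t@echo $(MPIEXEC)\n' > "$demo_dir/Makefile"

# 1. Environment variable: the assignment inside the makefile wins.
from_env=$(cd "$demo_dir" && MPIEXEC='mpirun -np 2' make -s show)

# 2. Command-line argument: overrides the assignment inside the makefile.
from_arg=$(cd "$demo_dir" && make -s show MPIEXEC='mpirun -np 2')

echo "environment: $from_env"
echo "argument:    $from_arg"

rm -rf "$demo_dir"
```

In this demo only the command-line form changes what the makefile sees, which matches the behaviour described above.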

yurivict commented 1 year ago

MPIEXEC='/usr/local/bin/mpirun -np 2' is in TEST_ARGS, which are passed as arguments to gmake.

edoapra commented 1 year ago
$ cd /usr/ports/devel/ga

Edit the Makefile, since I don't want to redo autoreconf:

$ diff -u Makefile.org  Makefile
--- Makefile.org    2023-07-27 11:01:11.978246000 -0700
+++ Makefile    2023-07-27 10:57:33.609524000 -0700
@@ -13,7 +13,8 @@
        liblapack.so:math/lapack \
        libscalapack.so:math/scalapack

-USES=      autoreconf fortran gmake libtool localbase
+#USES=     autoreconf fortran gmake libtool localbase
+USES=      fortran gmake
 USE_LDCONFIG=  yes

 GNU_CONFIGURE= yes

Run make and then make test

$ make
$ make test

The output of ps wwww looks promising

$ ps wwww|grep gmak
58458  1  T    0:00.06 gmake -f Makefile MPIEXEC=/usr/local/bin/mpiexec -np 2 check
57558  1  T    0:00.05 gmake check-recursive
57563  1  T    0:00.01 gmake check
57564  1  T    0:00.01 gmake check-am
58281  1  T    0:00.01 gmake check-TESTS
58284  1  T    0:00.01 gmake test-suite.log TEST_LOGS=testing/perf.log testing/perf_contig.log testing/perf_strided.log testing/perf_amo.log testing/shift.log testing/test.log
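
Note that ps prints the argument vector joined by spaces, so the gmake line above cannot by itself show whether the launcher and its options arrived as one argument. A small sketch of why the quoting matters; `count_args` is a hypothetical helper that only counts what it receives:

```shell
#!/bin/sh
# Hypothetical helper: report how many separate arguments a command receives.
count_args() {
    echo "$#"
}

# Quoted: the launcher and its options stay inside one VAR=value argument.
quoted=$(count_args MPIEXEC='/usr/local/bin/mpiexec -np 2' check)

# Unquoted: "-np" and "2" become separate arguments to the command itself.
unquoted=$(count_args MPIEXEC=/usr/local/bin/mpiexec -np 2 check)

echo "quoted: $quoted arguments, unquoted: $unquoted arguments"
```

With the quoted form the command receives two arguments (the assignment and the target); with the unquoted form it receives four.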

Here is the log for make test

gmake  check-TESTS
gmake[5]: Entering directory '/usr/ports/devel/ga/work/ga-5.8.2/comex'
gmake[6]: Entering directory '/usr/ports/devel/ga/work/ga-5.8.2/comex'
PASS: testing/perf
PASS: testing/perf_contig

Bottom line for me: please use the vanilla 5.8.2 tarball