PASSIONLab / CombBLAS

The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear algebra primitives specifically targeting graph analytics.

Four tests fail #19

Open drew-parsons opened 1 year ago

drew-parsons commented 1 year ago

Building the CombBLAS 2.0 release on Linux (Debian unstable) with OpenMPI 4.1.5, 4 of the 20 tests fail when run via ctest (/usr/bin/ctest --force-new-ctest-process --verbose -j8):

80% tests passed, 4 tests failed out of 20

Total Test time (real) = 970.11 sec

The following tests FAILED:
          8 - Indexing_Test (Failed)
          9 - SpAsgn_Test (Failed)
         15 - FBFS_Test (Failed)
         16 - FMIS_Test (Failed)

These differ from the failing tests in #15, which seem to be caused by missing files.

The output of the failing tests is:

Indexing_Test:

test 8
      Start  8: Indexing_Test

8: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest" "../TESTDATA" "B_100x100.txt" "B_10x30_Indexed.txt" "rand10outta100.txt" "rand30outta100.txt"
8: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests
8: Test timeout computed to be: 1500
8: Indexing working correctly
8: Elements stored on proc 0: {(0,0.234), (1,0.829), (2,0.221), (3,0.454), (4,0.096), (5,0.399), (6,0.895), (7,0.156), (8,0.052), (9,0.709), (10,0.305), (11,0.669), (12,0.493), (13,0.619), (14,0.736), (15,0.615), (16,0.124), (17,0.831), (18,0.958), (19,0.284), (20,0.411), (21,0.473
8: Elements stored on proc 1: {(0,0.196), (1,0.571), (2,0.482), (3,0.09), (4,0.79), (5,0.939), (6,0.684), (7,0.465), (8,0.236), (9,0.713), (10,0.32), (11,0.748), (12,0.771), (13,0.123), (14,0.79), (15,0.06), (16,0.82), (17,0.506), (18,0.859), (19,0.268), (20,0.49), (21,0.01), (22,0
8: Elements stored on proc 2: {(0,0.159), (1,0.811), (2,0.198), (3,0.163), (4,0.779), (5,0.241), (6,0.623), (7,0.955), (8,0.258), (9,0.861), (10,0.104), (11,0.381), (12,0.657), (13,0.356), (14,0.083), (15,0.712), (16,0.413), (17,0.488), (18,0.646), (19,0.99), (20,0.523), (21,0.034)
8: Elements stored on proc 3: {(0,0.609), (1,0.557), (2,0.926), (3,0.481), (4,0.218), (5,0.92), (6,0.049), (7,0.052), (8,0.424), (9,0.214), (10,0.606), (11,0.385), (12,0.848), (13,0.583), (14,0.586), (15,0.615), (16,0.797), (17,0.48), (18,0.378), (19,0.66), (20,0.169), (21,0.258),
8: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
8: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
8: [sandy:2061479] *** An error occurred in MPI_Alltoallv
8: [sandy:2061479] *** reported by process [2022768641,1]
8: [sandy:2061479] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
8: [sandy:2061479] *** MPI_ERR_TRUNCATE: message truncated
8: [sandy:2061479] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
8: [sandy:2061479] ***    and potentially your MPI job)
8: [sandy:2061523] *** Process received signal ***
8: [sandy:2061523] Signal: Segmentation fault (11)
8: [sandy:2061523] Signal code: Address not mapped (1)
8: [sandy:2061523] Failing at address: 0x78a63cc0
8: [sandy:2061523] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3c510)[0x7fa666c5a510]
8: [sandy:2061523] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x152449)[0x7fa666d70449]
8: [sandy:2061523] [ 2] /usr/lib/x86_64-linux-gnu/libopen-pal.so.40(opal_convertor_pack+0xaf)[0x7fa6670ae0df]
8: [sandy:2061523] [ 3] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_datatype_sndrcv+0x1fe)[0x7fa66729055e]
8: [sandy:2061523] [ 4] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_alltoallv_intra_basic_linear+0x2bf)[0x7fa6672decaf]
8: [sandy:2061523] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so(ompi_coll_tuned_alltoallv_intra_dec_fixed+0x42)[0x7fa6649e6fa2]
8: [sandy:2061523] [ 6] /usr/lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Alltoallv+0x1b5)[0x7fa667293315]
8: [sandy:2061523] [ 7] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(+0x1b151)[0x562012e18151]
8: [sandy:2061523] [ 8] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_ZN8combblas11SpParHelper13KeyValuePSortIdiiEESt6vectorISt4pairIT_T0_ESaIS6_EEPS6_T1_PSA_RKP19ompi_communicator_t+0x3bb)[0x562012e3ac7b]
8: [sandy:2061523] [ 9] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_ZN8combblas14FullyDistSpVecIidE4sortEv+0x284)[0x562012e3af34]
8: [sandy:2061523] [10] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_Z4TopKIidESt4pairIN8combblas12FullyDistVecIT_S3_EENS2_IS3_T0_EEERNS1_14FullyDistSpVecIS3_S5_EES3_+0x247)[0x562012e3b3a7]
8: [sandy:2061523] [11] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(main+0xa2c)[0x562012e12ebc]
8: [sandy:2061523] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x276ca)[0x7fa666c456ca]
8: [sandy:2061523] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7fa666c45785]
8: [sandy:2061523] [14] /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/IndexingTest(_start+0x21)[0x562012e13981]
8: [sandy:2061523] *** End of error message ***
8: [sandy:2061354] 2 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
8: [sandy:2061354] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 3/20 Test  #8: Indexing_Test ....................***Failed   20.56 sec
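
The backtrace frames above carry mangled C++ symbol names, which makes it hard to see which CombBLAS routines were on the stack when MPI_Alltoallv crashed. As a rough sketch, assuming a GCC or Clang toolchain that ships the Itanium ABI header `<cxxabi.h>` (the helper name `demangle` is made up for this illustration and is not part of CombBLAS), the names can be decoded like this:

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the human-readable form of a mangled symbol,
// or the input unchanged if the demangler rejects it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller has to release it with free.
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

Passing the symbol from frame [9], `_ZN8combblas14FullyDistSpVecIidE4sortEv`, should yield `combblas::FullyDistSpVec<int, double>::sort()`. Read that way, frames [10] to [6] suggest the call path runs from TopK through FullyDistSpVec::sort and SpParHelper::KeyValuePSort into MPI_Alltoallv. Piping the log through the `c++filt` command-line tool is an alternative to compiling anything.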

SpAsgn_Test:

test 9
      Start  9: SpAsgn_Test

9: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests/SpAsgnTest" "../TESTDATA" "A_100x100.txt" "A_with20x30hole.txt" "dense_20x30matrix.txt" "A_wdenseblocks.txt" "20outta100.txt" "30outta100.txt"
9: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/ReleaseTests
9: Test timeout computed to be: 1500
9: Pruning is working
9: SpAsgn working correctly
9: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
9: COMBBLAS Warning: It is dangerous to create (vector) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
9: [sandy:2061949] *** An error occurred in MPI_Alltoallv
9: [sandy:2061949] *** reported by process [2060517377,0]
9: [sandy:2061949] *** on communicator MPI COMMUNICATOR 3 DUP FROM 0
9: [sandy:2061949] *** MPI_ERR_COUNT: invalid count argument
9: [sandy:2061949] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
9: [sandy:2061949] ***    and potentially your MPI job)
9: [sandy:2061930] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
9: [sandy:2061930] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
11/20 Test  #9: SpAsgn_Test ......................***Failed  155.20 sec
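
The two MPI_Alltoallv aborts above report different error classes: MPI_ERR_TRUNCATE in Indexing_Test and MPI_ERR_COUNT in SpAsgn_Test. In MPI, truncation means a rank received more data than its receive count allowed, and an invalid count typically means a negative count was passed. What produces these values inside CombBLAS is not established in this report. Purely as a debugging sketch, the check below takes per-rank count tables (hypothetical names, filled by whatever means is convenient, for example by printing the counts just before the call) and flags both conditions without needing MPI:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical diagnostic, not part of CombBLAS or MPI.
// sendcounts[i][j]: number of elements rank i sends to rank j.
// recvcounts[j][i]: number of elements rank j is prepared to receive from rank i.
// Both tables are assumed to be square with one row per rank.
using CountTable = std::vector<std::vector<int>>;

std::vector<std::string> check_alltoallv_counts(const CountTable& sendcounts,
                                                const CountTable& recvcounts) {
    std::vector<std::string> problems;
    const std::size_t nprocs = sendcounts.size();
    for (std::size_t i = 0; i < nprocs; ++i) {
        for (std::size_t j = 0; j < nprocs; ++j) {
            const int sent = sendcounts[i][j];
            const int expected = recvcounts[j][i];
            const std::string pair =
                "rank " + std::to_string(i) + " -> rank " + std::to_string(j);
            if (sent < 0 || expected < 0) {
                // A negative count is what an "invalid count" error points at.
                problems.push_back("negative count: " + pair);
            } else if (sent > expected) {
                // The receiver's buffer is too small for the incoming message.
                problems.push_back("would truncate: " + pair);
            }
        }
    }
    return problems;
}
```

An empty result means every send fits its matching receive and no count is negative; any entry names the rank pair worth inspecting first.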

FBFS_Test:

test 15
      Start 15: FBFS_Test

15: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications/fbfs" "Gen" "16"
15: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications
15: Test timeout computed to be: 1500
15: Using synthetic data, which we ALWAYS permute for load balance
15: We only balance the original input, we don't repermute after each filter change
15: BFS is run on UNDIRECTED graph, hence hitting CCs, and TEPS is bidirectional
15: Forcing scale to : 16
15: graph_generation:               1.415538 s
15: Generated renamed edge lists
15: Converted to Boolean and removed 149 loops
15: As a whole: 65536 rows and 65536 columns and 909896 nonzeros
15: I/O (or generation) took 9.55391 seconds
15: As a whole: 65536 rows and 65536 columns and 909896 nonzeros
15: All degrees calculated
15: Load balance: 1.00815
15: [sandy:2062597] *** Process received signal ***
15: Symmetricized
15: --------------------------------------------------------------------------
15: Primary job  terminated normally, but 1 process returned
15: a non-zero exit code. Per user-direction, the job has been aborted.
15: --------------------------------------------------------------------------
15: --------------------------------------------------------------------------
15: mpiexec noticed that process rank 0 with PID 0 on node sandy exited on signal 11 (Segmentation fault).
15: --------------------------------------------------------------------------
 8/20 Test #15: FBFS_Test ........................***Failed   24.88 sec

FMIS_Test:

test 16
      Start 16: FMIS_Test

16: Test command: /usr/bin/mpiexec "-n" "4" "/projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications/fmis" "17"
16: Working Directory: /projects/mathlibs/build/combblas/obj-x86_64-linux-gnu/Applications
16: Test timeout computed to be: 1500
16: COMBBLAS Warning: It is dangerous to create (matrix) objects without specifying the communicator, are you sure you want to create this object in MPI_COMM_WORLD?
16: Using synthetic data, which we ALWAYS permute for load balance
16: We only balance the original input, we don't repermute after each filter change
16: BFS is run on UNDIRECTED graph, hence hitting CCs, and TEPS is bidirectional
16: Forcing scale to : 17
16: Generated renamed edge lists
16: graph_generation:               0.647811 s
16: Converted to Boolean and removed 75 loops
16: As a whole: 131072 rows and 131072 columns and 619978 nonzeros
16: Generation took 6.18405 seconds
16: As a whole: 131072 rows and 131072 columns and 619978 nonzeros
16: All degrees calculated
16: Load balance: 1.02317
16: Symmetricized
16: --------------------------------------------------------------------------
16: Primary job  terminated normally, but 1 process returned
16: a non-zero exit code. Per user-direction, the job has been aborted.
16: --------------------------------------------------------------------------
16: --------------------------------------------------------------------------
16: mpiexec noticed that process rank 0 with PID 0 on node sandy exited on signal 11 (Segmentation fault).
16: --------------------------------------------------------------------------
10/20 Test #16: FMIS_Test ........................***Failed   26.22 sec

drew-parsons commented 1 year ago

Actually, these test failures do match the ones reported later in #15, so the failures there don't just affect FreeBSD.