Open jamesETsmith opened 1 year ago
Do I remember correctly that Connected Components requires a user-defined type? Could that be part of the failure?
Good memory, it does require a user-defined select op: https://github.com/emusolutions/LAGraph/blob/ab0d521d1a30746f75014470bc4517a6d60d920e/src/algorithm/LG_CC_Boruvka.c#L72-L77
I'll leave this issue open for future reference until we implement user-defined select ops.
Interestingly, I compiled the non-vanilla version of LAGr_ConnectedComponents, which uses LG_CC_FastSV6, since it doesn't rely on user-defined operations. However, the test crashes almost immediately, and I think it's because we don't implement certain variants of GxB_Matrix_unpack_CSC. This might be worth looking into because I think it would be easier to implement the GxB_Matrix_unpack_CSC variants than user-defined operations, and it would yield a faster method anyway.
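For context on what an unpack in CSC form has to hand back, here is a minimal plain-C sketch of the compressed sparse column layout. It does not call GraphBLAS at all. The `csc_view` struct and both helper functions are hypothetical names made up for illustration, and the `Ap`/`Ai`/`Ax` array names are assumed to follow the SuiteSparse convention (column pointers, row indices, values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three CSC arrays; not a GraphBLAS type. */
typedef struct {
    size_t ncols;
    const uint64_t *Ap; /* column pointers, length ncols + 1 */
    const uint64_t *Ai; /* row index of each stored entry */
    const int64_t *Ax;  /* value of each stored entry */
} csc_view;

/* Sum the stored entries of column j by walking Ap[j] .. Ap[j+1]-1. */
int64_t csc_column_sum(const csc_view *A, size_t j)
{
    assert(j < A->ncols);
    int64_t sum = 0;
    for (uint64_t p = A->Ap[j]; p < A->Ap[j + 1]; p++) {
        sum += A->Ax[p];
    }
    return sum;
}

/* Return the first stored row index of column j, or -1 if the column
   is empty. Assumes row indices within a column are sorted. */
int64_t csc_first_row(const csc_view *A, size_t j)
{
    assert(j < A->ncols);
    if (A->Ap[j] == A->Ap[j + 1]) {
        return -1;
    }
    return (int64_t) A->Ai[A->Ap[j]];
}
```

An algorithm that unpacks a matrix this way reads the raw arrays directly, so a missing or partial unpack implementation would plausibly surface as a bad pointer dereference rather than a clean error code.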
Here's a record of the failures and the eventual crash when running this test with LG_CC_FastSV6 as the backend method:
ctest --test-dir build_lc2 -R ConnectedComp -V
Internal ctest changing into directory: /net/hyper120h-d/data/jsmith/apps/LAGraph/build_lc2
UpdateCTestConfiguration from :/net/hyper120h-d/data/jsmith/apps/LAGraph/build_lc2/DartConfiguration.tcl
UpdateCTestConfiguration from :/net/hyper120h-d/data/jsmith/apps/LAGraph/build_lc2/DartConfiguration.tcl
Test project /net/hyper120h-d/data/jsmith/apps/LAGraph/build_lc2
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 8
Start 8: ctest_ConnectedComponents
8: Test command: /usr/bin/cmake "-E" "env" "/tools/lucata/bin/emusim.x" "--forward_return_value" "--" "test_ConnectedComponents" "--no-exec"
8: Test timeout computed to be: 10000000
8:
8: SystemC 2.3.3-Accellera --- Apr 21 2023 11:46:50
8: Copyright (c) 1996-2018 by all Contributors,
8: ALL RIGHTS RESERVED
8: Test cc... [ERROR]: Failure in address translation: shared bit wasn't set.
8: addr_in=0x20, addr=0x20
8: EXCEPTION!
8: ThreadID=0
8: HW ThreadID=0x1
8: Thread using HW ThreadID
8: ThreadletState=Service request
8: ThreadletException=5=Address
8: Exception cause string: Translation failure
8: ExecutionType=7
8: Current Instruction:
8: 801072a2 LDE: iToken=172 iLength=3 nibbles=b7d000
8: Threadlet TCB Data:
8: TCB.(TPC)=(0x801072a2) (32 bits each)
8: TCB.(D,D2)=(1,1) (one bit each)
8: TCB.A2=1
8: TCB.(TS,TSDATA)=(0,0x0) (two bits, four bits)
8: TCB.AID=0x1 (8 bits)
8: TCB.(NaN,U,V,CB,N,Z)=(0, 0, 0, 0, 0, 0)
8: TCB.M=0 (one bit)
8:
8: Threadlet State Registers
8: TCB0: 0x000cffff74000200
8: TCB1: 0x00000000801072a2
8:
8: Threadlet Data Registers
8: A: 0x20=32
8: A2: 0x1800000800041b0=108086393204392368
8: Format: signed decimal, unsigned decimal, hex
8: D: 720, 720, 0x2d0
8: D2: 108086401794340760, 108086401794340760, 0x180000280007798
8: E[0] (Live): 108086393204395088, 108086393204395088, 0x180000080004c50
8: E[1] (Live): 0, 0, 0x0
8: E[2] (Live): -3, 18446744073709551613, 0xfffffffffffffffd
8: E[3] (Live): 108086393204403584, 108086393204403584, 0x180000080006d80
8: E[4] (Live): 1, 1, 0x1
8: E[5] (Live): 0, 0, 0x0
8: E[6] (Live): 0, 0, 0x0
8: E[7] (Live): 1, 1, 0x1
8: E[8] (Live): 108086393204498080, 108086393204498080, 0x18000008001dea0
8: E[9] (Live): 0, 0, 0x0
8: E[10] (Live): 0, 0, 0x0
8: E[11] (Live): 108086393204392368, 108086393204392368, 0x1800000800041b0
8: E[12] (Live): 36028814198833824, 36028814198833824, 0x800004000002a0
8: E[13] (Live): 108086393204395632, 108086393204395632, 0x180000080004e70
8: E[14] (Live): 108086393204395664, 108086393204395664, 0x180000080004e90
8: E[15] (Live): 108086393204395648, 108086393204395648, 0x180000080004e80
8:
8: Other Useful Data
8: Fence Counter=0
8: Source Node=0
8: Dest Node=-1
1/1 Test #8: ctest_ConnectedComponents ........***Failed 2.10 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 2.10 sec
The following tests FAILED:
8 - ctest_ConnectedComponents (Failed)
Errors while running CTest
Output from these tests are in: /net/hyper120h-d/data/jsmith/apps/LAGraph/build_lc2/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely
Here's the followup "debugging" analysis:
❯ /tools/lucata/bin/gossamer64-objdump -xD build_lc2/src/test/test_ConnectedComponents > cc.objdump
❯ grep 801072a2 cc.objdump
40083951: 801072a2: LDE 7
❯ gdb -q build_lc/src/test/test_ConnectedComponents
Reading symbols from build_lc/src/test/test_ConnectedComponents...(no debugging symbols found)...done.
(gdb) x/i 0x40083951
0x40083951 <@GrB_Matrix_assign_INT32+137>: push %ds
As a correction, the non-vanilla version only compiles because the actual code for the non-vanilla implementation was inside #ifdef LAGRAPH_SUITESPARSE statements, so it wasn't actually getting compiled. Once I remove those, I run into a compilation error because we're missing GxB_MIN_SECONDI_INT64. We already have GxB_MIN_SECOND_INT64, so I'm going to scope out the difference between the two semirings. If it's easy to add, I'll do that to see how much farther we get through compilation.
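As a rough illustration of how the two semirings differ, here is a plain-C sketch of one output entry of a matrix-vector product under each. It assumes SECONDI follows the SuiteSparse:GraphBLAS positional-operator definition, where the multiply step ignores both values and returns the row index k of the second operand, while SECOND returns the second operand's value. The function names, the fixed size `N`, and the dense presence-mask representation are all hypothetical and are not part of any GraphBLAS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 4 /* hypothetical vector length for the illustration */

/* MIN_SECOND: w = min over k of u[k], over positions where both
   A(i,k) and u(k) are present. Returns false if no position matches. */
bool row_min_second(const bool A_present[N], const bool u_present[N],
                    const int64_t u[N], int64_t *w)
{
    bool found = false;
    for (int64_t k = 0; k < N; k++) {
        if (A_present[k] && u_present[k]) {
            if (!found || u[k] < *w) {
                *w = u[k];
            }
            found = true;
        }
    }
    return found;
}

/* MIN_SECONDI: w = min over k of k itself, over the same positions.
   The values of A and u are never read, only their sparsity patterns. */
bool row_min_secondi(const bool A_present[N], const bool u_present[N],
                     int64_t *w)
{
    bool found = false;
    for (int64_t k = 0; k < N; k++) {
        if (A_present[k] && u_present[k]) {
            if (!found || k < *w) {
                *w = k;
            }
            found = true;
        }
    }
    return found;
}
```

With row pattern {1, 3} present and u = {5, 9, 1, 3}, the first function yields 3 (the smallest value among u[1] and u[3]) and the second yields 1 (the smallest matching index). Under this reading, the only change from the existing semiring is the multiply step, which returns an index instead of a value.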
General
This is a spinoff of https://github.com/emusolutions/LAGraph/issues/4 to tackle the problems with ctest_ConnectedComponents.
Details