Summary
During benchmarking this morning, I noticed that bfs_demo crashes when run on multiple nodes in the simulator, but not on a single node. I don't think this is related to #15.
Details
TL;DR: we're crashing in a lambda inside LGB::matrix_multiply<>.
Here's the crash report:
/tools/lucata/bin/emusim.x --total_nodes 2 -- ./build_lc_e23a5bf/src/benchmark/bfs_demo /net/bigtwin-d/data/graph500/graph500-scale10.mtx
SystemC 2.3.3-Accellera --- Apr 21 2023 11:46:50
Copyright (c) 1996-2018 by all Contributors,
ALL RIGHTS RESERVED
Selected mode is not availble on this architecture. Setting mode to GrB_BLOCKING.
[ERROR]: Failure in address translation: addr larger than total system bytes.
addr_in=0xe80002800002dc0, total_system_bytes=0x2000000000
EXCEPTION!
ThreadID=21168
HW ThreadID=0x195f30897173
Thread using HW ThreadID
ThreadletState=Service request
ThreadletException=5=Address
Exception cause string: Translation failure
ExecutionType=7
Current Instruction:
805351dc LDE: iToken=172 iLength=3 nibbles=b1d000
Threadlet TCB Data:
TCB.(TPC)=(0x805351dc) (32 bits each)
TCB.(D,D2)=(1,0) (one bit each)
TCB.A2=1
TCB.(TS,TSDATA)=(0,0x0) (two bits, four bits)
TCB.AID=0x1 (8 bits)
TCB.(NaN,U,V,CB,N,Z)=(0, 0, 0, 0, 0, 0)
TCB.M=0 (one bit)
Threadlet State Registers
TCB0: 0x000bffff64000200
TCB1: 0x00000000805351dc
Threadlet Data Registers
A: 0xe80002800002dc0=1044835285348658624
A2: 0x800000005b8870=36028797024962672
Format: signed decimal, unsigned decimal, hex
D: 108086466218939976, 108086466218939976, 0x18000118001d648
D2: 0, 0, 0x0
E[0] (Live): 108086466218819600, 108086466218819600, 0x180001180000010
E[1] (Live): 49, 49, 0x31
E[2] (Live): 49, 49, 0x31
E[3] (Live): 108086466218939584, 108086466218939584, 0x18000118001d4c0
E[4] (Live): 108086393204486360, 108086393204486360, 0x18000008001b0d8
E[5] (Live): 36028814198837368, 36028814198837368, 0x80000400001078
E[6] (Live): 108086393204430704, 108086393204430704, 0x18000008000d770
E[7] (Live): 36028814198833952, 36028814198833952, 0x80000400000320
E[8] (Live): 108086393204392768, 108086393204392768, 0x180000080004340
E[9] (Live): 2153494866, 2153494866, 0x805bb952
E[10] (Live): 2153494866, 2153494866, 0x805bb952
E[11] (Live): 2148281592, 2148281592, 0x800c2cf8
E[12] (Live): 1044835285348658624, 1044835285348658624, 0xe80002800002dc0
E[13] (Live): 108086466218940608, 108086466218940608, 0x18000118001d8c0
E[14] (Live): 108086466218939584, 108086466218939584, 0x18000118001d4c0
E[15] (Live): 108086466219063224, 108086466219063224, 0x18000118003b7b8
Other Useful Data
Fence Counter=0
Source Node=1
Dest Node=-1
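As a quick sanity check on the numbers in the report above, here is a minimal sketch that confirms the faulting address really is outside the simulated address space, and shows which registers in the dump hold that value. The constants are copied from the crash report. The helper names (out_of_range, registers_holding) are hypothetical and not part of any simulator tooling, and only a subset of the registers is listed.

```python
# Values copied verbatim from the crash report.
ADDR_IN = 0xe80002800002dc0
TOTAL_SYSTEM_BYTES = 0x2000000000

# Subset of the register dump (hex column only).
REGISTERS = {
    "A": 0xe80002800002dc0,
    "A2": 0x800000005b8870,
    "D": 0x18000118001d648,
    "E[0]": 0x180001180000010,
    "E[3]": 0x18000118001d4c0,
    "E[12]": 0xe80002800002dc0,
    "E[13]": 0x18000118001d8c0,
}


def out_of_range(addr: int, total_bytes: int) -> bool:
    """Return True when addr is larger than the total system bytes,
    which is the condition named in the [ERROR] line."""
    return addr > total_bytes


def registers_holding(addr: int, registers: dict) -> list:
    """Return the names of registers whose value equals addr, in dump order."""
    return [name for name, value in registers.items() if value == addr]


if __name__ == "__main__":
    print("addr_in out of range:", out_of_range(ADDR_IN, TOTAL_SYSTEM_BYTES))
    print("registers holding addr_in:", registers_holding(ADDR_IN, REGISTERS))
```

In the dump, the bad value shows up in both A and E[12], so the address loaded by the faulting LDE appears to have been carried in E[12] before it was used. That is only an observation from the register contents, not a confirmed root cause.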
Manual debugging: