Hi,
While working with the default code in version 1.5.3, I see that opcode_hist reports that the following kernel contains memory instructions (STG.E), as shown below:
kernel 0 - kernel_info - #thread-blocks 1, kernel instructions 28, total instructions 28
EXIT = 1
IMAD.MOV.U32 = 5
MOV = 6
STG.E = 15
ULDC.64 = 1
Lattice spacing in x,y,z = 10.000000 10.000000 10.000000
Created orthogonal box = (0.0000000 0.0000000 -5.0000000) to (20000.000 300.00000 5.0000000)
1 by 1 by 1 MPI processor grid
However, when I use the mem_trace tool, recv_thread_fun() does not print any details for those memory instructions. The only output is the kernel launch line:
MEMTRACE: CTX 0x0000560df95ced70 - LAUNCH - Kernel pc 0x00007faedadc0200 - Kernel name kernel_info - grid launch id 1 - grid size 1,1,1 - block size 1,1,1 - nregs 20 - shmem 0 - cuda stream id 94618018363472
Lattice spacing in x,y,z = 10.000000 10.000000 10.000000
Created orthogonal box = (0.0000000 0.0000000 -5.0000000) to (20000.000 300.00000 5.0000000)
1 by 1 by 1 MPI processor grid
I don't know whether NVBit has been tested with MPI, but I guess there is a problem that prevents NVBit from printing this information. Looking at the code, I can think of a few possible cases:
1- The information is never pushed to the channel in the instrumentation function, i.e. channel_dev->push(&ma, sizeof(mem_access_t)); is not reached.
OR
2- The information is not received properly because the number of received bytes is zero, i.e. num_recv_bytes = ch_host->recv(recv_buffer, CHANNEL_SIZE); returns 0.
OR
3- Maybe due to a locking bug or a race, ch_host->recv() is called before channel_dev->push().
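To make case 3 concrete, here is a minimal host-only toy model of how I picture the channel. This is a sketch under my own assumptions, not NVBit's implementation: ToyChannel, ToyAccess and run_demo are hypothetical names, everything runs on a single thread, and the record count is just a parameter. It only shows that, in such a model, an early recv() returning 0 bytes is harmless as long as the receiver keeps polling afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the record pushed by the instrumentation function.
struct ToyAccess {
    std::uint64_t addr;
};

// Toy byte queue modelling a device-to-host channel (NOT NVBit's real channel).
class ToyChannel {
public:
    // Append n bytes to the pending data.
    void push(const void* data, std::size_t n) {
        const char* p = static_cast<const char*>(data);
        pending_.insert(pending_.end(), p, p + n);
    }

    // Non-blocking receive: copy up to max_bytes and return the number of
    // bytes copied. Returns 0 when nothing has been pushed yet.
    std::size_t recv(void* out, std::size_t max_bytes) {
        const std::size_t n = std::min(pending_.size(), max_bytes);
        if (n > 0) {
            std::memcpy(out, pending_.data(), n);
            pending_.erase(pending_.begin(),
                           pending_.begin() + static_cast<std::ptrdiff_t>(n));
        }
        return n;
    }

private:
    std::vector<char> pending_;
};

struct DemoResult {
    std::size_t empty_polls;  // early recv() calls that returned 0 bytes
    std::size_t records;      // records received after the pushes
};

DemoResult run_demo(std::size_t num_records) {
    ToyChannel ch;
    DemoResult result{0, 0};
    std::vector<char> recv_buffer(4 * sizeof(ToyAccess));

    // Case 3: the receiver polls before anything was pushed and gets 0 bytes.
    if (ch.recv(recv_buffer.data(), recv_buffer.size()) == 0) {
        ++result.empty_polls;
    }

    // The producer pushes one record per (made-up) memory access.
    for (std::size_t i = 0; i < num_records; ++i) {
        ToyAccess ma{0x1000 + 8 * i};
        ch.push(&ma, sizeof(ma));
    }

    // The receiver polls again and drains the channel in buffer-sized chunks.
    for (;;) {
        const std::size_t n = ch.recv(recv_buffer.data(), recv_buffer.size());
        if (n == 0) break;
        assert(n % sizeof(ToyAccess) == 0);  // only whole records are queued
        result.records += n / sizeof(ToyAccess);
    }
    return result;
}
```

Calling run_demo(15) gives one empty poll followed by all 15 records. So, if the real channel behaves like this model, case 3 alone would not explain missing records unless the receiving thread stops polling too early.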
Any thoughts on that?