NVlabs / NVBit


Possible bug that prevents mem_trace from printing memory opcodes in recv_thread_fun() #66

Open mahmoodn opened 2 years ago

mahmoodn commented 2 years ago

Hi. While working with the default code in version 1.5.3, I see that opcode_hist reports that the following kernel contains memory instructions (STG.E):

kernel 0 - kernel_info - #thread-blocks 1,  kernel instructions 28, total instructions 28
  EXIT = 1
  IMAD.MOV.U32 = 5
  MOV = 6
  STG.E = 15
  ULDC.64 = 1
Lattice spacing in x,y,z = 10.000000 10.000000 10.000000
Created orthogonal box = (0.0000000 0.0000000 -5.0000000) to (20000.000 300.00000 5.0000000)
  1 by 1 by 1 MPI processor grid

However, when I use the mem_trace tool, recv_thread_fun() doesn't print the details of those memory instructions. Only the kernel launch line appears:

MEMTRACE: CTX 0x0000560df95ced70 - LAUNCH - Kernel pc 0x00007faedadc0200 - Kernel name kernel_info - grid launch id 1 - grid size 1,1,1 - block size 1,1,1 - nregs 20 - shmem 0 - cuda stream id 94618018363472
Lattice spacing in x,y,z = 10.000000 10.000000 10.000000
Created orthogonal box = (0.0000000 0.0000000 -5.0000000) to (20000.000 300.00000 5.0000000)
  1 by 1 by 1 MPI processor grid

I don't know whether NVBit has been tested with MPI, but I suspect something prevents it from printing this information. Looking at the code, I can think of three possible causes:

1- The information is never pushed to the channel in the instrumentation function, i.e. channel_dev->push(&ma, sizeof(mem_access_t)); is not executed.
2- The information is not received properly because the number of received bytes is zero, i.e. num_recv_bytes = ch_host->recv(recv_buffer, CHANNEL_SIZE); returns 0.
3- A locking bug or a race causes ch_host->recv() to be called before channel_dev->push().
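To make the three cases easier to tell apart, here is a minimal host-only C++ sketch. It is a toy model and not NVBit code: ToyChannel, ToyStats and run_toy_trace are hypothetical names, and the mutex-protected byte queue only stands in for the real device-to-host channel. It assumes the receiver polls in a loop. Under that assumption a zero-byte recv() before the first push() (case 3) is harmless on its own, provided the loop keeps polling until the producer has finished and the channel is drained. Comparing the push count with the number of received records would then separate case 1 (nothing pushed) from case 2 (pushed but never received).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a device-to-host channel (NOT NVBit's channel classes).
class ToyChannel {
public:
    void push(const void* data, std::size_t n) {
        std::lock_guard<std::mutex> lock(m_);
        const char* p = static_cast<const char*>(data);
        buf_.insert(buf_.end(), p, p + n);
    }
    // Copies up to max_bytes into out; returns 0 when nothing is pending.
    std::size_t recv(void* out, std::size_t max_bytes) {
        std::lock_guard<std::mutex> lock(m_);
        std::size_t n = std::min(buf_.size(), max_bytes);
        if (n == 0) return 0;
        std::memcpy(out, buf_.data(), n);
        buf_.erase(buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(n));
        return n;
    }
private:
    std::mutex m_;
    std::vector<char> buf_;
};

// Counters that distinguish the three suspected cases.
struct ToyStats {
    std::size_t pushes = 0;       // 0 here would point at case 1
    std::size_t records = 0;      // fewer than pushes would point at case 2
    std::size_t empty_polls = 0;  // zero-byte recv() calls, expected while polling
};

ToyStats run_toy_trace(std::size_t num_records) {
    ToyChannel ch;
    std::atomic<bool> producer_done{false};
    ToyStats stats;

    std::thread receiver([&] {
        std::vector<char> buffer(1024);  // multiple of the 8-byte record size
        for (;;) {
            // Read the flag BEFORE recv(): if it was already set and recv()
            // still returns 0, every pushed record has been consumed.
            bool done = producer_done.load();
            std::size_t n = ch.recv(buffer.data(), buffer.size());
            if (n > 0) {
                assert(n % sizeof(std::uint64_t) == 0);
                stats.records += n / sizeof(std::uint64_t);
            } else {
                ++stats.empty_polls;
                if (done) break;  // drained, safe to stop
                std::this_thread::yield();
            }
        }
    });

    // Producer side: plays the role of the instrumented kernel pushing records.
    for (std::size_t i = 0; i < num_records; ++i) {
        std::uint64_t addr = 0x1000 + 8 * i;  // made-up address value
        ch.push(&addr, sizeof(addr));
        ++stats.pushes;
    }
    producer_done.store(true);
    receiver.join();
    return stats;
}
```

Calling run_toy_trace(1000) should return pushes and records both equal to 1000, with at least one empty poll. In the real tool, analogous counters around the push and recv calls, printed once the kernel has completed, might show which side loses the data.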

Any thoughts on that?