Closed (rodhuega closed this issue 2 years ago)
Unless there is a bug in the tool, the only thing I can think of is that the "SHFL.IDX 4 R255 R255 R255 R255 0" is actually "@!PT SHFL.IDX 4 R255 R255 R255 R255 0", so in reality the instruction itself is encoded to always be predicated false (and somehow there is a bug in nvbit when we return its string text). You could confirm that by running "cuobjdump -sass" on your application. I will try to take a closer look next time I get around to that code, but I don't know yet when. Thanks for pointing this out.
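To make the reasoning above concrete, here is a minimal Python sketch of why an always-false guard would produce an all-zero mask in the trace. It assumes the tracer reports the warp's active mask ANDed with a per-lane predicate mask; that is an assumption about the tracer, not something confirmed in this thread. The names `effective_mask`, `predicate_mask_for` and `format_mask` are hypothetical and not part of nvbit or accel-sim.

```python
FULL_WARP = 0xFFFFFFFF  # 32 lanes, one bit per thread of the warp


def predicate_mask_for(guard: str, active_mask: int) -> int:
    """Per-lane predicate mask for the constant guards only (hypothetical helper).

    "" or "PT" means the instruction is unguarded or guarded by the
    always-true predicate; "!PT" is the negated always-true predicate.
    """
    if guard in ("", "PT"):
        return active_mask  # every active lane passes the guard
    if guard == "!PT":
        return 0            # no lane ever passes the guard
    raise ValueError(f"guard {guard!r} depends on runtime register state")


def effective_mask(active_mask: int, predicate_mask: int) -> int:
    """Lanes that execute: active AND predicate true (assumed tracer behaviour)."""
    return active_mask & predicate_mask & FULL_WARP


def format_mask(mask: int) -> str:
    """Render a 32-bit mask as 8 hex digits."""
    return f"{mask:08x}"
```

With a fully active warp, an unguarded IMAD gives `format_mask(effective_mask(FULL_WARP, predicate_mask_for("", FULL_WARP)))` equal to `ffffffff`, while the same call with `"!PT"` gives `00000000`, even though no branch has diverged the warp.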
I don't have access to a V100, which is the GPU that generated that trace. Do you know if the "cuobjdump -sass" output would be similar on a GTX1080TI or RTX2080TI? These are the cards that I have access to.
Hi, I compiled for Volta even though I don't have a Volta card, and then checked the cuobjdump output as you said. I can't identify any instruction similar to that SHFL. I don't know if I did something wrong. I searched in both kernels (the one where this happens is the _Z22bpnn_layerforward_CUDAPfS_S_S_ii kernel). I attach two files here: the trace generated by nvbit (downloaded from their repo) and the cuobjdump output generated as you said. I hope these two files will be helpful. cuobjdumpOut.txt kernel1backproptrace.txt
Thanks for the added information, we will take a detailed look when possible (not sure when yet).
Just checked the SASS code of that kernel and can confirm that the instruction is @!PT SHFL.IDX 4 R255 R255 R255 R255 0, so the instruction is always predicated off. Closing the issue. Please reopen it if you find something different.
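For anyone repeating this check on their own dump, the following Python sketch scans "cuobjdump -sass" text for instructions guarded by @!PT. It assumes each instruction line starts with a hexadecimal PC in a `/*0010*/` style comment, followed by an optional guard and the opcode. The sample text is illustrative only and is not copied from the attached cuobjdumpOut.txt, and `always_off_instructions` is a hypothetical helper name.

```python
import re

# Assumed line shape:  /*0010*/   @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
SASS_LINE = re.compile(
    r"/\*(?P<pc>[0-9a-fA-F]{4,})\*/\s+"   # PC printed as a hex comment
    r"(?P<guard>@!?U?P[T0-9]\s+)?"         # optional guard such as @P0 or @!PT
    r"(?P<op>[A-Z][A-Z0-9_.]*)"            # opcode with its modifiers
)


def always_off_instructions(sass_text: str) -> list[tuple[int, str]]:
    """Return (pc, opcode) for every instruction guarded by @!PT."""
    hits = []
    for line in sass_text.splitlines():
        match = SASS_LINE.search(line)
        if match is None or match.group("guard") is None:
            continue  # not an instruction line, or an unguarded instruction
        if match.group("guard").strip() == "@!PT":
            hits.append((int(match.group("pc"), 16), match.group("op")))
    return hits


# Illustrative sample, not taken from the files attached to this issue.
SAMPLE = """\
        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R0, SR_CTAID.Y ;
"""
```

On the sample, `always_off_instructions(SAMPLE)` returns `[(16, 'SHFL.IDX')]`, that is, a single always-off instruction at PC 0x0010. Instructions guarded by ordinary predicates such as @P0 are deliberately ignored, because whether they execute depends on runtime state.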
Hi, I'm a user of Nvbit with accel-sim. I have observed a behavior that I don't understand. Here is the kernel code (first kernel of backprop Rodinia2):
Below I paste a small piece of the trace code generated by Nvbit:
My question is: how is it possible that at PC 0010 (16) of the trace there is an instruction whose thread mask is all 0s? Before this instruction there is only an IMAD, and there isn't any branch or anything like that. PS: The line where the accel-sim tracer prints the mask might be useful to you: https://github.com/accel-sim/accel-sim-framework/blob/4c2bf09a79d6b57bb10fe1898700930a5dd5531f/util/tracer_nvbit/tracer_tool/tracer_tool.cu#L529