NOP instructions in Matrix Multiplication

albertorodes commented 3 years ago

Hello!

I am trying to run a series of tests to compare the reliability of different versions of the Matrix Multiplication. The kernels that I am using have a parameter that sets the thread block size. I ran tests with this parameter set to 32x32 and had no problems or unexpected results. However, when I changed that parameter to 16x16 or 8x8, I started getting results like this:

inspecting: voidmatrixMulCUDA<8>(float,float,float,int,int) num_static_instrs: 90 maxregs: 30(30)
Injection data index: 0 kernel_name: voidmatrixMulCUDA<8>(float,float,float,int,int) ctas: 256 instrs: 10452992
grp 0: 0 grp 1: 2097152 grp 2: 3145728 grp 3: 278528 grp 4: 1671168 grp 5: 3260416 grp 6: 8781824 grp 7: 8503296
mask: 0x0 beforeVal: 0x0;afterVal: 0x0 regNo: -1 opcode: NOP pcOffset: 0x0 tid: -1
Error not injected

I checked the injection file in the logs and found lines like this one in all the injections that failed:

1;voidmatrixMulCUDA<8>(float,float,float*,int,int);0;28898422;0.947758577437;0.204871567272:0x0:NOP: -1:0x0:15.610934:19::value_before0x0:value_after0x0

As I said, these injections on NOP instructions never happened with the 32x32 thread block size, but they happen almost 80% of the time with the other values.

Thank you in advance!

sivahari commented 3 years ago

This output is typically printed when nvbitfi could not find the injection site. One possibility is that the profiling run thinks that there are way more instructions than the actual injection run. Did you rerun the profiler when you changed the input?
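
To illustrate the kind of mismatch described above, here is a minimal Python sketch. It is not NVBitFI code: `pick_injection_index`, `site_is_reached`, and `miss_rate` are hypothetical helpers, the counts are made up, and the sketch assumes the target is drawn uniformly from the profiled dynamic instruction count. Under that assumption, any target index beyond the number of instructions the injection run really executes is never reached, so no error is injected.

```python
import random


def pick_injection_index(profiled_instrs, rng):
    """Draw a target dynamic-instruction index uniformly from the profiled count."""
    return rng.randrange(profiled_instrs)


def site_is_reached(target_index, executed_instrs):
    """The target can only be hit if the injection run executes that many instructions."""
    return target_index < executed_instrs


def miss_rate(profiled_instrs, executed_instrs, trials=10_000, seed=0):
    """Estimate the fraction of injections whose target is never reached."""
    rng = random.Random(seed)
    misses = sum(
        not site_is_reached(pick_injection_index(profiled_instrs, rng), executed_instrs)
        for _ in range(trials)
    )
    return misses / trials


# Hypothetical counts: the profile reports twice as many instructions as actually run.
print(miss_rate(profiled_instrs=2_000, executed_instrs=1_000))
```

When the two counts agree, the estimated miss rate is zero; the larger the over-count in the profile, the more injections fall outside the real run.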


albertorodes commented 3 years ago

I checked and yes, the profiler is running and generating different results depending on the parameters. However, it is true that the block sizes that produce these errors have a much higher instruction count than the ones that don't. To be specific, with a 32x32 block size the profiler counts 263168 instructions (no problems), and with an 8x8 block size it counts 1052672 (about 80% of injections end in "Error not injected"). It could be something about the implementation, but the difference in instruction count seems too large.
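
For reference, the two profiler counts quoted above can be compared directly. The short Python check below only does arithmetic on those two numbers. The last line rests on an assumption that is not confirmed anywhere in this thread: that injection targets are chosen uniformly over the profiled count and that the real run executes only the smaller count.

```python
count_32x32 = 263_168    # profiler count reported for the 32x32 block size
count_8x8 = 1_052_672    # profiler count reported for the 8x8 block size

# How many times larger the 8x8 count is.
ratio = count_8x8 / count_32x32
print(ratio)  # prints 4.0

# Assumption (hypothetical): targets are uniform over the profiled count and
# only 1/ratio of them fall inside the instructions that actually execute.
assumed_miss_fraction = 1 - 1 / ratio
print(assumed_miss_fraction)  # prints 0.75
```

The 8x8 count is exactly four times the 32x32 count. The 0.75 figure is only what the stated assumption would imply, not a measured value.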

sergicuen commented 2 years ago

The issue was solved using the workaround described here: Error not injected when threads/block different to 1024 #7

NVlabs / nvbitfi

NOP instructions in Matrix Multiplication #5