Error not injected when threads/block different to 1024

sergicuen commented 3 years ago

Hi all! similar to a previous issue I am having problems when injenct faults in a very simple kernel of matrix mult and the number of threads/block is different to 32x32 (1024). Any other value (e.g.: 16x16) produces some "Error not injected" results.

kernelName=matrixMulCUDA(float,float,float,int,int,int) kernelCount=0 groupID=7 bitFlipModel=0 instID=21059427 opIDSeed=0.401131 bitIDSeed=0.326217 inspecting: matrixMulCUDA(float,float,float,int,int,int) num_static_instrs: 282 maxregs: 32(32) Injection data index: 0 kernel_name: matrixMulCUDA(float,float,float*,int,int,int) ctas: 64 instrs: 14057472 grp 0: 0 grp 1: 2097152 grp 2: 4194304 grp 3: 262144 grp 4: 1130496 grp 5: 6373376 grp 6: 12926976 grp 7: 12664832 mask: 0x0 beforeVal: 0x0;afterVal: 0x0 regNo: -1 opcode: NOP pcOffset: 0x0 tid: -1 Error not injected

All the versions compile and pass the test (works correctly). I also tried to inject faults with DUMMY flag with the same results (some dummy injections work others don´t). In all the cases I ´ve rerun the profiler to be sure all is ok. I´ve activated the VERBOSE_TOOLS flags but the info is difficult to interpret since some numbers haven´t got any identifier. I´ve checked with Jetson nano and TX2 boards with the same result and using different matrix sizes.

The kernel is very simple: global void matrixMulCUDA(float C, float A, float B, int ldA, int ldB, int ldC) { int i = blockIdx.y blockDim.y + threadIdx.y; int j = blockIdx.x blockDim.x + threadIdx.x; float ptrA = &A[ildA]; // Pointer to the first element of row i of A float tmp = 0.0f; for (int k = 0; k < ldA; k++) { tmp += (ptrA++) B[kldB+j]; } C[i*ldC+j] = tmp; }

main() { .... int block_size = 16; dim3 dimsA(128, 128, 1); dim3 dimsB(128, 128, 1); dim3 dimsC(128, 128, 1); dim3 threads(block_size, block_size); dim3 grid(dimsC.x / threads.x, dimsC.y / threads.y); .... matrixMulCUDA<<< grid, threads >>>(d_C, d_A, d_B, dimsA.x, dimsB.x, dimsC.x); .... }

Please could you give me some hints for debugging the problem?

Thank you in advance.

dsartzet commented 2 years ago

Same issue here. Tried with hotspot and srad_v2 from Rodinia benchmark suite and I get plenty of "Error not injected" results. The simple_add which is included in nvbitfi works without issues. Any ideas? Thanks.

Update It seems that the optimization in profiler/inject_func.cu is causing the issue (at least in my Tesla K20c gpu). By letting every thread to use the atomicAdd then the instruction group counters are computed correctly and no more "Error not injected" results. So as a workaround just replace the code of the count_instrs function with this

uint64_t *counters = (uint64_t*)pcounters;
atomicAdd((unsigned long long *)&counters[index], 1);
atomicAdd((unsigned long long *)&counters[NUM_ISA_INSTRUCTIONS+grp_index], 1);
atomicAdd((unsigned long long *)&counters[num_counters-2], grp_index != G_NODEST);
atomicAdd((unsigned long long *)&counters[num_counters-1], 1 - ((grp_index == G_NODEST) || (grp_index == G_PR)));

sergicuen commented 2 years ago

Thanks Dimitris, the preliminary test shows that the workaround works fine with the matrixmult in Jetson nano (Maxwell gpu). I leave the issue open until futher test are performed.

Update workaround tested sucessfully with several benchmarks.

NVlabs / nvbitfi

Error not injected when threads/block different to 1024 #7