NVlabs / nvbitfi

Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation

Error when trying to inject: "Something is not right. Total instruction count = 0" in matrixMul #11

Closed StefanoPisciotta closed 2 years ago

StefanoPisciotta commented 2 years ago

Hello everyone! I'm using NVBitFI to inject SINGLE_BIT_FLIP faults targeting G_GP in some of the sample applications that come with the CUDA Toolkit, but for some applications, such as matrixMul, the tool doesn't profile the application. run.sh creates an empty nvbitfi-igprofile.txt, and the run then stops while generating the injection list because the profile is empty, exiting with the message above. Does anyone know how to solve this problem?
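For anyone hitting the same symptom, a quick way to confirm that the profile really came out empty before the injection list is generated is a small check like the one below. This is only a sketch: the file name comes from the report above, while the helper name `profile_has_content` and the assumption that the file sits in the current directory are mine and not part of NVBitFI.

```python
from pathlib import Path


def profile_has_content(path):
    """Return True if the file exists and has at least one non-blank line."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("r", errors="replace") as fh:
        return any(line.strip() for line in fh)


if __name__ == "__main__":
    # Assumed location: the current working directory. Adjust to wherever
    # your run actually writes the profile.
    profile = "nvbitfi-igprofile.txt"
    if profile_has_content(profile):
        print(f"{profile} has content; profiling produced output.")
    else:
        print(f"{profile} is missing or empty; profiling produced no output.")
```

If the check reports an empty file, the problem is in the profiling step itself rather than in the injection-list generation that prints the error.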

Thanks!!

sergicuen commented 2 years ago

Hi, what is your NVIDIA board? I've had the same problem when the matrix size is too large: above 256 on a Jetson Nano (2GB) and above 1024 on a Jetson TX2 (8GB). I think the problem is related to the board's memory and the size of the data. Regards

StefanoPisciotta commented 2 years ago

Hi Sergio, thanks for the answer! I have a Jetson Nano (4GB) and the problem is just as you described. With a smaller matrix it now works.

Within the sample I tried changing different parameters, and the result changes with the grid and block dimensions. With a 32 x 32 block size, the limit for the matrix dimension is 352 x 352, whereas with a 16 x 16 block size the limit is 272 x 272. I hope this helps solve the problem.
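To make the two limits easier to compare, here is a small sketch that works out the launch configuration and data size for each case. It assumes square float32 matrices A, B and C and one thread per output element; the function name `launch_summary` is just for illustration and is not taken from the sample or from NVBitFI.

```python
def launch_summary(n, block):
    """Summarize an n x n matrix multiply launched with block x block threads.

    Assumptions (not from NVBitFI): square float32 matrices A, B and C,
    and one thread per output element.
    """
    blocks_per_dim = -(-n // block)            # ceiling division
    total_blocks = blocks_per_dim ** 2
    total_threads = total_blocks * block * block
    data_bytes = 3 * n * n * 4                 # A, B and C, 4 bytes per float
    return {
        "blocks_per_dim": blocks_per_dim,
        "total_blocks": total_blocks,
        "total_threads": total_threads,
        "data_bytes": data_bytes,
    }


if __name__ == "__main__":
    # The two limits reported above: (matrix dimension, block dimension).
    for n, block in [(352, 32), (272, 16)]:
        print(n, block, launch_summary(n, block))
```

Under these assumptions the 32 x 32 case tops out at an 11 x 11 grid (121 blocks) and the 16 x 16 case at a 17 x 17 grid (289 blocks), which may help whoever looks into where the limit comes from.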