Open mahmoodn opened 4 years ago
A few of the assumptions you made may or may not hold.
The MISC category (as referred to in the CUDA manual) may or may not be the same category nvprof means by "Misc Instructions" in your screenshot.
The opcode-histogram tool in NVBit counts "warp level" instructions by default (it can do a thread level count if you set the environment variable COUNT_WARP_LEVEL=0). However, I am not sure what the nvprof counters are reporting in your screenshot. Judging from the large discrepancy, I would think that nvprof is counting at thread level while the NVBit tool is counting at warp level.
In general, we have extensively correlated nvbit tools with nvprof and we have never seen discrepancies of this magnitude.
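To make the warp level versus thread level distinction concrete, here is a minimal Python sketch. It assumes that a thread level count adds one per active thread of each executed warp instruction, while a warp level count adds one per executed warp instruction. The active masks are made-up values for illustration, not data from this run, and the function names are hypothetical.

```python
# Hypothetical data: the 32-bit active mask of each executed warp instruction.
# These values are invented for illustration only.
active_masks = [0xFFFFFFFF, 0x0000FFFF, 0x00000001]


def warp_level_count(masks):
    """Count one per executed warp instruction."""
    return len(masks)


def thread_level_count(masks):
    """Count one per active thread (set bit) in each warp instruction."""
    return sum(bin(mask).count("1") for mask in masks)


warp_count = warp_level_count(active_masks)
thread_count = thread_level_count(active_masks)

# A warp has at most 32 threads, so the thread level count can be
# up to 32 times the warp level count for the same execution.
print(f"warp level:   {warp_count}")    # 3
print(f"thread level: {thread_count}")  # 49 (32 + 16 + 1)
```

Under this assumption, two tools that disagree by a factor of up to 32 may simply be counting at different granularities.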
I ran with that variable set and I still see some differences for MISC. The new stats are:
ATOMS.ADD = 15293598912
BAR.SYNC = 3962112
BRA = 118297388285
BSSY = 33207619328
BSYNC = 33207619328
EXIT = 55476
FFMA = 33205509632
FMUL = 16602754816
FSETP.GE.AND = 79150778604
FSETP.GEU.AND = 5696361216
FSETP.GEU.OR = 10906393600
FSETP.LTU.OR = 16601755409
IADD3 = 95823798432
IMAD = 31896872308
IMAD.IADD = 95823614444
IMAD.MOV.U32 = 48673303412
IMAD.SHL.U32 = 112355807409
IMAD.WIDE.U32 = 202306989
IMAD.X = 4020
ISETP.GE.U32.AND = 16805382400
ISETP.GE.U32.OR = 360192
ISETP.GT.U32.AND = 79151396076
ISETP.GT.U32.OR = 1440768
ISETP.LE.U32.OR = 5695361809
ISETP.LT.U32.AND = 16602754816
ISETP.NE.AND = 411648
LDC = 112355288829
LDG.E.SYS = 202152621
LDS.U = 49811870388
LEA = 15293654388
LEA.HI.X = 4020
LOP3.LUT = 1080576
NOP = 3962112
S2R = 102912
SEL = 158303666904
SHF.R.U32.HI = 79151241708
SHFL.IDX = 95820990188
STG.E.64.SYS = 4020
STS = 8581293
The MISC total is 8,027,136, which is far less than what nvprof reports:
BAR.SYNC = 3962112
NOP = 3962112
S2R = 102912
For INTEGER instructions, I see:
IADD3 = 95823798432
IMAD = 31896872308
IMAD.IADD = 95823614444
IMAD.MOV.U32 = 48673303412
IMAD.SHL.U32 = 112355807409
IMAD.WIDE.U32 = 202306989
IMAD.X = 4020
ISETP.GE.U32.AND = 16805382400
ISETP.GE.U32.OR = 360192
ISETP.GT.U32.AND = 79151396076
ISETP.GT.U32.OR = 1440768
ISETP.LE.U32.OR = 5695361809
ISETP.LT.U32.AND = 16602754816
ISETP.NE.AND = 411648
LEA = 15293654388
LEA.HI.X = 4020
LOP3.LUT = 1080576
SHF.R.U32.HI = 79151241708
The total number of integer instructions is 597,478,795,415.
As you can see, the integer count is close to nvprof's. Maybe the definition of MISC is different, and I hope that the documentation will be updated. I will also test other instructions.
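The two totals quoted above can be re-derived from the histogram with a short Python sketch. The grouping of opcodes into MISC and INTEGER simply follows the lists in this post; it is an assumption for illustration, not an official classification from NVBit or nvprof. The helper names `parse_histogram` and `category_total` are hypothetical.

```python
# Thread level counts copied from the histogram in this post
# (only the opcodes grouped as MISC and INTEGER above).
HISTOGRAM = """
BAR.SYNC = 3962112
NOP = 3962112
S2R = 102912
IADD3 = 95823798432
IMAD = 31896872308
IMAD.IADD = 95823614444
IMAD.MOV.U32 = 48673303412
IMAD.SHL.U32 = 112355807409
IMAD.WIDE.U32 = 202306989
IMAD.X = 4020
ISETP.GE.U32.AND = 16805382400
ISETP.GE.U32.OR = 360192
ISETP.GT.U32.AND = 79151396076
ISETP.GT.U32.OR = 1440768
ISETP.LE.U32.OR = 5695361809
ISETP.LT.U32.AND = 16602754816
ISETP.NE.AND = 411648
LEA = 15293654388
LEA.HI.X = 4020
LOP3.LUT = 1080576
SHF.R.U32.HI = 79151241708
"""

# Assumed grouping by base mnemonic (the text before the first "."),
# mirroring the lists in this post rather than any official table.
MISC_BASES = {"BAR", "NOP", "S2R"}
INTEGER_BASES = {"IADD3", "IMAD", "ISETP", "LEA", "LOP3", "SHF"}


def parse_histogram(text):
    """Parse 'OPCODE = count' lines into a dict of opcode -> int."""
    counts = {}
    for line in text.strip().splitlines():
        opcode, value = line.split("=")
        counts[opcode.strip()] = int(value)
    return counts


def category_total(counts, bases):
    """Sum the counts of all opcodes whose base mnemonic is in `bases`."""
    return sum(n for op, n in counts.items() if op.split(".")[0] in bases)


counts = parse_histogram(HISTOGRAM)
misc_total = category_total(counts, MISC_BASES)
integer_total = category_total(counts, INTEGER_BASES)

print(f"MISC:    {misc_total:,}")     # MISC:    8,027,136
print(f"INTEGER: {integer_total:,}")  # INTEGER: 597,478,795,415
```

Changing the two sets of base mnemonics is enough to try a different category definition against the same counts.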
I see that the instruction mix of nvprof is different from nvbit. Some categories described here are not present in nvprof. However, I see big differences for others.
The program has one kernel which is invoked one time. The opcodes are
According to the reference, only
are considered MISC. However, in the picture below, nvprof reports more than 158M MISC instructions.
I also see big differences in other types. Is that normal? Is there a reason for it?