NVlabs / NVBit

comparing instruction mix of nvprof and nvbit #9

Open mahmoodn opened 4 years ago

mahmoodn commented 4 years ago

The instruction mix reported by nvprof differs from the one reported by NVBit. Some categories described here are not present in nvprof. For the others, however, I see large differences.

The program has a single kernel that is invoked once. The opcode counts are:

  ATOMS.ADD = 508920645
  BAR.SYNC = 123816
  BRA = 3817166581
  BSSY = 1037738104
  BSYNC = 1881710254
  EXIT = 1809
  FFMA = 1037672176
  FMUL = 518836088
  FSETP.GE.AND = 2593803797
  FSETP.GEU.AND = 178011288
  FSETP.GEU.OR = 340824800
  FSETP.LTU.OR = 518836088
  IADD3 = 3114835742
  IMAD = 1027773014
  IMAD.IADD = 3114829917
  IMAD.MOV.U32 = 1552036486
  IMAD.SHL.U32 = 3631492254
  IMAD.WIDE.U32 = 6322263
  IMAD.X = 201
  ISETP.GE.U32.AND = 525168200
  ISETP.GE.U32.OR = 11256
  ISETP.GT.U32.AND = 2593823093
  ISETP.GT.U32.OR = 45024
  ISETP.LE.U32.OR = 178011288
  ISETP.LT.U32.AND = 518836088
  ISETP.NE.AND = 12864
  LDC = 3631475973
  LDG.E.SYS = 6317439
  LDS.U = 1556621025
  LEA = 508922454
  LEA.HI.X = 201
  LOP3.LUT = 33768
  NOP = 123816
  S2R = 3216
  SEL = 5187673522
  SHF.R.U32.HI = 2593818269
  SHFL.IDX = 3114747909
  STG.E.64.SYS = 201
  STS = 268335

According to the reference, only the following are considered MISC:

BAR.SYNC = 123816
NOP = 123816
S2R = 3216

However, in the screenshot below, nvprof reports more than 158M MISC instructions.

[Screenshot: nvprof instruction mix]

I also see large differences in other categories. Is that normal? What could be the reason?

ovilla commented 4 years ago

You made a few assumptions that may or may not be true.

  1. The MISC category (as referred to in the CUDA manual) may or may not be the same category nvprof means by "Misc Instructions" in your screenshot.

  2. The opcode-histogram tool in NVBit counts warp-level instructions by default (it can count at thread level if you set the environment variable COUNT_WARP_LEVEL=0). However, I am not sure what the nvprof counters in your screenshot are reporting. Judging from the large discrepancy, I would guess that nvprof is counting at thread level while the NVBit tool is counting at warp level.

In general, we have extensively correlated nvbit tools with nvprof and we have never seen discrepancies of this magnitude.
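
The difference between the two counting modes can be illustrated with a small sketch. This is not NVBit's code: the `active_masks` values are hypothetical, and the assumption is that a warp-level count adds 1 per executed warp instruction while a thread-level count adds the number of active threads in that warp (at most 32).

```python
# Hypothetical 32-bit active masks, one per executed warp instruction.
active_masks = [0xFFFFFFFF, 0x0000FFFF, 0x00000001]

def warp_level_count(masks):
    # One count per warp instruction that has at least one active thread.
    return sum(1 for m in masks if m != 0)

def thread_level_count(masks):
    # One count per active thread: the population count of each mask.
    return sum(bin(m).count("1") for m in masks)

print(warp_level_count(active_masks))    # 3
print(thread_level_count(active_masks))  # 49
```

Under this assumption, the thread-level count can be up to 32 times the warp-level count, and less when warps are partially active.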

mahmoodn commented 4 years ago

I ran with that variable set and I still see differences for MISC. The new stats are:

  ATOMS.ADD = 15293598912
  BAR.SYNC = 3962112
  BRA = 118297388285
  BSSY = 33207619328
  BSYNC = 33207619328
  EXIT = 55476
  FFMA = 33205509632
  FMUL = 16602754816
  FSETP.GE.AND = 79150778604
  FSETP.GEU.AND = 5696361216
  FSETP.GEU.OR = 10906393600
  FSETP.LTU.OR = 16601755409
  IADD3 = 95823798432
  IMAD = 31896872308
  IMAD.IADD = 95823614444
  IMAD.MOV.U32 = 48673303412
  IMAD.SHL.U32 = 112355807409
  IMAD.WIDE.U32 = 202306989
  IMAD.X = 4020
  ISETP.GE.U32.AND = 16805382400
  ISETP.GE.U32.OR = 360192
  ISETP.GT.U32.AND = 79151396076
  ISETP.GT.U32.OR = 1440768
  ISETP.LE.U32.OR = 5695361809
  ISETP.LT.U32.AND = 16602754816
  ISETP.NE.AND = 411648
  LDC = 112355288829
  LDG.E.SYS = 202152621
  LDS.U = 49811870388
  LEA = 15293654388
  LEA.HI.X = 4020
  LOP3.LUT = 1080576
  NOP = 3962112
  S2R = 102912
  SEL = 158303666904
  SHF.R.U32.HI = 79151241708
  SHFL.IDX = 95820990188
  STG.E.64.SYS = 4020
  STS = 8581293

The MISC total is 8,027,136, which is still far less than what nvprof reports:

BAR.SYNC = 3962112
NOP = 3962112
S2R = 102912
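
As a sanity check, the two runs can be compared opcode by opcode. The sketch below uses only counts copied from the two listings above (a handful of opcodes, not the full histogram) and treats BAR.SYNC, NOP and S2R as the MISC group, which is the grouping used in this thread and may not match nvprof's definition.

```python
# Counts copied from the two runs above (default warp-level run first).
warp_level = {"BAR.SYNC": 123816, "NOP": 123816, "S2R": 3216,
              "EXIT": 1809, "ATOMS.ADD": 508920645}
thread_level = {"BAR.SYNC": 3962112, "NOP": 3962112, "S2R": 102912,
                "EXIT": 55476, "ATOMS.ADD": 15293598912}

# Thread-level / warp-level gives the average number of active threads
# per executed warp instruction for that opcode.
ratios = {op: thread_level[op] / warp_level[op] for op in warp_level}
for op, r in ratios.items():
    print(f"{op:10s} {r:6.2f}")

# MISC total for the thread-level run, using this thread's grouping.
misc_ops = ("BAR.SYNC", "NOP", "S2R")
misc_total = sum(thread_level[op] for op in misc_ops)
print(f"MISC total: {misc_total:,}")
```

BAR.SYNC, NOP and S2R scale by exactly 32, which is consistent with full warps executing them, while EXIT and ATOMS.ADD scale by slightly less than 32.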

For INTEGER instructions, I see:

  IADD3 = 95823798432
  IMAD = 31896872308
  IMAD.IADD = 95823614444
  IMAD.MOV.U32 = 48673303412
  IMAD.SHL.U32 = 112355807409
  IMAD.WIDE.U32 = 202306989
  IMAD.X = 4020
  ISETP.GE.U32.AND = 16805382400
  ISETP.GE.U32.OR = 360192
  ISETP.GT.U32.AND = 79151396076
  ISETP.GT.U32.OR = 1440768
  ISETP.LE.U32.OR = 5695361809
  ISETP.LT.U32.AND = 16602754816
  ISETP.NE.AND = 411648
  LEA = 15293654388
  LEA.HI.X = 4020
  LOP3.LUT = 1080576
  SHF.R.U32.HI = 79151241708

The total number of integer instructions is 597,478,795,415.
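
The total can be reproduced by summing the counts listed above. A minimal sketch follows. The choice of which opcodes count as INTEGER is the one made in this post, not an official nvprof definition.

```python
# Thread-level counts for the opcodes grouped as INTEGER in this post.
integer_counts = {
    "IADD3": 95823798432,
    "IMAD": 31896872308,
    "IMAD.IADD": 95823614444,
    "IMAD.MOV.U32": 48673303412,
    "IMAD.SHL.U32": 112355807409,
    "IMAD.WIDE.U32": 202306989,
    "IMAD.X": 4020,
    "ISETP.GE.U32.AND": 16805382400,
    "ISETP.GE.U32.OR": 360192,
    "ISETP.GT.U32.AND": 79151396076,
    "ISETP.GT.U32.OR": 1440768,
    "ISETP.LE.U32.OR": 5695361809,
    "ISETP.LT.U32.AND": 16602754816,
    "ISETP.NE.AND": 411648,
    "LEA": 15293654388,
    "LEA.HI.X": 4020,
    "LOP3.LUT": 1080576,
    "SHF.R.U32.HI": 79151241708,
}

integer_total = sum(integer_counts.values())
print(f"{integer_total:,}")  # 597,478,795,415
```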

As you can see, the integer count is close to what nvprof reports. Maybe the definition of MISC is different, and I hope the documentation will be updated. I will also test other instruction types.