Jerc007 / Open-GPGPU-FlexGrip-

FlexGripPlus: an open-source GPU model for reliability evaluation and microarchitectural simulation
MIT License

how to understand the following asm #9

Closed: xiaoyu1004 closed this issue 4 months ago

xiaoyu1004 commented 7 months ago

    code for sm_10
            Function : _Z9TEST_progPiS_S_
            .headerflags    @"EF_CUDA_SM10 EF_CUDA_PTX_SM(EF_CUDA_SM10)"
            /*0000*/     MOV.U16 R0H, g [0x1].U16;             /* 0x0023c78010004205 */
            /*0008*/     I2I.U32.U16 R1, R0L;                  /* 0x04000780a0000005 */
            /*0010*/     IMAD.U16 R0, g [0x6].U16, R0H, R1;    /* 0x0020478060014c01 */
            /*0018*/     SHL R2, R0, 0x2;                      /* 0xc410078030020009 */
            /*0020*/     IADD32 R0, g [0x4], R2;               /* 0x2102e800 */
            /*0024*/     IADD32 R3, g [0x6], R2;               /* 0x2102ec0c */
            /*0028*/     GLD.U32 R1, global14[R0];             /* 0x80c00780d00e0005 */
            /*0030*/     GLD.U32 R0, global14[R3];             /* 0x80c00780d00e0601 */
            /*0038*/     IADD32 R1, R1, R0;                    /* 0x20008204 */
            /*003c*/     IADD32 R0, g [0x8], R2;               /* 0x2102f000 */
            /*0040*/     GST.U32 global14[R0], R1;             /* 0xa0c00781d00e0005 */
            ...................................

Does RX[H/L] mean that each register is split into a high 16-bit half and a low 16-bit half? I guess R0L in the second instruction holds threadIdx.x, but R0L is read without ever having been written. Is it a hardware convention that threadIdx.x is always placed in R0L? And where does blockIdx.x come from? In addition, g [0x6] is used in both the third and the sixth instruction. In the third instruction I guess it is blockDim.x, but in the sixth I think it is one of the pointer kernel parameters. I'm quite confused. Could you give me some advice? Thank you very much.
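One way to reconcile the two uses of g [0x6] is sketched below in C. It rests on two assumptions that are not stated anywhere in the thread: (1) `R0L`/`R0H` are the low and high 16-bit halves of the 32-bit register `R0`, and (2) the index inside `g [...]` counts elements of the access width rather than bytes, so `g [0x6].U16` and `g [0x6]` refer to different byte offsets. The helper names (`reg_lo`, `reg_hi`, `set_hi`, `g_byte_offset`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: view a 32-bit register as two 16-bit halves (RnL / RnH). */
static uint16_t reg_lo(uint32_t r) { return (uint16_t)(r & 0xFFFFu); }
static uint16_t reg_hi(uint32_t r) { return (uint16_t)(r >> 16); }

/* Write only the high half (as MOV.U16 R0H, ... would under assumption 1),
   leaving the low half, e.g. a preset threadIdx.x in R0L, untouched. */
static uint32_t set_hi(uint32_t r, uint16_t v) {
    return (r & 0x0000FFFFu) | ((uint32_t)v << 16);
}

/* Assumption 2: the g[] index is scaled by the access width in bytes. */
static uint32_t g_byte_offset(uint32_t index, uint32_t width_bytes) {
    assert(width_bytes == 2 || width_bytes == 4);
    return index * width_bytes;
}
```

Under this reading, `g_byte_offset(0x6, 2)` is byte 0xC (a 16-bit launch value) while `g_byte_offset(0x6, 4)` is byte 0x18 (a 32-bit value in the parameter area), so the third and sixth instructions would not be reading the same datum.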

divadnauj-GB commented 4 months ago

Yes, those are the registers used to calculate the thread indexes. Indeed, threadIdx.x, blockIdx.x, etc. are set up in shared memory before the kernel is executed. I think this documentation will help you better understand how such instructions work: https://zenodo.org/records/3819313 https://webthesis.biblio.polito.it/secure/9031/1/tesi.pdf
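The idea of launch values being placed in memory before the kernel runs can be illustrated with a small C sketch that fills a shared-memory image and then replays the listing one instruction at a time for each thread. The layout is an assumption chosen only to be consistent with the indices in the listing (blockDim.x as a 16-bit value at byte 0x2, blockIdx.x at byte 0xC, three pointer parameters in 8-byte slots starting at byte 0x10, with only their low 32 bits used); threadIdx.x is assumed to arrive in R0L. It is not taken from FlexGrip's source, so check the documents linked above for the real layout. All names (`smem`, `gmem`, `preload`, `run_thread`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memories: a tiny shared-memory image and a flat global memory of 32-bit words. */
static uint8_t  smem[0x28];
static uint32_t gmem[64];

/* g [idx].U16 and g [idx] reads, with the index scaled by the access width (assumption). */
static uint16_t g16(uint32_t idx) { uint16_t v; memcpy(&v, &smem[idx * 2], 2); return v; }
static uint32_t g32(uint32_t idx) { uint32_t v; memcpy(&v, &smem[idx * 4], 4); return v; }

/* Map a byte address in global memory to a word, checking alignment and bounds. */
static uint32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr / 4 < 64);
    return &gmem[addr / 4];
}

/* Assumed launch layout, written before any thread starts. */
static void preload(uint16_t block_dim_x, uint16_t block_idx_x,
                    uint32_t a, uint32_t b, uint32_t c) {
    memset(smem, 0, sizeof smem);
    memcpy(&smem[0x02], &block_dim_x, 2);   /* read back as g [0x1].U16 */
    memcpy(&smem[0x0C], &block_idx_x, 2);   /* read back as g [0x6].U16 */
    memcpy(&smem[0x10], &a, 4);             /* read back as g [0x4] */
    memcpy(&smem[0x18], &b, 4);             /* read back as g [0x6] */
    memcpy(&smem[0x20], &c, 4);             /* read back as g [0x8] */
}

/* Replays the listing for one thread; threadIdx.x is assumed to be preset in R0L. */
static void run_thread(uint16_t thread_idx_x) {
    uint32_t R0 = thread_idx_x, R1, R2, R3;
    R0 = (R0 & 0xFFFFu) | ((uint32_t)g16(0x1) << 16);  /* MOV.U16 R0H, g [0x1].U16 */
    R1 = R0 & 0xFFFFu;                                 /* I2I.U32.U16 R1, R0L */
    R0 = (uint32_t)g16(0x6) * (R0 >> 16) + R1;         /* IMAD.U16 R0, g [0x6].U16, R0H, R1 */
    R2 = R0 << 2;                                      /* SHL R2, R0, 0x2 (index -> bytes) */
    R0 = g32(0x4) + R2;                                /* IADD32 R0, g [0x4], R2 */
    R3 = g32(0x6) + R2;                                /* IADD32 R3, g [0x6], R2 */
    R1 = *word(R0);                                    /* GLD.U32 R1, global14[R0] */
    R0 = *word(R3);                                    /* GLD.U32 R0, global14[R3] */
    R1 = R1 + R0;                                      /* IADD32 R1, R1, R0 */
    R0 = g32(0x8) + R2;                                /* IADD32 R0, g [0x8], R2 */
    *word(R0) = R1;                                    /* GST.U32 global14[R0], R1 */
}
```

Under these assumptions the listing computes `i = blockIdx.x * blockDim.x + threadIdx.x` and then `c[i] = a[i] + b[i]`, which is why the same numeric index 0x6 can appear once as a 16-bit launch value and once as a 32-bit pointer.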