flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Why did we perform an operation similar to data alignment here instead of directly adding 4? #332

luliyucoordinate closed 3 months ago

luliyucoordinate commented 3 months ago

https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/permuted_smem.cuh#L70

yzh119 commented 3 months ago

Not exactly the same. Try printing the swizzled layout of a 16x16 matrix:

0       1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
17      16      19      18      21      20      23      22      25      24      27      26      29      28      31      30
34      35      32      33      38      39      36      37      42      43      40      41      46      47      44      45
51      50      49      48      55      54      53      52      59      58      57      56      63      62      61      60
68      69      70      71      64      65      66      67      76      77      78      79      72      73      74      75
85      84      87      86      81      80      83      82      93      92      95      94      89      88      91      90
102     103     100     101     98      99      96      97      110     111     108     109     106     107     104     105
119     118     117     116     115     114     113     112     127     126     125     124     123     122     121     120
128     129     130     131     132     133     134     135     136     137     138     139     140     141     142     143
145     144     147     146     149     148     151     150     153     152     155     154     157     156     159     158
162     163     160     161     166     167     164     165     170     171     168     169     174     175     172     173
179     178     177     176     183     182     181     180     187     186     185     184     191     190     189     188
196     197     198     199     192     193     194     195     204     205     206     207     200     201     202     203
213     212     215     214     209     208     211     210     221     220     223     222     217     216     219     218
230     231     228     229     226     227     224     225     238     239     236     237     234     235     232     233
247     246     245     244     243     242     241     240     255     254     253     252     251     250     249     248

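The delta between f(i, j) and f(i, j+4), where f(i, j) is the entry at row i, column j above, is either 4 or -4, not always 4. For example, in row 0 the entry goes from 0 to 4, but in row 4 it goes from 68 to 64.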
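For anyone who wants to reproduce the printout, here is a minimal Python sketch. The formula in `swizzled` is an assumption inferred from the table above (each row XORs the column index with `i % 8`); it is not copied from `permuted_smem.cuh`, and the names `swizzled`, `layout`, and `column_step_deltas` are hypothetical. The delta check only covers columns with `j % 8 < 4`, the case where adding 4 does not carry into the next group of 8 columns.

```python
ROWS, COLS = 16, 16


def swizzled(i: int, j: int) -> int:
    """Entry at row i, column j.

    Assumption: formula inferred from the printed 16x16 table,
    not taken from the FlashInfer source.
    """
    return i * COLS + (j ^ (i % 8))


def layout() -> list[list[int]]:
    """Full 16x16 swizzled layout as a list of rows."""
    return [[swizzled(i, j) for j in range(COLS)] for i in range(ROWS)]


def column_step_deltas(step: int = 4) -> set[int]:
    """Distinct values of f(i, j + step) - f(i, j) for j % 8 < 4."""
    return {
        swizzled(i, j + step) - swizzled(i, j)
        for i in range(ROWS)
        for j in range(COLS)
        if j % 8 < 4
    }


if __name__ == "__main__":
    # Print the layout in the same tab-separated form as the table above.
    for row in layout():
        print("\t".join(str(v) for v in row))
    # Both signs show up, so a plain "+ 4" cannot cover every row.
    print(sorted(column_step_deltas()))
```

Under this inferred formula, rows with `i % 8 < 4` step by +4 and rows with `i % 8 >= 4` step by -4. Flipping bit 2 of the offset (`offset ^ 4`) gives the right result in both cases, which may be why the code at the linked line does something other than a plain add, though that reading is not confirmed in this thread.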

luliyucoordinate commented 3 months ago

thx