luliyucoordinate closed this issue 3 months ago
Not exactly the same. You can try printing the swizzled layout of a 16x16 matrix:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
17 16 19 18 21 20 23 22 25 24 27 26 29 28 31 30
34 35 32 33 38 39 36 37 42 43 40 41 46 47 44 45
51 50 49 48 55 54 53 52 59 58 57 56 63 62 61 60
68 69 70 71 64 65 66 67 76 77 78 79 72 73 74 75
85 84 87 86 81 80 83 82 93 92 95 94 89 88 91 90
102 103 100 101 98 99 96 97 110 111 108 109 106 107 104 105
119 118 117 116 115 114 113 112 127 126 125 124 123 122 121 120
128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
145 144 147 146 149 148 151 150 153 152 155 154 157 156 159 158
162 163 160 161 166 167 164 165 170 171 168 169 174 175 172 173
179 178 177 176 183 182 181 180 187 186 185 184 191 190 189 188
196 197 198 199 192 193 194 195 204 205 206 207 200 201 202 203
213 212 215 214 209 208 211 210 221 220 223 222 217 216 219 218
230 231 228 229 226 227 224 225 238 239 236 237 234 235 232 233
247 246 245 244 243 242 241 240 255 254 253 252 251 250 249 248
The delta between f(i, j) and f(i, j+4) is 4 or -4, not always 4. For j = 0..3, for example, it is +4 in rows 0-3 and -4 in rows 4-7.
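If it helps, here is a rough sketch that reproduces the table above and checks the delta. The formula `f(i, j) = 16 * i + (j ^ (i % 8))` is only inferred from the printed numbers, it is not copied from `permuted_smem.cuh`, and the names `swizzle`, `layout` and `delta` are made up for illustration.

```python
def swizzle(i: int, j: int, cols: int = 16) -> int:
    """Assumed mapping, inferred from the printed table: XOR the column with (row % 8)."""
    return i * cols + (j ^ (i % 8))


def layout(rows: int = 16, cols: int = 16) -> list[list[int]]:
    """Build the full swizzled layout as a list of rows."""
    return [[swizzle(i, j, cols) for j in range(cols)] for i in range(rows)]


def delta(i: int, j: int, step: int = 4) -> int:
    """Difference f(i, j + step) - f(i, j) under the assumed mapping."""
    return swizzle(i, j + step) - swizzle(i, j)


if __name__ == "__main__":
    # Print the 16x16 layout in the same form as the table above.
    for row in layout():
        print(" ".join(str(v) for v in row))

    # Show the delta for the first four columns of every row.
    for i in range(16):
        print(i, [delta(i, j) for j in range(4)])
```

Under this assumed mapping, bit 2 of `i % 8` decides the sign: when it is clear, stepping 4 columns adds 4, and when it is set, the XOR flips that bit and the step subtracts 4.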
thx
https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/permuted_smem.cuh#L70