dougallj / applegpu

Apple G13 GPU architecture docs and tools
BSD 3-Clause "New" or "Revised" License
552 stars 41 forks source link

Decode image write block instruction #41

Closed alyssarosenzweig closed 1 year ago

alyssarosenzweig commented 1 year ago

This instruction ("TODO.unkB1") is used to write out an entire block from local memory into an image. Because it is block based and not pixel based, in comparison to the regular image write instruction it works even if the destination image is compressed. It is tailor fit for use in the end-of-tile program, to blit tile memory to the framebuffer.

TellowKrinkle commented 1 year ago

Texture extension is at 62 0001-imageblock-write-Texture-extension.patch

Test shader ```metal using namespace metal; struct Test { float4 yay; }; struct ArgBuf { texture2d tex[64]; }; kernel void test(metal::imageblock f, constant ArgBuf& texlist, ushort2 lid [[thread_position_in_threadgroup]], ushort2 gid [[thread_position_in_grid]]) { threadgroup_imageblock Test* t = f.data(lid); t->yay = float4(0, 1, 2, 3); threadgroup_barrier(mem_flags::mem_threadgroup_imageblock); if (all(lid == 0)) { imageblock_slice slice = f.slice(t->yay); for (int i = 0; i < 64; i++) { texlist.tex[i].write(slice, gid); } } } ``` ``` compute shader prolog: 0: 0541040d00c73200 device_load 0, i32, xy, r104_r105, u2_u3, 0, signed, lsl 1 8: 05a1144d00c43200 device_load 1, i32, xy, r20_r21, u2_u3, 1, signed, lsl 1 10: 0591144d00c83200 device_load 1, i32, xy, r18_r19, u2_u3, 1, signed, lsl 2 18: 0581344d00c43200 device_load 1, i32, xy, r16_r17, u2_u3, 3, signed, lsl 1 20: 0571244d00c83200 device_load 1, i32, xy, r14_r15, u2_u3, 2, signed, lsl 2 28: 0561544d00c43200 device_load 1, i32, xy, r12_r13, u2_u3, 5, signed, lsl 1 30: 0551344d00c83200 device_load 1, i32, xy, r10_r11, u2_u3, 3, signed, lsl 2 38: 0541744d00c43200 device_load 1, i32, xy, r8_r9, u2_u3, 7, signed, lsl 1 40: 0531444d00c83200 device_load 1, i32, xy, r6_r7, u2_u3, 4, signed, lsl 2 48: 0521940d00c43200 device_load 0, i32, xy, r4_r5, u2_u3, 9, signed, lsl 1 50: 0511540d00c83200 device_load 0, i32, xy, r2_r3, u2_u3, 5, signed, lsl 2 58: 0501b40d00c43200 device_load 0, i32, xy, r0_r1, u2_u3, 11, signed, lsl 1 60: 0531640d00cb3200 device_load 0, i32, xy, r102_r103, u2_u3, 6, signed, lsl 2 68: 0521d40d00c73200 device_load 0, i32, xy, r100_r101, u2_u3, 13, signed, lsl 1 70: 0511740d00cb3200 device_load 0, i32, xy, r98_r99, u2_u3, 7, signed, lsl 2 78: 0501f40d00c73200 device_load 0, i32, xy, r96_r97, u2_u3, 15, signed, lsl 1 80: 3800 wait 0 82: 05f1840d00ca3200 device_load 0, i32, xy, r94_r95, u2_u3, 8, signed, lsl 2 8a: 05e1140d01c63200 device_load 0, i32, xy, r92_r93, u2_u3, 17, signed, lsl 1 92: 05d1940d00ca3200 device_load 0, i32, xy, r90_r91, u2_u3, 9, signed, lsl 2 9a: 05c1340d01c63200 device_load 0, i32, xy, r88_r89, u2_u3, 19, signed, lsl 1 a2: 05b1a40d00ca3200 device_load 0, i32, xy, r86_r87, u2_u3, 10, signed, lsl 2 aa: 05a1540d01c63200 device_load 0, i32, xy, r84_r85, u2_u3, 21, signed, lsl 1 b2: 0591b40d00ca3200 device_load 0, i32, xy, r82_r83, u2_u3, 11, signed, lsl 2 ba: 0581740d01c63200 device_load 0, i32, xy, r80_r81, u2_u3, 23, signed, lsl 1 c2: 3801 wait 1 c4: 0571c44d00ca3200 device_load 1, i32, xy, r78_r79, u2_u3, 12, signed, lsl 2 cc: 0561944d01c63200 device_load 1, i32, xy, r76_r77, u2_u3, 25, signed, lsl 1 d4: 0551d44d00ca3200 device_load 1, i32, xy, r74_r75, u2_u3, 13, signed, lsl 2 dc: 0541b44d01c63200 device_load 1, i32, xy, r72_r73, u2_u3, 27, signed, lsl 1 e4: 0531e44d00ca3200 device_load 1, i32, xy, r70_r71, u2_u3, 14, signed, lsl 2 ec: 0521d44d01c63200 device_load 1, i32, xy, r68_r69, u2_u3, 29, signed, lsl 1 f4: 0511f44d00ca3200 device_load 1, i32, xy, r66_r67, u2_u3, 15, signed, lsl 2 fc: 0501f44d01c63200 device_load 1, i32, xy, r64_r65, u2_u3, 31, signed, lsl 1 104: 3800 wait 0 106: 05f1040d01c93200 device_load 0, i32, xy, r62_r63, u2_u3, 16, signed, lsl 2 10e: 05e1140d02c53200 device_load 0, i32, xy, r60_r61, u2_u3, 33, signed, lsl 1 116: 05d1140d01c93200 device_load 0, i32, xy, r58_r59, u2_u3, 17, signed, lsl 2 11e: 05c1340d02c53200 device_load 0, i32, xy, r56_r57, u2_u3, 35, signed, lsl 1 126: 05b1240d01c93200 device_load 0, i32, xy, r54_r55, u2_u3, 18, signed, lsl 2 12e: 05a1540d02c53200 device_load 0, i32, xy, r52_r53, u2_u3, 37, signed, lsl 1 136: 0591340d01c93200 device_load 0, i32, xy, r50_r51, u2_u3, 19, signed, lsl 2 13e: 0581740d02c53200 device_load 0, i32, xy, r48_r49, u2_u3, 39, signed, lsl 1 146: 3801 wait 1 148: 0571444d01c93200 device_load 1, i32, xy, r46_r47, u2_u3, 20, signed, lsl 2 150: 0561944d02c53200 device_load 1, i32, xy, r44_r45, u2_u3, 41, signed, lsl 1 158: 0551544d01c93200 device_load 1, i32, xy, r42_r43, u2_u3, 21, signed, lsl 2 160: 0541b44d02c53200 device_load 1, i32, xy, r40_r41, u2_u3, 43, signed, lsl 1 168: 0531644d01c93200 device_load 1, i32, xy, r38_r39, u2_u3, 22, signed, lsl 2 170: 0521d44d02c53200 device_load 1, i32, xy, r36_r37, u2_u3, 45, signed, lsl 1 178: 0511744d01c93200 device_load 1, i32, xy, r34_r35, u2_u3, 23, signed, lsl 2 180: 0501f44d02c53200 device_load 1, i32, xy, r32_r33, u2_u3, 47, signed, lsl 1 188: 3800 wait 0 18a: 05f1840d01c83200 device_load 0, i32, xy, r30_r31, u2_u3, 24, signed, lsl 2 192: 05e1140d03c43200 device_load 0, i32, xy, r28_r29, u2_u3, 49, signed, lsl 1 19a: 05d1940d01c83200 device_load 0, i32, xy, r26_r27, u2_u3, 25, signed, lsl 2 1a2: 05c1340d03c43200 device_load 0, i32, xy, r24_r25, u2_u3, 51, signed, lsl 1 1aa: 05b1a40d01c83200 device_load 0, i32, xy, r22_r23, u2_u3, 26, signed, lsl 2 1b2: c540803d00833000 uniform_store 2, i16, xy, 0, r104l_r104h, 8 1ba: 62a1000000000030 mov_imm r104, 0 1c2: c540a03d00833000 uniform_store 2, i16, xy, 0, r104l_r104h, 10 1ca: c5a0c03d00803000 uniform_store 2, i16, xy, 0, r20l_r20h, 12 1d2: 05a1540d03c43200 device_load 0, i32, xy, r20_r21, u2_u3, 53, signed, lsl 1 1da: c540e03d00833000 uniform_store 2, i16, xy, 0, r104l_r104h, 14 1e2: c590003d01803000 uniform_store 2, i16, xy, 0, r18l_r18h, 16 1ea: 0591b40d01c83200 device_load 0, i32, xy, r18_r19, u2_u3, 27, signed, lsl 2 1f2: c540203d01833000 uniform_store 2, i16, xy, 0, r104l_r104h, 18 1fa: c580403d01803000 uniform_store 2, i16, xy, 0, r16l_r16h, 20 202: 0581740d03c43200 device_load 0, i32, xy, r16_r17, u2_u3, 55, signed, lsl 1 20a: c540603d01833000 uniform_store 2, i16, xy, 0, r104l_r104h, 22 212: c570803d01803000 uniform_store 2, i16, xy, 0, r14l_r14h, 24 21a: 3801 wait 1 21c: 0571c44d01c83200 device_load 1, i32, xy, r14_r15, u2_u3, 28, signed, lsl 2 224: c540a03d01833000 uniform_store 2, i16, xy, 0, r104l_r104h, 26 22c: c560c03d01803000 uniform_store 2, i16, xy, 0, r12l_r12h, 28 234: 0561944d03c43200 device_load 1, i32, xy, r12_r13, u2_u3, 57, signed, lsl 1 23c: c540e03d01833000 uniform_store 2, i16, xy, 0, r104l_r104h, 30 244: c550003d02803000 uniform_store 2, i16, xy, 0, r10l_r10h, 32 24c: 0551d44d01c83200 device_load 1, i32, xy, r10_r11, u2_u3, 29, signed, lsl 2 254: c540203d02833000 uniform_store 2, i16, xy, 0, r104l_r104h, 34 25c: c540403d02803000 uniform_store 2, i16, xy, 0, r8l_r8h, 36 264: 0541b44d03c43200 device_load 1, i32, xy, r8_r9, u2_u3, 59, signed, lsl 1 26c: c540603d02833000 uniform_store 2, i16, xy, 0, r104l_r104h, 38 274: c530803d02803000 uniform_store 2, i16, xy, 0, r6l_r6h, 40 27c: 0531e44d01c83200 device_load 1, i32, xy, r6_r7, u2_u3, 30, signed, lsl 2 284: c540a03d02833000 uniform_store 2, i16, xy, 0, r104l_r104h, 42 28c: c520c03d02803000 uniform_store 2, i16, xy, 0, r4l_r4h, 44 294: 0521d44d03c43200 device_load 1, i32, xy, r4_r5, u2_u3, 61, signed, lsl 1 29c: c540e03d02833000 uniform_store 2, i16, xy, 0, r104l_r104h, 46 2a4: c510003d03803000 uniform_store 2, i16, xy, 0, r2l_r2h, 48 2ac: 0511f44d01c83200 device_load 1, i32, xy, r2_r3, u2_u3, 31, signed, lsl 2 2b4: c540203d03833000 uniform_store 2, i16, xy, 0, r104l_r104h, 50 2bc: c500403d03803000 uniform_store 2, i16, xy, 0, r0l_r0h, 52 2c4: 0501f44d03c43200 device_load 1, i32, xy, r0_r1, u2_u3, 63, signed, lsl 1 2cc: c540603d03833000 uniform_store 2, i16, xy, 0, r104l_r104h, 54 2d4: c530803d03833000 uniform_store 2, i16, xy, 0, r102l_r102h, 56 2dc: c540a03d03833000 uniform_store 2, i16, xy, 0, r104l_r104h, 58 2e4: c520c03d03833000 uniform_store 2, i16, xy, 0, r100l_r100h, 60 2ec: c540e03d03833000 uniform_store 2, i16, xy, 0, r104l_r104h, 62 2f4: c510003d04833000 uniform_store 2, i16, xy, 0, r98l_r98h, 64 2fc: c540203d04833000 uniform_store 2, i16, xy, 0, r104l_r104h, 66 304: c500403d04833000 uniform_store 2, i16, xy, 0, r96l_r96h, 68 30c: c540603d04833000 uniform_store 2, i16, xy, 0, r104l_r104h, 70 314: c5f0803d04823000 uniform_store 2, i16, xy, 0, r94l_r94h, 72 31c: c540a03d04833000 uniform_store 2, i16, xy, 0, r104l_r104h, 74 324: c5e0c03d04823000 uniform_store 2, i16, xy, 0, r92l_r92h, 76 32c: c540e03d04833000 uniform_store 2, i16, xy, 0, r104l_r104h, 78 334: c5d0003d05823000 uniform_store 2, i16, xy, 0, r90l_r90h, 80 33c: c540203d05833000 uniform_store 2, i16, xy, 0, r104l_r104h, 82 344: c5c0403d05823000 uniform_store 2, i16, xy, 0, r88l_r88h, 84 34c: c540603d05833000 uniform_store 2, i16, xy, 0, r104l_r104h, 86 354: c5b0803d05823000 uniform_store 2, i16, xy, 0, r86l_r86h, 88 35c: c540a03d05833000 uniform_store 2, i16, xy, 0, r104l_r104h, 90 364: c5a0c03d05823000 uniform_store 2, i16, xy, 0, r84l_r84h, 92 36c: c540e03d05833000 uniform_store 2, i16, xy, 0, r104l_r104h, 94 374: c590003d06823000 uniform_store 2, i16, xy, 0, r82l_r82h, 96 37c: c540203d06833000 uniform_store 2, i16, xy, 0, r104l_r104h, 98 384: c580403d06823000 uniform_store 2, i16, xy, 0, r80l_r80h, 100 38c: c540603d06833000 uniform_store 2, i16, xy, 0, r104l_r104h, 102 394: c570803d06823000 uniform_store 2, i16, xy, 0, r78l_r78h, 104 39c: c540a03d06833000 uniform_store 2, i16, xy, 0, r104l_r104h, 106 3a4: c560c03d06823000 uniform_store 2, i16, xy, 0, r76l_r76h, 108 3ac: c540e03d06833000 uniform_store 2, i16, xy, 0, r104l_r104h, 110 3b4: c550003d07823000 uniform_store 2, i16, xy, 0, r74l_r74h, 112 3bc: c540203d07833000 uniform_store 2, i16, xy, 0, r104l_r104h, 114 3c4: c540403d07823000 uniform_store 2, i16, xy, 0, r72l_r72h, 116 3cc: c540603d07833000 uniform_store 2, i16, xy, 0, r104l_r104h, 118 3d4: c530803d07823000 uniform_store 2, i16, xy, 0, r70l_r70h, 120 3dc: c540a03d07833000 uniform_store 2, i16, xy, 0, r104l_r104h, 122 3e4: c520c03d07823000 uniform_store 2, i16, xy, 0, r68l_r68h, 124 3ec: c540e03d07833000 uniform_store 2, i16, xy, 0, r104l_r104h, 126 3f4: c510003d08823000 uniform_store 2, i16, xy, 0, r66l_r66h, 128 3fc: c540203d08833000 uniform_store 2, i16, xy, 0, r104l_r104h, 130 404: c500403d08823000 uniform_store 2, i16, xy, 0, r64l_r64h, 132 40c: c540603d08833000 uniform_store 2, i16, xy, 0, r104l_r104h, 134 414: c5f0803d08813000 uniform_store 2, i16, xy, 0, r62l_r62h, 136 41c: c540a03d08833000 uniform_store 2, i16, xy, 0, r104l_r104h, 138 424: c5e0c03d08813000 uniform_store 2, i16, xy, 0, r60l_r60h, 140 42c: c540e03d08833000 uniform_store 2, i16, xy, 0, r104l_r104h, 142 434: c5d0003d09813000 uniform_store 2, i16, xy, 0, r58l_r58h, 144 43c: c540203d09833000 uniform_store 2, i16, xy, 0, r104l_r104h, 146 444: c5c0403d09813000 uniform_store 2, i16, xy, 0, r56l_r56h, 148 44c: c540603d09833000 uniform_store 2, i16, xy, 0, r104l_r104h, 150 454: c5b0803d09813000 uniform_store 2, i16, xy, 0, r54l_r54h, 152 45c: c540a03d09833000 uniform_store 2, i16, xy, 0, r104l_r104h, 154 464: c5a0c03d09813000 uniform_store 2, i16, xy, 0, r52l_r52h, 156 46c: c540e03d09833000 uniform_store 2, i16, xy, 0, r104l_r104h, 158 474: c590003d0a813000 uniform_store 2, i16, xy, 0, r50l_r50h, 160 47c: c540203d0a833000 uniform_store 2, i16, xy, 0, r104l_r104h, 162 484: c580403d0a813000 uniform_store 2, i16, xy, 0, r48l_r48h, 164 48c: c540603d0a833000 uniform_store 2, i16, xy, 0, r104l_r104h, 166 494: c570803d0a813000 uniform_store 2, i16, xy, 0, r46l_r46h, 168 49c: c540a03d0a833000 uniform_store 2, i16, xy, 0, r104l_r104h, 170 4a4: c560c03d0a813000 uniform_store 2, i16, xy, 0, r44l_r44h, 172 4ac: c540e03d0a833000 uniform_store 2, i16, xy, 0, r104l_r104h, 174 4b4: c550003d0b813000 uniform_store 2, i16, xy, 0, r42l_r42h, 176 4bc: c540203d0b833000 uniform_store 2, i16, xy, 0, r104l_r104h, 178 4c4: c540403d0b813000 uniform_store 2, i16, xy, 0, r40l_r40h, 180 4cc: c540603d0b833000 uniform_store 2, i16, xy, 0, r104l_r104h, 182 4d4: c530803d0b813000 uniform_store 2, i16, xy, 0, r38l_r38h, 184 4dc: c540a03d0b833000 uniform_store 2, i16, xy, 0, r104l_r104h, 186 4e4: c520c03d0b813000 uniform_store 2, i16, xy, 0, r36l_r36h, 188 4ec: c540e03d0b833000 uniform_store 2, i16, xy, 0, r104l_r104h, 190 4f4: c510003d0c813000 uniform_store 2, i16, xy, 0, r34l_r34h, 192 4fc: c540203d0c833000 uniform_store 2, i16, xy, 0, r104l_r104h, 194 504: c500403d0c813000 uniform_store 2, i16, xy, 0, r32l_r32h, 196 50c: c540603d0c833000 uniform_store 2, i16, xy, 0, r104l_r104h, 198 514: 3800 wait 0 516: c5f0803d0c803000 uniform_store 2, i16, xy, 0, r30l_r30h, 200 51e: c540a03d0c833000 uniform_store 2, i16, xy, 0, r104l_r104h, 202 526: c5e0c03d0c803000 uniform_store 2, i16, xy, 0, r28l_r28h, 204 52e: c540e03d0c833000 uniform_store 2, i16, xy, 0, r104l_r104h, 206 536: c5d0003d0d803000 uniform_store 2, i16, xy, 0, r26l_r26h, 208 53e: c540203d0d833000 uniform_store 2, i16, xy, 0, r104l_r104h, 210 546: c5c0403d0d803000 uniform_store 2, i16, xy, 0, r24l_r24h, 212 54e: c540603d0d833000 uniform_store 2, i16, xy, 0, r104l_r104h, 214 556: c5b0803d0d803000 uniform_store 2, i16, xy, 0, r22l_r22h, 216 55e: c540a03d0d833000 uniform_store 2, i16, xy, 0, r104l_r104h, 218 566: c5a0c03d0d803000 uniform_store 2, i16, xy, 0, r20l_r20h, 220 56e: c540e03d0d833000 uniform_store 2, i16, xy, 0, r104l_r104h, 222 576: c590003d0e803000 uniform_store 2, i16, xy, 0, r18l_r18h, 224 57e: c540203d0e833000 uniform_store 2, i16, xy, 0, r104l_r104h, 226 586: c580403d0e803000 uniform_store 2, i16, xy, 0, r16l_r16h, 228 58e: c540603d0e833000 uniform_store 2, i16, xy, 0, r104l_r104h, 230 596: 3801 wait 1 598: c570803d0e803000 uniform_store 2, i16, xy, 0, r14l_r14h, 232 5a0: c540a03d0e833000 uniform_store 2, i16, xy, 0, r104l_r104h, 234 5a8: c560c03d0e803000 uniform_store 2, i16, xy, 0, r12l_r12h, 236 5b0: c540e03d0e833000 uniform_store 2, i16, xy, 0, r104l_r104h, 238 5b8: c550003d0f803000 uniform_store 2, i16, xy, 0, r10l_r10h, 240 5c0: c540203d0f833000 uniform_store 2, i16, xy, 0, r104l_r104h, 242 5c8: c540403d0f803000 uniform_store 2, i16, xy, 0, r8l_r8h, 244 5d0: c540603d0f833000 uniform_store 2, i16, xy, 0, r104l_r104h, 246 5d8: c530803d0f803000 uniform_store 2, i16, xy, 0, r6l_r6h, 248 5e0: c540a03d0f833000 uniform_store 2, i16, xy, 0, r104l_r104h, 250 5e8: c520c03d0f803000 uniform_store 2, i16, xy, 0, r4l_r4h, 252 5f0: c540e03d0f833000 uniform_store 2, i16, xy, 0, r104l_r104h, 254 5f8: c510003d00803001 uniform_store 2, i16, xy, 0, r2l_r2h, 256 600: c540203d00833001 uniform_store 2, i16, xy, 0, r104l_r104h, 258 608: c500403d00803001 uniform_store 2, i16, xy, 0, r0l_r0h, 260 610: c540603d00833001 uniform_store 2, i16, xy, 0, r104l_r104h, 262 618: 8800 stop compute shader: 0: f20a3100 get_sr r2h.cache, sr49 (thread_position_in_threadgroup.y) 4: f2083000 get_sr r2l.cache, sr48 (thread_position_in_threadgroup.x) 8: 7e0445088000 mov r1l, r2h e: 7e0644088000 mov r1h, r2l 14: 621500000000 mov_imm r5, 0 1a: 62190000803f mov_imm r6, 1065353216 20: 621d00000040 mov_imm r7, 1073741824 26: 622100004040 mov_imm r8, 1077936128 2c: 09150402fc048000 st_tile r5_r6_r7_r8, i32, 1, xyzw, 64, 1, 4 34: 6800 threadgroup_barrier 36: e2000000 mov_imm r0l.cache, 0 3a: 5288c2020000 if_icmp r0l, seq, r1.discard, 0, 1 40: 20c016040000 jmp_exec_none 0x456 46: 62060000 mov_imm r1h, 0 4a: 620a0000 mov_imm r2h, 0 4e: 72041004 get_sr r1l, sr80 (thread_position_in_grid.x) 52: 72081104 get_sr r2l, sr81 (thread_position_in_grid.y) 56: 7e0dc4098000 mov r3, u130 5c: 7e15c0098000 mov r5, u128 62: 7e19bc09800c mov r6, u126 68: 7e1db809800c mov r7, u124 6e: 7e21b409800c mov r8, u122 74: 7e25b009800c mov r9, u120 7a: 7e29ac09800c mov r10, u118 80: 7e2da809800c mov r11, u116 86: 7e31a409800c mov r12, u114 8c: 7e35a009800c mov r13, u112 92: 7e399c09800c mov r14, u110 98: 7e3d9809800c mov r15, u108 9e: 7e419409800c mov r16, u106 a4: 7e459009800c mov r17, u104 aa: 7e498c09800c mov r18, u102 b0: 7e4d8809800c mov r19, u100 b6: 7e518409800c mov r20, u98 bc: 7e558009800c mov r21, u96 c2: 7e59bc098008 mov r22, u94 c8: 7e5db8098008 mov r23, u92 ce: 7e61b4098008 mov r24, u90 d4: 7e65b0098008 mov r25, u88 da: 7e69ac098008 mov r26, u86 e0: 7e6da8098008 mov r27, u84 e6: 7e71a4098008 mov r28, u82 ec: 7e75a0098008 mov r29, u80 f2: 7e799c098008 mov r30, u78 f8: 7e7d98098008 mov r31, u76 fe: 7e0194098018 mov r32, u74 104: 7e0590098018 mov r33, u72 10a: 7e098c098018 mov r34, u70 110: 7e0d88098018 mov r35, u68 116: 7e1184098018 mov r36, u66 11c: 7e1580098018 mov r37, u64 122: 7e19bc098014 mov r38, u62 128: 7e1db8098014 mov r39, u60 12e: 7e21b4098014 mov r40, u58 134: 7e25b0098014 mov r41, u56 13a: 7e29ac098014 mov r42, u54 140: 7e2da8098014 mov r43, u52 146: 7e31a4098014 mov r44, u50 14c: 7e35a0098014 mov r45, u48 152: 7e399c098014 mov r46, u46 158: 7e3d98098014 mov r47, u44 15e: 7e4194098014 mov r48, u42 164: 7e4590098014 mov r49, u40 16a: 7e498c098014 mov r50, u38 170: 7e4d88098014 mov r51, u36 176: 7e5184098014 mov r52, u34 17c: 7e5580098014 mov r53, u32 182: 7e59bc098010 mov r54, u30 188: 7e5db8098010 mov r55, u28 18e: 7e61b4098010 mov r56, u26 194: 7e65b0098010 mov r57, u24 19a: 7e69ac098010 mov r58, u22 1a0: 7e6da8098010 mov r59, u20 1a6: 7e71a4098010 mov r60, u18 1ac: 7e75a0098010 mov r61, u16 1b2: 7e799c098010 mov r62, u14 1b8: 7e7d98098010 mov r63, u12 1be: 7e0194098020 mov r64, u10 1c4: 7e0590098020 mov r65, u8 1ca: 7e098c098020 mov r66, u6 1d0: 7e0d88098020 mov r67, u4 1d6: b1808280c64a20800100 image_write_block r0l, 130, 1, r67, 0, 37, i32, 0 1e0: b1808280c44a20800100 image_write_block r0l, 130, 1, r66, 0, 37, i32, 0 1ea: b1808280c24a20800100 image_write_block r0l, 130, 1, r65, 0, 37, i32, 0 1f4: b1808280c04a20800100 image_write_block r0l, 130, 1, r64, 0, 37, i32, 0 1fe: b1808280fe4a20400100 image_write_block r0l, 130, 1, r63, 0, 37, i32, 0 208: b1808280fc4a20400100 image_write_block r0l, 130, 1, r62, 0, 37, i32, 0 212: b1808280fa4a20400100 image_write_block r0l, 130, 1, r61, 0, 37, i32, 0 21c: b1808280f84a20400100 image_write_block r0l, 130, 1, r60, 0, 37, i32, 0 226: b1808280f64a20400100 image_write_block r0l, 130, 1, r59, 0, 37, i32, 0 230: b1808280f44a20400100 image_write_block r0l, 130, 1, r58, 0, 37, i32, 0 23a: b1808280f24a20400100 image_write_block r0l, 130, 1, r57, 0, 37, i32, 0 244: b1808280f04a20400100 image_write_block r0l, 130, 1, r56, 0, 37, i32, 0 24e: b1808280ee4a20400100 image_write_block r0l, 130, 1, r55, 0, 37, i32, 0 258: b1808280ec4a20400100 image_write_block r0l, 130, 1, r54, 0, 37, i32, 0 262: b1808280ea4a20400100 image_write_block r0l, 130, 1, r53, 0, 37, i32, 0 26c: b1808280e84a20400100 image_write_block r0l, 130, 1, r52, 0, 37, i32, 0 276: b1808280e64a20400100 image_write_block r0l, 130, 1, r51, 0, 37, i32, 0 280: b1808280e44a20400100 image_write_block r0l, 130, 1, r50, 0, 37, i32, 0 28a: b1808280e24a20400100 image_write_block r0l, 130, 1, r49, 0, 37, i32, 0 294: b1808280e04a20400100 image_write_block r0l, 130, 1, r48, 0, 37, i32, 0 29e: b1808280de4a20400100 image_write_block r0l, 130, 1, r47, 0, 37, i32, 0 2a8: b1808280dc4a20400100 image_write_block r0l, 130, 1, r46, 0, 37, i32, 0 2b2: b1808280da4a20400100 image_write_block r0l, 130, 1, r45, 0, 37, i32, 0 2bc: b1808280d84a20400100 image_write_block r0l, 130, 1, r44, 0, 37, i32, 0 2c6: b1808280d64a20400100 image_write_block r0l, 130, 1, r43, 0, 37, i32, 0 2d0: b1808280d44a20400100 image_write_block r0l, 130, 1, r42, 0, 37, i32, 0 2da: b1808280d24a20400100 image_write_block r0l, 130, 1, r41, 0, 37, i32, 0 2e4: b1808280d04a20400100 image_write_block r0l, 130, 1, r40, 0, 37, i32, 0 2ee: b1808280ce4a20400100 image_write_block r0l, 130, 1, r39, 0, 37, i32, 0 2f8: b1808280cc4a20400100 image_write_block r0l, 130, 1, r38, 0, 37, i32, 0 302: b1808280ca4a20400100 image_write_block r0l, 130, 1, r37, 0, 37, i32, 0 30c: b1808280c84a20400100 image_write_block r0l, 130, 1, r36, 0, 37, i32, 0 316: b1808280c64a20400100 image_write_block r0l, 130, 1, r35, 0, 37, i32, 0 320: b1808280c44a20400100 image_write_block r0l, 130, 1, r34, 0, 37, i32, 0 32a: b1808280c24a20400100 image_write_block r0l, 130, 1, r33, 0, 37, i32, 0 334: b1808280c04a20400100 image_write_block r0l, 130, 1, r32, 0, 37, i32, 0 33e: b1808280fe4a20000100 image_write_block r0l, 130, 1, r31, 0, 37, i32, 0 348: b1808280fc4a20000100 image_write_block r0l, 130, 1, r30, 0, 37, i32, 0 352: b1808280fa4a20000100 image_write_block r0l, 130, 1, r29, 0, 37, i32, 0 35c: b1808280f84a20000100 image_write_block r0l, 130, 1, r28, 0, 37, i32, 0 366: b1808280f64a20000100 image_write_block r0l, 130, 1, r27, 0, 37, i32, 0 370: b1808280f44a20000100 image_write_block r0l, 130, 1, r26, 0, 37, i32, 0 37a: b1808280f24a20000100 image_write_block r0l, 130, 1, r25, 0, 37, i32, 0 384: b1808280f04a20000100 image_write_block r0l, 130, 1, r24, 0, 37, i32, 0 38e: b1808280ee4a20000100 image_write_block r0l, 130, 1, r23, 0, 37, i32, 0 398: b1808280ec4a20000100 image_write_block r0l, 130, 1, r22, 0, 37, i32, 0 3a2: b1808280ea4a20000100 image_write_block r0l, 130, 1, r21, 0, 37, i32, 0 3ac: b1808280e84a20000100 image_write_block r0l, 130, 1, r20, 0, 37, i32, 0 3b6: b1808280e64a20000100 image_write_block r0l, 130, 1, r19, 0, 37, i32, 0 3c0: b1808280e44a20000100 image_write_block r0l, 130, 1, r18, 0, 37, i32, 0 3ca: b1808280e24a20000100 image_write_block r0l, 130, 1, r17, 0, 37, i32, 0 3d4: b1808280e04a20000100 image_write_block r0l, 130, 1, r16, 0, 37, i32, 0 3de: b1808280de4a20000100 image_write_block r0l, 130, 1, r15, 0, 37, i32, 0 3e8: b1808280dc4a20000100 image_write_block r0l, 130, 1, r14, 0, 37, i32, 0 3f2: b1808280da4a20000100 image_write_block r0l, 130, 1, r13, 0, 37, i32, 0 3fc: b1808280d84a20000100 image_write_block r0l, 130, 1, r12, 0, 37, i32, 0 406: b1808280d64a20000100 image_write_block r0l, 130, 1, r11, 0, 37, i32, 0 410: b1808280d44a20000100 image_write_block r0l, 130, 1, r10, 0, 37, i32, 0 41a: b1808280d24a20000100 image_write_block r0l, 130, 1, r9, 0, 37, i32, 0 424: b1808280d04a20000100 image_write_block r0l, 130, 1, r8, 0, 37, i32, 0 42e: b1808280ce4a20000100 image_write_block r0l, 130, 1, r7, 0, 37, i32, 0 438: b1808280cc4a20000100 image_write_block r0l, 130, 1, r6, 0, 37, i32, 0 442: b1808280ca4a20000100 image_write_block r0l, 130, 1, r5, 0, 37, i32, 0 44c: b1808280c64a20000100 image_write_block r0l, 130, 1, r3, 0, 37, i32, 0 456: 520e00000000 pop_exec r0l, 1 45c: 8800 stop ```

Looks like there's a CoordsDesc and TEX_TYPES in there too 0001-imageblock-write-coordsdesc-and-tex_type.patch (note: I haven't confirmed the flags bit on that)

Writing imageblocks to texturecube_arrays ```metal using namespace metal; struct Test { float4 yay; }; kernel void test(metal::imageblock f, texturecube_array tex, ushort2 lid [[thread_position_in_threadgroup]], ushort2 gid [[thread_position_in_grid]]) { threadgroup_imageblock Test* t = f.data(lid); t->yay = float4(0, 1, 2, 3); threadgroup_barrier(mem_flags::mem_threadgroup_imageblock); if (all(lid == 0)) { imageblock_slice slice = f.slice(t->yay); for (int i = 0; i < 64; i++) { tex.write(slice, gid + ushort2(0, i), 47, 48); } } } ``` ``` compute shader: 0: f20a3100 get_sr r2h.cache, sr49 (thread_position_in_threadgroup.y) 4: f2083000 get_sr r2l.cache, sr48 (thread_position_in_threadgroup.x) 8: 7e0445088000 mov r1l, r2h e: 7e0644088000 mov r1h, r2l 14: 621500000000 mov_imm r5, 0 1a: 62190000803f mov_imm r6, 1065353216 20: 621d00000040 mov_imm r7, 1073741824 26: 622100004040 mov_imm r8, 1077936128 2c: 09150402fc048000 st_tile r5_r6_r7_r8, i32, 1, xyzw, 64, 1, 4 34: 6800 threadgroup_barrier 36: e2000000 mov_imm r0l.cache, 0 3a: 5288c2020000 if_icmp r0l, seq, r1.discard, 0, 1 40: 20c0b6080000 jmp_exec_none 0x8F6 46: e2060000 mov_imm r1h.cache, 0 4a: 620a0000 mov_imm r2h, 0 4e: 620e0000 mov_imm r3h, 0 52: f2081104 get_sr r2l.cache, sr81 (thread_position_in_grid.y) 56: f2041004 get_sr r1l.cache, sr80 (thread_position_in_grid.x) 5a: 0e4c014008000000 iadd r19l, 1, r2l.cache 62: 0e38024008000000 iadd r14l, 2, r2l.cache 6a: 0e28034008000000 iadd r10l, 3, r2l.cache 72: 0e18044004000000 iadd r6l, 4, r2l 7a: 7e4882088000 mov r18l, r1l.cache 80: 7e3482088000 mov r13l, r1l.cache 86: 7e2482088000 mov r9l, r1l.cache 8c: 620c3000 mov_imm r3l, 48 90: 7e1442088000 mov r5l, r1l 96: 62503000 mov_imm r20l, 48 9a: 623c3000 mov_imm r15l, 48 9e: 622c3000 mov_imm r11l, 48 a2: 621c3000 mov_imm r7l, 48 a6: 7e4a83088000 mov r18h, r1h.cache ac: 7e4e83088000 mov r19h, r1h.cache b2: 7e5283088000 mov r20h, r1h.cache b8: 7e3683088000 mov r13h, r1h.cache be: 7e3a83088000 mov r14h, r1h.cache c4: 7e3e83088000 mov r15h, r1h.cache ca: 7e2683088000 mov r9h, r1h.cache d0: 7e2a83088000 mov r10h, r1h.cache d6: 7e2e83088000 mov r11h, r1h.cache dc: 7e1683088000 mov r5h, r1h.cache e2: 7e1a83088000 mov r6h, r1h.cache e8: 62102f00 mov_imm r4l, 47 ec: 7e1e43088000 mov r7h, r1h f2: 62542f00 mov_imm r21l, 47 f6: 62402f00 mov_imm r16l, 47 fa: 62302f00 mov_imm r12l, 47 fe: 62202f00 mov_imm r8l, 47 102: b1808280004f20000100 image_write_block r0l, r1l_r1h_r2l_r2h_r3l_r3h_r4l, 1, 1, ts0, tex_cube_array, 9, i32, 0 10c: b180a480004f20000100 image_write_block r0l, r18l_r18h_r19l_r19h_r20l_r20h_r21l, 1, 1, ts0, tex_cube_array, 9, i32, 0 116: b1809a80004f20000100 image_write_block r0l, r13l_r13h_r14l_r14h_r15l_r15h_r16l, 1, 1, ts0, tex_cube_array, 9, i32, 0 120: b1809280004f20000100 image_write_block r0l, r9l_r9h_r10l_r10h_r11l_r11h_r12l, 1, 1, ts0, tex_cube_array, 9, i32, 0 12a: b1808a80004f20000100 image_write_block r0l, r5l_r5h_r6l_r6h_r7l_r7h_r8l, 1, 1, ts0, tex_cube_array, 9, i32, 0 134: 0e58054008300000 iadd r118l, 5, r2l.cache 13c: 0e48064008300000 iadd r114l, 6, r2l.cache 144: 0e38074008300000 iadd r110l, 7, r2l.cache 14c: 0e28084008300000 iadd r106l, 8, r2l.cache 154: 0e18094008300000 iadd r102l, 9, r2l.cache 15c: 0e080a4008300000 iadd r98l, 10, r2l.cache 164: 0e780b4008200000 iadd r94l, 11, r2l.cache 16c: 0e680c4008200000 iadd r90l, 12, r2l.cache 174: 0e580d4008200000 iadd r86l, 13, r2l.cache 17c: 0e480e4008200000 iadd r82l, 14, r2l.cache 184: 0e380f4008200000 iadd r78l, 15, r2l.cache 18c: 0e28104008200000 iadd r74l, 16, r2l.cache 194: 0e18114008200000 iadd r70l, 17, r2l.cache 19c: 0e08124008200000 iadd r66l, 18, r2l.cache 1a4: 0e78134008100000 iadd r62l, 19, r2l.cache 1ac: 0e68144008100000 iadd r58l, 20, r2l.cache 1b4: 0e58154008100000 iadd r54l, 21, r2l.cache 1bc: 0e48164008100000 iadd r50l, 22, r2l.cache 1c4: 0e38174008100000 iadd r46l, 23, r2l.cache 1cc: 0e28184008100000 iadd r42l, 24, r2l.cache 1d4: 0e18194008100000 iadd r38l, 25, r2l.cache 1dc: 0e081a4008100000 iadd r34l, 26, r2l.cache 1e4: 0e781b4008000000 iadd r30l, 27, r2l.cache 1ec: 0e681c4008000000 iadd r26l, 28, r2l.cache 1f4: 0e581d4008000000 iadd r22l, 29, r2l.cache 1fc: 0e481e4008000000 iadd r18l, 30, r2l.cache 204: 0e381f4008000000 iadd r14l, 31, r2l.cache 20c: 0e28204008000000 iadd r10l, 32, r2l.cache 214: 0e18214004000000 iadd r6l, 33, r2l 21c: 7e5482088030 mov r117l, r1l.cache 222: 7e4482088030 mov r113l, r1l.cache 228: 7e3482088030 mov r109l, r1l.cache 22e: 7e2482088030 mov r105l, r1l.cache 234: 7e1482088030 mov r101l, r1l.cache 23a: 7e0482088030 mov r97l, r1l.cache 240: 7e7482088020 mov r93l, r1l.cache 246: 7e6482088020 mov r89l, r1l.cache 24c: 7e5482088020 mov r85l, r1l.cache 252: 7e4482088020 mov r81l, r1l.cache 258: 7e3482088020 mov r77l, r1l.cache 25e: 7e2482088020 mov r73l, r1l.cache 264: 7e1482088020 mov r69l, r1l.cache 26a: 7e0482088020 mov r65l, r1l.cache 270: 7e7482088010 mov r61l, r1l.cache 276: 7e6482088010 mov r57l, r1l.cache 27c: 7e5482088010 mov r53l, r1l.cache 282: 7e4482088010 mov r49l, r1l.cache 288: 7e3482088010 mov r45l, r1l.cache 28e: 7e2482088010 mov r41l, r1l.cache 294: 7e1482088010 mov r37l, r1l.cache 29a: 7e0482088010 mov r33l, r1l.cache 2a0: 7e7482088000 mov r29l, r1l.cache 2a6: 7e6482088000 mov r25l, r1l.cache 2ac: 7e5482088000 mov r21l, r1l.cache 2b2: 7e4442088000 mov r17l, r1l 2b8: 7e5c86088030 mov r119l, r3l.cache 2be: 7e4c86088030 mov r115l, r3l.cache 2c4: 7e3c86088030 mov r111l, r3l.cache 2ca: 7e2c86088030 mov r107l, r3l.cache 2d0: 7e1c86088030 mov r103l, r3l.cache 2d6: 7e0c86088030 mov r99l, r3l.cache 2dc: 7e7c86088020 mov r95l, r3l.cache 2e2: 7e6c86088020 mov r91l, r3l.cache 2e8: 7e5c86088020 mov r87l, r3l.cache 2ee: 7e4c86088020 mov r83l, r3l.cache 2f4: 7e3c86088020 mov r79l, r3l.cache 2fa: 7e2c86088020 mov r75l, r3l.cache 300: 7e1c86088020 mov r71l, r3l.cache 306: 7e0c86088020 mov r67l, r3l.cache 30c: 7e7c86088010 mov r63l, r3l.cache 312: 7e6c86088010 mov r59l, r3l.cache 318: 7e5c86088010 mov r55l, r3l.cache 31e: 7e4c86088010 mov r51l, r3l.cache 324: 7e3c86088010 mov r47l, r3l.cache 32a: 7e2c86088010 mov r43l, r3l.cache 330: 7e1c86088010 mov r39l, r3l.cache 336: 7e0c86088010 mov r35l, r3l.cache 33c: 7e7c86088000 mov r31l, r3l.cache 342: 7e6c86088000 mov r27l, r3l.cache 348: 7e5c86088000 mov r23l, r3l.cache 34e: 7e4c46088000 mov r19l, r3l 354: 7e5683088030 mov r117h, r1h.cache 35a: 7e5a83088030 mov r118h, r1h.cache 360: 7e5e83088030 mov r119h, r1h.cache 366: 7e4683088030 mov r113h, r1h.cache 36c: 7e4a83088030 mov r114h, r1h.cache 372: 7e4e83088030 mov r115h, r1h.cache 378: 7e3683088030 mov r109h, r1h.cache 37e: 7e3a83088030 mov r110h, r1h.cache 384: 7e3e83088030 mov r111h, r1h.cache 38a: 7e2683088030 mov r105h, r1h.cache 390: 7e2a83088030 mov r106h, r1h.cache 396: 7e2e83088030 mov r107h, r1h.cache 39c: 7e1683088030 mov r101h, r1h.cache 3a2: 7e1a83088030 mov r102h, r1h.cache 3a8: 7e1e83088030 mov r103h, r1h.cache 3ae: 7e0683088030 mov r97h, r1h.cache 3b4: 7e0a83088030 mov r98h, r1h.cache 3ba: 7e0e83088030 mov r99h, r1h.cache 3c0: 7e7683088020 mov r93h, r1h.cache 3c6: 7e7a83088020 mov r94h, r1h.cache 3cc: 7e7e83088020 mov r95h, r1h.cache 3d2: 7e6683088020 mov r89h, r1h.cache 3d8: 7e6a83088020 mov r90h, r1h.cache 3de: 7e6e83088020 mov r91h, r1h.cache 3e4: 7e5683088020 mov r85h, r1h.cache 3ea: 7e5a83088020 mov r86h, r1h.cache 3f0: 7e5e83088020 mov r87h, r1h.cache 3f6: 7e4683088020 mov r81h, r1h.cache 3fc: 7e4a83088020 mov r82h, r1h.cache 402: 7e4e83088020 mov r83h, r1h.cache 408: 7e3683088020 mov r77h, r1h.cache 40e: 7e3a83088020 mov r78h, r1h.cache 414: 7e3e83088020 mov r79h, r1h.cache 41a: 7e2683088020 mov r73h, r1h.cache 420: 7e2a83088020 mov r74h, r1h.cache 426: 7e2e83088020 mov r75h, r1h.cache 42c: 7e1683088020 mov r69h, r1h.cache 432: 7e1a83088020 mov r70h, r1h.cache 438: 7e1e83088020 mov r71h, r1h.cache 43e: 7e0683088020 mov r65h, r1h.cache 444: 7e0a83088020 mov r66h, r1h.cache 44a: 7e0e83088020 mov r67h, r1h.cache 450: 7e7683088010 mov r61h, r1h.cache 456: 7e7a83088010 mov r62h, r1h.cache 45c: 7e7e83088010 mov r63h, r1h.cache 462: 7e6683088010 mov r57h, r1h.cache 468: 7e6a83088010 mov r58h, r1h.cache 46e: 7e6e83088010 mov r59h, r1h.cache 474: 7e5683088010 mov r53h, r1h.cache 47a: 7e5a83088010 mov r54h, r1h.cache 480: 7e5e83088010 mov r55h, r1h.cache 486: 7e4683088010 mov r49h, r1h.cache 48c: 7e4a83088010 mov r50h, r1h.cache 492: 7e4e83088010 mov r51h, r1h.cache 498: 7e3683088010 mov r45h, r1h.cache 49e: 7e3a83088010 mov r46h, r1h.cache 4a4: 7e3e83088010 mov r47h, r1h.cache 4aa: 7e2683088010 mov r41h, r1h.cache 4b0: 7e2a83088010 mov r42h, r1h.cache 4b6: 7e2e83088010 mov r43h, r1h.cache 4bc: 7e1683088010 mov r37h, r1h.cache 4c2: 7e1a83088010 mov r38h, r1h.cache 4c8: 7e1e83088010 mov r39h, r1h.cache 4ce: 7e0683088010 mov r33h, r1h.cache 4d4: 7e0a83088010 mov r34h, r1h.cache 4da: 7e0e83088010 mov r35h, r1h.cache 4e0: 7e7683088000 mov r29h, r1h.cache 4e6: 7e7a83088000 mov r30h, r1h.cache 4ec: 7e7e83088000 mov r31h, r1h.cache 4f2: 7e6683088000 mov r25h, r1h.cache 4f8: 7e6a83088000 mov r26h, r1h.cache 4fe: 7e6e83088000 mov r27h, r1h.cache 504: 7e5683088000 mov r21h, r1h.cache 50a: 7e5a83088000 mov r22h, r1h.cache 510: 7e5e83088000 mov r23h, r1h.cache 516: 7e4643088000 mov r17h, r1h 51c: 7e6088088030 mov r120l, r4l.cache 522: 7e5088088030 mov r116l, r4l.cache 528: 7e4088088030 mov r112l, r4l.cache 52e: 7e3088088030 mov r108l, r4l.cache 534: 7e2088088030 mov r104l, r4l.cache 53a: 7e1088088030 mov r100l, r4l.cache 540: 7e0088088030 mov r96l, r4l.cache 546: 7e7088088020 mov r92l, r4l.cache 54c: 7e6088088020 mov r88l, r4l.cache 552: 7e5088088020 mov r84l, r4l.cache 558: 7e4088088020 mov r80l, r4l.cache 55e: 7e3088088020 mov r76l, r4l.cache 564: 7e2088088020 mov r72l, r4l.cache 56a: 7e1088088020 mov r68l, r4l.cache 570: 7e0088088020 mov r64l, r4l.cache 576: 7e7088088010 mov r60l, r4l.cache 57c: 7e6088088010 mov r56l, r4l.cache 582: 7e5088088010 mov r52l, r4l.cache 588: 7e4088088010 mov r48l, r4l.cache 58e: 7e3088088010 mov r44l, r4l.cache 594: 7e2088088010 mov r40l, r4l.cache 59a: 7e1088088010 mov r36l, r4l.cache 5a0: 7e0088088010 mov r32l, r4l.cache 5a6: 7e7088088000 mov r28l, r4l.cache 5ac: 7e6088088000 mov r24l, r4l.cache 5b2: 7e5048088000 mov r20l, r4l 5b8: b180aa80004f200c0100 image_write_block r0l, r117l_r117h_r118l_r118h_r119l_r119h_r120l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5c2: b180a280004f200c0100 image_write_block r0l, r113l_r113h_r114l_r114h_r115l_r115h_r116l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5cc: b1809a80004f200c0100 image_write_block r0l, r109l_r109h_r110l_r110h_r111l_r111h_r112l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5d6: b1809280004f200c0100 image_write_block r0l, r105l_r105h_r106l_r106h_r107l_r107h_r108l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5e0: b1808a80004f200c0100 image_write_block r0l, r101l_r101h_r102l_r102h_r103l_r103h_r104l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5ea: b1808280004f200c0100 image_write_block r0l, r97l_r97h_r98l_r98h_r99l_r99h_r100l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5f4: b180ba80004f20080100 image_write_block r0l, r93l_r93h_r94l_r94h_r95l_r95h_r96l, 1, 1, ts0, tex_cube_array, 9, i32, 0 5fe: b180b280004f20080100 image_write_block r0l, r89l_r89h_r90l_r90h_r91l_r91h_r92l, 1, 1, ts0, tex_cube_array, 9, i32, 0 608: b180aa80004f20080100 image_write_block r0l, r85l_r85h_r86l_r86h_r87l_r87h_r88l, 1, 1, ts0, tex_cube_array, 9, i32, 0 612: b180a280004f20080100 image_write_block r0l, r81l_r81h_r82l_r82h_r83l_r83h_r84l, 1, 1, ts0, tex_cube_array, 9, i32, 0 61c: b1809a80004f20080100 image_write_block r0l, r77l_r77h_r78l_r78h_r79l_r79h_r80l, 1, 1, ts0, tex_cube_array, 9, i32, 0 626: b1809280004f20080100 image_write_block r0l, r73l_r73h_r74l_r74h_r75l_r75h_r76l, 1, 1, ts0, tex_cube_array, 9, i32, 0 630: b1808a80004f20080100 image_write_block r0l, r69l_r69h_r70l_r70h_r71l_r71h_r72l, 1, 1, ts0, tex_cube_array, 9, i32, 0 63a: b1808280004f20080100 image_write_block r0l, r65l_r65h_r66l_r66h_r67l_r67h_r68l, 1, 1, ts0, tex_cube_array, 9, i32, 0 644: b180ba80004f20040100 image_write_block r0l, r61l_r61h_r62l_r62h_r63l_r63h_r64l, 1, 1, ts0, tex_cube_array, 9, i32, 0 64e: b180b280004f20040100 image_write_block r0l, r57l_r57h_r58l_r58h_r59l_r59h_r60l, 1, 1, ts0, tex_cube_array, 9, i32, 0 658: b180aa80004f20040100 image_write_block r0l, r53l_r53h_r54l_r54h_r55l_r55h_r56l, 1, 1, ts0, tex_cube_array, 9, i32, 0 662: b180a280004f20040100 image_write_block r0l, r49l_r49h_r50l_r50h_r51l_r51h_r52l, 1, 1, ts0, tex_cube_array, 9, i32, 0 66c: b1809a80004f20040100 image_write_block r0l, r45l_r45h_r46l_r46h_r47l_r47h_r48l, 1, 1, ts0, tex_cube_array, 9, i32, 0 676: b1809280004f20040100 image_write_block r0l, r41l_r41h_r42l_r42h_r43l_r43h_r44l, 1, 1, ts0, tex_cube_array, 9, i32, 0 680: b1808a80004f20040100 image_write_block r0l, r37l_r37h_r38l_r38h_r39l_r39h_r40l, 1, 1, ts0, tex_cube_array, 9, i32, 0 68a: b1808280004f20040100 image_write_block r0l, r33l_r33h_r34l_r34h_r35l_r35h_r36l, 1, 1, ts0, tex_cube_array, 9, i32, 0 694: b180ba80004f20000100 image_write_block r0l, r29l_r29h_r30l_r30h_r31l_r31h_r32l, 1, 1, ts0, tex_cube_array, 9, i32, 0 69e: b180b280004f20000100 image_write_block r0l, r25l_r25h_r26l_r26h_r27l_r27h_r28l, 1, 1, ts0, tex_cube_array, 9, i32, 0 6a8: b180aa80004f20000100 image_write_block r0l, r21l_r21h_r22l_r22h_r23l_r23h_r24l, 1, 1, ts0, tex_cube_array, 9, i32, 0 6b2: b180a280004f20000100 image_write_block r0l, r17l_r17h_r18l_r18h_r19l_r19h_r20l, 1, 1, ts0, tex_cube_array, 9, i32, 0 6bc: b1809a80004f20000100 image_write_block r0l, r13l_r13h_r14l_r14h_r15l_r15h_r16l, 1, 1, ts0, tex_cube_array, 9, i32, 0 6c6: b1809280004f20000100 image_write_block r0l, r9l_r9h_r10l_r10h_r11l_r11h_r12l, 1, 1, ts0, tex_cube_array, 9, i32, 0 6d0: b1808a80004f20000100 image_write_block r0l, r5l_r5h_r6l_r6h_r7l_r7h_r8l, 1, 1, ts0, tex_cube_array, 9, i32, 0 6da: 0e58224008300000 iadd r118l, 34, r2l.cache 6e2: 0e48234008300000 iadd r114l, 35, r2l.cache 6ea: 0e38244008300000 iadd r110l, 36, r2l.cache 6f2: 0e28254008300000 iadd r106l, 37, r2l.cache 6fa: 0e18264008300000 iadd r102l, 38, r2l.cache 702: 0e08274008300000 iadd r98l, 39, r2l.cache 70a: 0e78284008200000 iadd r94l, 40, r2l.cache 712: 0e68294008200000 iadd r90l, 41, r2l.cache 71a: 0e582a4008200000 iadd r86l, 42, r2l.cache 722: 0e482b4008200000 iadd r82l, 43, r2l.cache 72a: 0e382c4008200000 iadd r78l, 44, r2l.cache 732: 0e282d4008200000 iadd r74l, 45, r2l.cache 73a: 0e182e4008200000 iadd r70l, 46, r2l.cache 742: 0e082f4008200000 iadd r66l, 47, r2l.cache 74a: 0e78304008100000 iadd r62l, 48, r2l.cache 752: 0e68314008100000 iadd r58l, 49, r2l.cache 75a: 0e58324008100000 iadd r54l, 50, r2l.cache 762: 0e48334008100000 iadd r50l, 51, r2l.cache 76a: 0e38344008100000 iadd r46l, 52, r2l.cache 772: 0e28354008100000 iadd r42l, 53, r2l.cache 77a: 0e18364008100000 iadd r38l, 54, r2l.cache 782: 0e08374008100000 iadd r34l, 55, r2l.cache 78a: 0e78384008000000 iadd r30l, 56, r2l.cache 792: 0e68394008000000 iadd r26l, 57, r2l.cache 79a: 0e583a4008000000 iadd r22l, 58, r2l.cache 7a2: 0e483b4008000000 iadd r18l, 59, r2l.cache 7aa: 0e383c4008000000 iadd r14l, 60, r2l.cache 7b2: 0e283d4008000000 iadd r10l, 61, r2l.cache 7ba: 0e183e4008000000 iadd r6l, 62, r2l.cache 7c2: 0e083f400c000000 iadd r2l, 63, r2l.discard 7ca: b180aa80004f200c0100 image_write_block r0l, r117l_r117h_r118l_r118h_r119l_r119h_r120l, 1, 1, ts0, tex_cube_array, 9, i32, 0 7d4: b180a280004f200c0100 image_write_block r0l, r113l_r113h_r114l_r114h_r115l_r115h_r116l, 1, 1, ts0, tex_cube_array, 9, i32, 0 7de: b1809a80004f200c0100 image_write_block r0l, r109l_r109h_r110l_r110h_r111l_r111h_r112l, 1, 1, ts0, tex_cube_array, 9, i32, 0 7e8: b1809280004f200c0100 image_write_block r0l, r105l_r105h_r106l_r106h_r107l_r107h_r108l, 1, 1, ts0, tex_cube_array, 9, i32, 0 7f2: b1808a80004f200c0100 image_write_block r0l, r101l_r101h_r102l_r102h_r103l_r103h_r104l, 1, 1, ts0, tex_cube_array, 9, i32, 0 7fc: b1808280004f200c0100 image_write_block r0l, r97l_r97h_r98l_r98h_r99l_r99h_r100l, 1, 1, ts0, tex_cube_array, 9, i32, 0 806: b180ba80004f20080100 image_write_block r0l, r93l_r93h_r94l_r94h_r95l_r95h_r96l, 1, 1, ts0, tex_cube_array, 9, i32, 0 810: b180b280004f20080100 image_write_block r0l, r89l_r89h_r90l_r90h_r91l_r91h_r92l, 1, 1, ts0, tex_cube_array, 9, i32, 0 81a: b180aa80004f20080100 image_write_block r0l, r85l_r85h_r86l_r86h_r87l_r87h_r88l, 1, 1, ts0, tex_cube_array, 9, i32, 0 824: b180a280004f20080100 image_write_block r0l, r81l_r81h_r82l_r82h_r83l_r83h_r84l, 1, 1, ts0, tex_cube_array, 9, i32, 0 82e: b1809a80004f20080100 image_write_block r0l, r77l_r77h_r78l_r78h_r79l_r79h_r80l, 1, 1, ts0, tex_cube_array, 9, i32, 0 838: b1809280004f20080100 image_write_block r0l, r73l_r73h_r74l_r74h_r75l_r75h_r76l, 1, 1, ts0, tex_cube_array, 9, i32, 0 842: b1808a80004f20080100 image_write_block r0l, r69l_r69h_r70l_r70h_r71l_r71h_r72l, 1, 1, ts0, tex_cube_array, 9, i32, 0 84c: b1808280004f20080100 image_write_block r0l, r65l_r65h_r66l_r66h_r67l_r67h_r68l, 1, 1, ts0, tex_cube_array, 9, i32, 0 856: b180ba80004f20040100 image_write_block r0l, r61l_r61h_r62l_r62h_r63l_r63h_r64l, 1, 1, ts0, tex_cube_array, 9, i32, 0 860: b180b280004f20040100 image_write_block r0l, r57l_r57h_r58l_r58h_r59l_r59h_r60l, 1, 1, ts0, tex_cube_array, 9, i32, 0 86a: b180aa80004f20040100 image_write_block r0l, r53l_r53h_r54l_r54h_r55l_r55h_r56l, 1, 1, ts0, tex_cube_array, 9, i32, 0 874: b180a280004f20040100 image_write_block r0l, r49l_r49h_r50l_r50h_r51l_r51h_r52l, 1, 1, ts0, tex_cube_array, 9, i32, 0 87e: b1809a80004f20040100 image_write_block r0l, r45l_r45h_r46l_r46h_r47l_r47h_r48l, 1, 1, ts0, tex_cube_array, 9, i32, 0 888: b1809280004f20040100 image_write_block r0l, r41l_r41h_r42l_r42h_r43l_r43h_r44l, 1, 1, ts0, tex_cube_array, 9, i32, 0 892: b1808a80004f20040100 image_write_block r0l, r37l_r37h_r38l_r38h_r39l_r39h_r40l, 1, 1, ts0, tex_cube_array, 9, i32, 0 89c: b1808280004f20040100 image_write_block r0l, r33l_r33h_r34l_r34h_r35l_r35h_r36l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8a6: b180ba80004f20000100 image_write_block r0l, r29l_r29h_r30l_r30h_r31l_r31h_r32l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8b0: b180b280004f20000100 image_write_block r0l, r25l_r25h_r26l_r26h_r27l_r27h_r28l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8ba: b180aa80004f20000100 image_write_block r0l, r21l_r21h_r22l_r22h_r23l_r23h_r24l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8c4: b180a280004f20000100 image_write_block r0l, r17l_r17h_r18l_r18h_r19l_r19h_r20l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8ce: b1809a80004f20000100 image_write_block r0l, r13l_r13h_r14l_r14h_r15l_r15h_r16l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8d8: b1809280004f20000100 image_write_block r0l, r9l_r9h_r10l_r10h_r11l_r11h_r12l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8e2: b1808a80004f20000100 image_write_block r0l, r5l_r5h_r6l_r6h_r7l_r7h_r8l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8ec: b1808280004f20000100 image_write_block r0l, r1l_r1h_r2l_r2h_r3l_r3h_r4l, 1, 1, ts0, tex_cube_array, 9, i32, 0 8f6: 520e00000000 pop_exec r0l, 1 8fc: 8800 stop ```

Want to make sure these match up with what you're seeing in end-of-tile programs?

TellowKrinkle commented 1 year ago

Metal's headers pass __METAL_TEXTURE_WRITE_ROUNDING_MODE__ to their internal imageblock write functions, and indeed -ftexture-write-rounding-mode=rte flips bit 53. 0 => rte, 1 => rtz. Hijacking their builtin to pass values other than __METAL_TEXTURE_WRITE_ROUNDING_MODE__ results in rtz for all non-1 inputs (yes, the parameter passed to the builtin is the opposite of the one in the instruction).

They also have a lod input (not sure what it's for on a non-ms texture2d Edit I'm dumb, ms != mipmap), which seems to correspond a register defined by bits 24:29, 60:61 with bit 31 flipping lod between a 16-bit register (off) and an immediate (on). Weirdly immediates 256-511 overflow into bit 30, but that might just be them not expecting such large values, as 512 overflows to an instruction identical to 0.

Fun with the lod parameter ```metal using namespace metal; struct Test { float4 a; }; kernel void test(metal::imageblock f, texture2d tex, ushort2 lid [[thread_position_in_threadgroup]], ushort2 gid [[thread_position_in_grid]], constant uint* fun) { threadgroup_imageblock Test* t = f.data(lid); t->a = float4(0, 1, 2, 3); threadgroup_barrier(mem_flags::mem_threadgroup_imageblock); if (all(lid == 0)) { for (int i = 0; i < 64; i++) { imageblock_slice slice = f.slice(t->a); tex.write(slice, gid, i * 16); } } } ``` (I didn't wire up bit 30, so enjoy some slightly broken decompilation) ``` compute shader: 0: f20a3100 get_sr r2h.cache, sr49 (thread_position_in_threadgroup.y) 4: f2083000 get_sr r2l.cache, sr48 (thread_position_in_threadgroup.x) 8: 7e0445088000 mov r1l, r2h e: 7e0644088000 mov r1h, r2l 14: 621500000000 mov_imm r5, 0 1a: 62190000803f mov_imm r6, 1065353216 20: 621d00000040 mov_imm r7, 1073741824 26: 622100004040 mov_imm r8, 1077936128 2c: 09150402fc048000 st_tile r5_r6_r7_r8, i32, 1, xyzw, 64, 1, 4 34: 6800 threadgroup_barrier 36: e2000000 mov_imm r0l.cache, 0 3a: 5288c2020000 if_icmp r0l, seq, r1.discard, 0, 1 40: 62060000 mov_imm r1h, 0 44: 620a0000 mov_imm r2h, 0 48: 72041004 get_sr r1l, sr80 (thread_position_in_grid.x) 4c: 72081104 get_sr r2l, sr81 (thread_position_in_grid.y) 50: 20c086020000 jmp_exec_none 0x2D6 56: b1808280004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0 60: b1808290004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0 6a: b18082a0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0 74: b18082b0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0 7e: b1808280004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0 88: b1808290004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0 92: b18082a0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0 9c: b18082b0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0 a6: b1808280004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0 b0: b1808290004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0 ba: b18082a0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0 c4: b18082b0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0 ce: b1808280004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0 d8: b1808290004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0 e2: b18082a0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0 ec: b18082b0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0 f6: b18082c0004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0 100: b18082d0004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0 10a: b18082e0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0 114: b18082f0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0 11e: b18082c0004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0 128: b18082d0004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0 132: b18082e0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0 13c: b18082f0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0 146: b18082c0004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0 150: b18082d0004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0 15a: b18082e0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0 164: b18082f0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0 16e: b18082c0004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0 178: b18082d0004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0 182: b18082e0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0 18c: b18082f0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0 196: b1808280004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0 1a0: b1808290004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0 1aa: b18082a0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0 1b4: b18082b0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0 1be: b1808280004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0 1c8: b1808290004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0 1d2: b18082a0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0 1dc: b18082b0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0 1e6: b1808280004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0 1f0: b1808290004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0 1fa: b18082a0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0 204: b18082b0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0 20e: b1808280004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0 218: b1808290004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0 222: b18082a0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0 22c: b18082b0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0 236: b18082c0004a20000100 image_write_block r0l, r1_r2, r0l, 1, ts0, tex_2d, rtz, 9, i32, 0 240: b18082d0004a20000100 image_write_block r0l, r1_r2, r8l, 1, ts0, tex_2d, rtz, 9, i32, 0 24a: b18082e0004a20000100 image_write_block r0l, r1_r2, r16l, 1, ts0, tex_2d, rtz, 9, i32, 0 254: b18082f0004a20000100 image_write_block r0l, r1_r2, r24l, 1, ts0, tex_2d, rtz, 9, i32, 0 25e: b18082c0004a20100100 image_write_block r0l, r1_r2, r32l, 1, ts0, tex_2d, rtz, 9, i32, 0 268: b18082d0004a20100100 image_write_block r0l, r1_r2, r40l, 1, ts0, tex_2d, rtz, 9, i32, 0 272: b18082e0004a20100100 image_write_block r0l, r1_r2, r48l, 1, ts0, tex_2d, rtz, 9, i32, 0 27c: b18082f0004a20100100 image_write_block r0l, r1_r2, r56l, 1, ts0, tex_2d, rtz, 9, i32, 0 286: b18082c0004a20200100 image_write_block r0l, r1_r2, r64l, 1, ts0, tex_2d, rtz, 9, i32, 0 290: b18082d0004a20200100 image_write_block r0l, r1_r2, r72l, 1, ts0, tex_2d, rtz, 9, i32, 0 29a: b18082e0004a20200100 image_write_block r0l, r1_r2, r80l, 1, ts0, tex_2d, rtz, 9, i32, 0 2a4: b18082f0004a20200100 image_write_block r0l, r1_r2, r88l, 1, ts0, tex_2d, rtz, 9, i32, 0 2ae: b18082c0004a20300100 image_write_block r0l, r1_r2, r96l, 1, ts0, tex_2d, rtz, 9, i32, 0 2b8: b18082d0004a20300100 image_write_block r0l, r1_r2, r104l, 1, ts0, tex_2d, rtz, 9, i32, 0 2c2: b18082e0004a20300100 image_write_block r0l, r1_r2, r112l, 1, ts0, tex_2d, rtz, 9, i32, 0 2cc: b18082f0004a20300100 image_write_block r0l, r1_r2, r120l, 1, ts0, tex_2d, rtz, 9, i32, 0 2d6: 520e00000000 pop_exec r0l, 1 2dc: 8800 stop ```
alyssarosenzweig commented 1 year ago

They also have a lod input (not sure what it's for on a non-ms texture2d Edit I'm dumb, ms != mipmap), which seems to correspond a register defined by bits 24:29, 60:61 with bit 31 flipping lod between a 16-bit register (off) and an immediate (on).

This matches regular image_write https://patch-diff.githubusercontent.com/raw/dougallj/applegpu/pull/26.patch ... they're very closely related instructions and execute on the same hw block so it makes sense.

alyssarosenzweig commented 1 year ago

Metal's headers pass METAL_TEXTURE_WRITE_ROUNDING_MODE to their internal imageblock write functions, and indeed -ftexture-write-rounding-mode=rte flips bit 53. 0 => rte, 1 => rtz. Hijacking their builtin to pass values other than METAL_TEXTURE_WRITE_ROUNDING_MODE results in rtz for all non-1 inputs (yes, the parameter passed to the builtin is the opposite of the one in the instruction).

Also consistent with regular image write

TellowKrinkle commented 1 year ago

BTW you might want to change the class name from UnkB1InstructionDesc to something more known-sounding

alyssarosenzweig commented 1 year ago

EOT programs enabling layered rendering:

<    0: 7e0004098000         mov              r0l, u2l
<    6: b1800080004a00000900 image_write_block r0l, r0_r1, 0, ts0, tex_2d, rte, i32, 0, 9, 1
<   10: 8800                 stop             
---
>    0: 72040200             get_sr           r1l, sr2 (threadgroup_position_in_grid.z)
>    4: 7e0004098000         mov              r0l, u2l
>    a: b1800280004b00000900 image_write_block r0l, r1l_r1h_r2l_r2h_r3l, 0, ts0, tex_2d_array, rte, i32, 0, 9, 1
>   14: 8800                 stop             

we might want to override the coord desc

alyssarosenzweig commented 1 year ago

Also for reference, background programs with layered do 2D texture array reads (duh) with the layer index given by min(get_sr(2) + base_layer, 0xFFFF). The base layer must be pushed as a uniform register.

alyssarosenzweig commented 1 year ago

I've squashed in @TellowKrinkle 's patches and fixed a few more things. this should be ready to merge.

alyssarosenzweig commented 1 year ago

2D MS Array has a weirdo coordinate descriptor:

   0: 72020200             get_sr           r0h, sr2 (threadgroup_position_in_grid.z)
   4: 7e0400098000         mov              r1l, u0l
   a: 62000000             mov_imm          r0l, 0
   e: b1840080004880000a00 image_write_block r1l, r0l_r0h_r1l_r1h_r2l, 0, ts0, tex_2d_ms_array, rte, u8norm, 0, 9, 1
  18: 8800                 stop           

Based on hw experimentation the coordinate descriptor is either absent (non-array) or always a 32-bit (array). For the case of non-multisampled array, it's not clear what to do with the top 16-bits. Getting weird hw behaviour.

alyssarosenzweig commented 1 year ago

Oh this is absolutely bizarre. In the layered but not multisampled case, if the top 16-bits are 0, the test fails, but if they're anything nonzero it passes. Wat?