alyssarosenzweig closed 1 year ago
Texture extension is at bit 62: 0001-imageblock-write-Texture-extension.patch
Looks like there's a CoordsDesc and TEX_TYPES in there too: 0001-imageblock-write-coordsdesc-and-tex_type.patch (note: I haven't confirmed the flags bit on that)
Want to make sure these match up with what you're seeing in end-of-tile programs?
Metal's headers pass __METAL_TEXTURE_WRITE_ROUNDING_MODE__ to their internal imageblock write functions, and indeed -ftexture-write-rounding-mode=rte flips bit 53. 0 => rte, 1 => rtz. Hijacking their builtin to pass values other than __METAL_TEXTURE_WRITE_ROUNDING_MODE__ results in rtz for all non-1 inputs (yes, the parameter passed to the builtin is the opposite of the one in the instruction).
They also have a lod input (not sure what it's for on a non-ms texture2d. Edit: I'm dumb, ms != mipmap), which seems to correspond to a register defined by bits 24:29, 60:61, with bit 31 flipping lod between a 16-bit register (off) and an immediate (on). Weirdly, immediates 256-511 overflow into bit 30, but that might just be them not expecting such large values, as 512 overflows to an instruction identical to 0.
> They also have a lod input (not sure what it's for on a non-ms texture2d. Edit: I'm dumb, ms != mipmap), which seems to correspond to a register defined by bits 24:29, 60:61, with bit 31 flipping lod between a 16-bit register (off) and an immediate (on).
This matches regular image_write https://patch-diff.githubusercontent.com/raw/dougallj/applegpu/pull/26.patch ... they're very closely related instructions and execute on the same hw block so it makes sense.
> Metal's headers pass __METAL_TEXTURE_WRITE_ROUNDING_MODE__ to their internal imageblock write functions, and indeed -ftexture-write-rounding-mode=rte flips bit 53. 0 => rte, 1 => rtz. Hijacking their builtin to pass values other than __METAL_TEXTURE_WRITE_ROUNDING_MODE__ results in rtz for all non-1 inputs (yes, the parameter passed to the builtin is the opposite of the one in the instruction).
Also consistent with regular image write
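For anyone who wants to poke at encodings by hand, here is a minimal Python sketch of the fields discussed so far. It assumes the instruction bytes are read as one little-endian integer, and that the lod value takes its low six bits from bits 24:29 and its high two bits from bits 60:61; the thread gives the bit ranges but not how they combine, so that ordering is a guess. The helper names are made up for illustration and are not from the disassembler.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Extract the inclusive bit range lo..hi from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_block_write_fields(raw: bytes) -> dict:
    """Decode the rounding and lod fields described in this thread.

    Assumptions (not confirmed by the thread): little-endian bit numbering,
    and lod = bits 24:29 as the low part, bits 60:61 as the high part.
    """
    word = int.from_bytes(raw, "little")
    lod_value = bits(word, 24, 29) | (bits(word, 60, 61) << 6)
    return {
        # Bit 53: 0 => rte, 1 => rtz
        "rounding": "rtz" if bits(word, 53, 53) else "rte",
        # Bit 31: off => lod is a 16-bit register, on => lod is an immediate
        "lod_is_immediate": bool(bits(word, 31, 31)),
        "lod_value": lod_value,
    }


# Encoding of the non-layered image_write_block from the EOT program in this thread.
example = bytes.fromhex("b1800080004a00000900")
fields = decode_block_write_fields(example)
print(fields)
```

On that encoding the sketch reports rte and an immediate lod of 0, which lines up with the `0` and `rte` operands in the disassembly.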
BTW you might want to change the class name from UnkB1InstructionDesc to something more known-sounding
EOT programs enabling layered rendering:
< 0: 7e0004098000 mov r0l, u2l
< 6: b1800080004a00000900 image_write_block r0l, r0_r1, 0, ts0, tex_2d, rte, i32, 0, 9, 1
< 10: 8800 stop
---
> 0: 72040200 get_sr r1l, sr2 (threadgroup_position_in_grid.z)
> 4: 7e0004098000 mov r0l, u2l
> a: b1800280004b00000900 image_write_block r0l, r1l_r1h_r2l_r2h_r3l, 0, ts0, tex_2d_array, rte, i32, 0, 9, 1
> 14: 8800 stop
We might want to override the coord desc.
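A quick way to see which encoding bits changed between the two image_write_block instructions in that diff is to XOR them. This is only a sketch: `differing_bits` is a made-up helper, and it assumes little-endian bit numbering.

```python
def differing_bits(a_hex: str, b_hex: str) -> list[int]:
    """Return the bit positions at which two instruction encodings differ."""
    a = int.from_bytes(bytes.fromhex(a_hex), "little")
    b = int.from_bytes(bytes.fromhex(b_hex), "little")
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Non-layered vs layered image_write_block encodings from the diff above.
changed = differing_bits("b1800080004a00000900", "b1800280004b00000900")
print(changed)  # [17, 40]
```

Only two bits differ, alongside the two operand changes visible in the disassembly (the coordinate registers and tex_2d vs tex_2d_array). Which bit belongs to which field is not something this check can tell you on its own.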
Also for reference, background programs with layered do 2D texture array reads (duh) with the layer index given by min(get_sr(2) + base_layer, 0xFFFF). The base layer must be pushed as a uniform register.
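The layer index computation, written out as a small Python sketch. The function and parameter names are illustrative only; `threadgroup_z` stands in for get_sr(2) and `base_layer` for the value pushed as a uniform.

```python
def background_layer_index(threadgroup_z: int, base_layer: int) -> int:
    """Layer used for the 2D array read: min(get_sr(2) + base_layer, 0xFFFF)."""
    return min(threadgroup_z + base_layer, 0xFFFF)


# A layer well inside the range passes through; an overflowing one is clamped.
in_range = background_layer_index(3, 2)
clamped = background_layer_index(0xFFFF, 1)
print(in_range, hex(clamped))
```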
I've squashed in @TellowKrinkle's patches and fixed a few more things. This should be ready to merge.
2D MS Array has a weirdo coordinate descriptor:
0: 72020200 get_sr r0h, sr2 (threadgroup_position_in_grid.z)
4: 7e0400098000 mov r1l, u0l
a: 62000000 mov_imm r0l, 0
e: b1840080004880000a00 image_write_block r1l, r0l_r0h_r1l_r1h_r2l, 0, ts0, tex_2d_ms_array, rte, u8norm, 0, 9, 1
18: 8800 stop
Based on hw experimentation, the coordinate descriptor is either absent (non-array) or always 32-bit (array). For the non-multisampled array case, it's not clear what to do with the top 16 bits. Getting weird hw behaviour.
Oh this is absolutely bizarre. In the layered but not multisampled case, if the top 16 bits are 0, the test fails, but if they're anything nonzero it passes. Wat?
This instruction ("TODO.unkB1") is used to write out an entire block from local memory into an image. Because it is block-based rather than pixel-based, it works even if the destination image is compressed, unlike the regular image write instruction. It is tailor-made for use in the end-of-tile program, to blit tile memory to the framebuffer.