dougallj / applegpu

Apple G13 GPU architecture docs and tools
BSD 3-Clause "New" or "Revised" License

dt field in Operand #1

Closed pure-water closed 3 years ago

pure-water commented 3 years ago

Hi, I was just wondering what the purpose of the Dt field is in the operand's subscript. E.g.:

D = ALUDST(Dx:D,Dt)

My guess is that it is just a hint showing the "endian encoding" of the operand and nothing else?

But when I experimented, it seemed otherwise:

python3 assemble.py 'fmul r1, r2.neg, 0.5'
encoding opcod3: b'\x1a\x85D\n\x02\x00'
1a85440a0200 fmul r1, r2.neg, 0.5

python3 assemble.py 'fmul $r1, r2.neg, 0.5'
encoding opcod3: b'\x9a\x85D\n\x02\x00'
9a85440a0200 fmul $r1, r2.neg, 0.5

What is the difference between $r1 and r1 as the destination register?

dougallj commented 3 years ago

Thanks for opening the issue! I'll keep this open until I clarify this in the docs and disassembler.

The Dt field (usually) contains two bits. The least-significant one, in bit 7, indicates the cache hint when set (described under "Register Cache" in the docs). As you noted, the assembler shows this as $ (it's very common in disassembly, so I avoided a verbose .cache suffix in favour of the cash/cache pun). Since it's already causing confusion, I think I will change the default behaviour of the tools to use a .cache suffix, which the assembler already accepts. Or maybe I should go with .reuse, which is used in this paper about a different GPU (https://arxiv.org/abs/1903.07486)? Lots of naming decisions.

(I don't think it's relevant to your question, but the most-significant bit of Dt, in bit 8, encodes the destination size: the 32-bit-per-lane r1 gets the value 1, while the 16-bit-per-lane r1l would get the value 0.)
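To make the bit positions concrete, here is a minimal sketch that pulls the two Dt bits out of the encodings quoted earlier in this thread. It is not part of the applegpu tools: the function name `decode_dt` is hypothetical, and it assumes instruction bits are numbered from the least-significant bit of the first byte (little-endian), which matches the two sample encodings above.

```python
def decode_dt(instruction: bytes) -> dict:
    """Extract the two Dt bits described above (hypothetical helper).

    Assumes bit 0 is the least-significant bit of the first byte.
    """
    word = int.from_bytes(instruction, "little")
    return {
        "cache_hint": bool((word >> 7) & 1),  # bit 7: cache hint ($)
        "dest_32bit": bool((word >> 8) & 1),  # bit 8: 1 for r1, 0 for r1l
    }


plain = bytes.fromhex("1a85440a0200")   # fmul r1, r2.neg, 0.5
cached = bytes.fromhex("9a85440a0200")  # fmul $r1, r2.neg, 0.5

print(decode_dt(plain))
print(decode_dt(cached))
```

The two encodings differ only in the top bit of the first byte (0x1a versus 0x9a), which is bit 7 under this numbering.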

pure-water commented 3 years ago

Hi, thanks for answering.

I actually figured out the "$" usage from your code, after two hours of following it, before I read your post. It is here:

CACHE_HINT = '$'
def try_parse_register(s):
    flags = []
    if s.startswith(CACHE_HINT):
        s = s[1:]
        flags.append(CACHE_FLAG)

Yes, I agree with the change to use ".cache" directly, to be consistent with ".discard". That gives us a uniform operand cache hint scheme, and it will be more obvious than "$".
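For illustration, a parser that accepts both spellings during such a transition might look like the sketch below. This is only an assumption about how it could be done, not the actual code in assemble.py: `split_cache_hint` and `CACHE_SUFFIX` are hypothetical names, and only `CACHE_HINT = '$'` comes from the snippet quoted above.

```python
CACHE_HINT = "$"          # prefix form, as in the quoted snippet
CACHE_SUFFIX = ".cache"   # suffix form (hypothetical constant name)


def split_cache_hint(operand: str):
    """Return (operand_without_hint, has_cache_hint). Hypothetical helper."""
    s = operand.strip()
    cached = False
    if s.startswith(CACHE_HINT):
        s = s[len(CACHE_HINT):]
        cached = True
    if s.endswith(CACHE_SUFFIX):
        s = s[:-len(CACHE_SUFFIX)]
        cached = True
    return s, cached


for text in ("$r1", "r1.cache", "r1", "r2.neg"):
    print(text, "->", split_cache_hint(text))
```

Other modifiers such as .neg are left untouched here, so they could still be handled by whatever parsing follows.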