cmd/asm: confusing names, encodings for VEX instructions

rsc commented 8 years ago

I have constructed a fairly exhaustive test suite for the x86 assembler and identified some problems. The ones in this issue were added during the Go 1.6 cycle and should be corrected before the release. I intend to do this and also check in the tests.

Added in 9dd81d6, https://golang.org/cl/14127, Oct 5 2015:

MOVHDA X2, X11: have encoding "c57d6fda", want "c5796fda"
   0:   c5 7d 6f da             vmovdqa %ymm2,%ymm11
   0:   c5 79 6f da             vmovdqa %xmm2,%xmm11

MOVHDU X2, X11: have encoding "c57e6fda", want "c57a6fda"
   0:   c5 7e 6f da             vmovdqu %ymm2,%ymm11
   0:   c5 7a 6f da             vmovdqu %xmm2,%xmm11

It took me a while to figure out that VMOVDQU, VMOVDQA, and VMOVNTDQ have been renamed to MOVHDU, MOVHDA, and MOVNTHD. That's very strange. We don't use HD in place of DQ anywhere else, nor do we drop the leading V. We do use "O" in place of "DQ" in a number of places, and that would have been an OK convention to follow, but we also install aliases for the Intel names because this is all so confusing, and that wasn't done here, making the spellings chosen in this CL basically impossible to find. I only found them because I was looking for CLs related to the others below that caused test failures in instructions with more predictable names. These should be renamed, and I will do that.

Once renamed, there is still the issue, not mentioned anywhere in the CL description nor the code that the X registers (128-bit) are being reused here as identifiers for Y registers (256-bit that hold the 128-bit forms in their bottom halves). The motivation for changing VMOVDQ to MOVHD may have been to emphasize this fact, which I appreciate, but it would have been nice to discuss the overloading explicitly. I am not sure it's a great idea, since it leads to very confusing assembly. At the very least that fact needs clear documentation. It was just slipped in as far as I can tell.

The overloading of X and Y registers is handled here by changing the name but then all the followup instructions just silently used Y registers without name changes. In many cases it's hard to tell the difference, assuming you're okay with asking for an X instruction and getting one that does what you asked but also sets the high Y bits (the VEX instructions operating on X registers are documented to zero the high Y bits, not use them). But in at least one case it matters greatly: VPTEST on X registers is defined to only consider the low 128 bits in the comparison. If it silently compares the whole 256-bit Y register instead, that's sure to surprise someone (along with the fact that the top half has not been zeroed by all the earlier V instructions).

This is going to take a while to figure out, clean up, and document. Good thing there are a few days left before the release candidate.

Added in 0e23ca41, https://golang.org/cl/16434, Nov 2 2015:

VPCMPEQB X2, X9, X11: have encoding "c53574da", want "c4613174da"
   0:   c5 35 74 da             vpcmpeqb %ymm2,%ymm9,%ymm11
   0:   c4 61 31 74 da          vpcmpeqb %xmm2,%xmm9,%xmm11

VPMOVMSKB X2, R11: have encoding "c57dd7da", want "c46179d7da"
   0:   c5 7d d7 da             vpmovmskb %ymm2,%r11d
   0:   c4 61 79 d7 da          vpmovmskb %xmm2,%r11d

Added in 967564be, https://golang.org/cl/16481, Nov 5 2015:

VPAND X2, X9, X11: have encoding "c535dbda", want "c531dbda"
   0:   c5 35 db da             vpand  %ymm2,%ymm9,%ymm11
   0:   c5 31 db da             vpand  %xmm2,%xmm9,%xmm11

Added in 321a4072, https://golang.org/cl/16484, Nov 6 2015:

VPBROADCASTB X2, X11: have encoding "c4627d78da", want "c4627978da"
   0:   c4 62 7d 78 da          vpbroadcastb %xmm2,%ymm11
   0:   c4 62 79 78 da          vpbroadcastb %xmm2,%xmm11

VPTEST X2, X11: have encoding "c4627d17da", want "c4627917da"
   0:   c4 62 7d 17 da          vptest %ymm2,%ymm11
   0:   c4 62 79 17 da          vptest %xmm2,%xmm11

Added in b597e1ed, https://golang.org/cl/16773, Nov 24 2015:

VPXOR X2, X9, X11: have encoding "c535efda", want "c531efda"
   0:   c5 35 ef da             vpxor  %ymm2,%ymm9,%ymm11
   0:   c5 31 ef da             vpxor  %xmm2,%xmm9,%xmm11

Added in 1d1f2fb4, https://golang.org/cl/18593, Jan 13 2016:

PEXTRQ  $7, X11, (BX): have encoding "6644480f3a161b07", want "664c0f3a161b07"
   0:   66 44                   data16 rex.R
   2:   48 0f 3a 16             rex.W (bad) 
   6:   1b 07                   sbb    (%rdi),%eax
   0:   66 4c 0f 3a 16 1b 07    pextrq $0x7,%xmm11,(%rbx)

   XDIS 0: SSE       SSE4       6644480F3A161B07         pextrq qword ptr [rbx], xmm3, 0x7
   XDIS 0: SSE       SSE4       664C0F3A161B07           pextrq qword ptr [rbx], xmm11, 0x7

The problem here, confusing both objdump and xed, is that there are two REX bytes! The 44 48 should be 4C. I'm amazed this executes. Maybe it doesn't.

TocarIP commented 8 years ago

When I submitted original CL (https://golang.org/cl/16434), Convention was to specify argument size not in register, but in an instruction itself: MOV[B,W,L,Q] BX,CX instead of mov rbx/ecx/cx/cl So I wanted to follow the same convention with xmm/ymm As far as I understand O in MOVOU stands for Octa (8,), so HD was supposed to mean HexaDeca (16). I was hoping that this would be discussed during review, but it went in, surprisingly, without much discussion. I didn't think that we need to support 128-bit version (unlike with MOV*), so I've used names for 256-bit version without changes. Using 256-bit version in place of 128 is not okay for all instructions, not only VPTEST, if memory operand is used. As for distinguishing 128/256 bits, I'm not sure. In favor of .X,.Y suffixes: Gnu assembler already uses the same solution for e.g. vcvtpd2ps. (there are vcvtpd2psx and vcvtpd2psy), plus it's similar to current Go convention (register names are the same). On the other hand, adding Y registers simplifies porting from gas to go asm. As far as I can this both solutions (suffixes and new registers) are extensible enough to handle Z (512-bit) registers. Overall keeping everything closer to manuals/other asm dialects, bu introducing Y registers is IMHO a better alternative.

rsc commented 8 years ago

@TocarIP thanks for the comments. I agree about HD following the established pattern for a general move; it's really the follow-on effects on the other VEX instructions that are the problem. Even so I think I will rename the HD move too. If we want a generic 16-byte move alias I am inclined to just call them MOV16U and MOV16A. I've been meaning to make common aliases MOV1, MOV2, MOV4, MOV8 across all the systems anyway (right now there are inconsistencies, like MOVW means 4 bytes on ARM but 2 bytes on x86). But I'll leave that for Go 1.7.

In general I am leaning toward making the Go assembly match GNU assembler as much as possible in the SSE, AVX, and other extensions, for exactly the reasons you mentioned. So I'll add the Y registers now, make both forms of the VEX instructions available, and queue further cleanup for Go 1.7. (I'm going to leave the integer instructions alone, we'll keep having AX instead of AX/EAX/RAX and so on.)

Thanks again.

randall77 commented 8 years ago

I would agree that adding the Y register names seems like the right thing to do. I did not realize at the time that the V* opcodes had both xmm and ymm forms.

gopherbot commented 8 years ago

CL https://golang.org/cl/18846 mentions this issue.

gopherbot commented 8 years ago

CL https://golang.org/cl/18850 mentions this issue.

gopherbot commented 8 years ago

CL https://golang.org/cl/18849 mentions this issue.

gopherbot commented 8 years ago

CL https://golang.org/cl/18852 mentions this issue.

golang / go

cmd/asm: confusing names, encodings for VEX instructions #14068