Closed: quasilyte closed this issue 7 years ago.
I want to emphasize: it does not break any existing assembly code; it only relaxes the immediate-argument encoding restrictions (accepting both the two's complement/negative and the positive spelling instead of just one of them).
Change https://golang.org/cl/57570 mentions this issue: cmd/internal/obj/x86: new operand class for x86 asm backend
I tried pretty hard in the instruction tables to distinguish the ones that interpret the argument as signed vs unsigned. If one is wrong I'd like to fix that. Do you have an example of one that's wrong?
If VPBLENDD only cares about bits (and 0x80 is a bit and not the sign), then it should be defined to take uimm8.
Your comment:

> Well, src/crypto/sha512/sha512block_amd64.s:524 does exactly this: `VPBLENDD $0xF0, Y2, Y4, Y4`

makes me think that VPBLENDD is already correct, or else it wouldn't assemble.
@rsc, VPBLENDD is a bit special due to the current amd64 asm handling of 4-operand instructions: 4-operand instructions do not check the immediate against any particular oclass (Yi8, Yu8, ...).

When instructions are generated from, for example, x86.csv (in x86spec format), we need to make a decision between Yi8 and Yu8. From the spec entry we only know that the operand has kind imm8, which can map to either Yi8 or Yu8. It is easier to mark all such arguments with the proposed Yimmb, which permits both Yi8 and Yu8. As a side effect, the resulting behavior is also more compatible with widely used assemblers.

To make it clear: the current 4-operand instruction semantics for the immediate argument are already the same as the Yimmb oclass; it is just less explicit, and feels more like a bug than a feature.
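The "union oclass" idea can be sketched with a tiny model of asm6.go's ycover table (this is an illustration of the mechanism, not the real code; the real table is a flat `[Ymax*Ymax]` slice):

```go
package main

import "fmt"

// Operand class constants, mirroring the small-immediate classes
// discussed in this thread. Ymax bounds the table.
const (
	Yi0   = iota // $0
	Yi1          // $1
	Yi8          // fits in int8
	Yu8          // fits in uint8
	Yu7          // 0..127, fits both
	Yimmb        // proposed union of all of the above
	Ymax
)

// ycover[from][to] reports whether an operand classified as `from`
// is acceptable where the instruction table demands `to`.
var ycover [Ymax][Ymax]bool

func main() {
	// Marking every 8-bit immediate class as covered by Yimmb is all
	// that is needed for one ytab entry to accept both spellings.
	for _, c := range []int{Yi0, Yi1, Yi8, Yu8, Yu7} {
		ycover[c][Yimmb] = true
	}
	fmt.Println(ycover[Yi8][Yimmb], ycover[Yu8][Yimmb]) // true true
}
```

With this in place, a single `{Yimmb, ...}` ytab row matches both `$-128` (classified Yi8) and `$128` (classified Yu8).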
I don't want to be sloppy here. Assembly is hard enough to get right. I want to be precise. Either it's a signed value or an unsigned value, and it's OK for the assembler to insist that you know which one.
I understand that other assemblers take a different approach, and that's fine, but it's not our approach. Our approach is to be picky, to try to help avoid bugs.
@rsc, I see. Maybe we can make the x86spec format express signed/unsigned immediate arguments? If x86.csv carries this information, code generation based on it can be simplified.
> I tried pretty hard in the instruction tables to distinguish the ones that interpret the argument as signed vs unsigned. If one is wrong I'd like to fix that. Do you have an example of one that's wrong?
@rsc, I may be wrong, but it seems like yshl should have the Yu8 oclass instead of Yi32:
https://github.com/golang/go/blob/master/src/cmd/internal/obj/x86/asm6.go#L401
```go
var yshl = []ytab{
	{Yi1, Ynone, Yml, Zo_m, 2},
	{Yi32, Ynone, Yml, Zibo_m, 2}, // <- here
	{Ycl, Ynone, Yml, Zo_m, 2},
	{Ycx, Ynone, Yml, Zo_m, 2},
}
```
If Yi32 is the right thing there, the amount of my confusion will be enormous. :)
I can check other instructions one-by-one later.
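To make the yshl complaint concrete: the immediate field of a shift instruction is a single encoded byte, so a class as wide as Yi32 admits constants (say, `$300`) that cannot be encoded faithfully. A sketch of how an immediate gets bucketed into these classes, with ranges matching the oclass comments in asm6.go (the function `classifyImm` itself is illustrative, not the real classifier):

```go
package main

import "fmt"

// classifyImm buckets a constant into the narrowest matching Y* class,
// following the ranges documented next to the constants in asm6.go.
func classifyImm(v int64) string {
	switch {
	case v == 0:
		return "Yi0" // $0
	case v == 1:
		return "Yi1" // $1
	case 0 <= v && v <= 127:
		return "Yu7" // fits both int8 and uint8
	case -128 <= v && v <= 127:
		return "Yi8" // fits int8 only
	case 0 <= v && v <= 255:
		return "Yu8" // fits uint8 only
	case -1<<31 <= v && v <= 1<<31-1:
		return "Yi32"
	}
	return "Yi64"
}

func main() {
	fmt.Println(classifyImm(31), classifyImm(200), classifyImm(-7), classifyImm(300))
	// Yu7 Yu8 Yi8 Yi32
}
```

A ytab row demanding Yu8 accepts Yu7 and Yu8 values; a row demanding Yi32 also accepts `$300`, which silently cannot fit the shift's imm8 slot.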
x86.csv does have this information: it says imm8 for signed or imm8u for unsigned. (Sorry for the late reply.)
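Given the imm8/imm8u convention described above, a table generator can pick the oclass mechanically. A hypothetical sketch (the function name and fallback are made up for illustration; this is not the actual x86spec-based generator code):

```go
package main

import "fmt"

// oclassFor maps an x86.csv operand spelling to an assembler operand
// class, assuming the convention that imm8 means signed and imm8u
// means unsigned.
func oclassFor(spec string) string {
	switch spec {
	case "imm8":
		return "Yi8"
	case "imm8u":
		return "Yu8"
	}
	return "Yi32" // illustrative fallback for wider immediates
}

func main() {
	fmt.Println(oclassFor("imm8"), oclassFor("imm8u")) // Yi8 Yu8
}
```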
The yshl table can be corrected to Yu8, I believe. That code is much older than Go and is in some places sloppier than I would prefer.
Change https://golang.org/cl/62190 mentions this issue: cmd/asm: restrict x86 shift ops to 8bit args
Going to close this. All immediate operands will remain explicitly signed/unsigned.
Many amd64 instructions take an immediate operand without automatic sign extension. When we only care about the binary representation, both "$128" and "$-128" denote the same value.
In the Intel manual, VPBLENDD has the imm8 operand class, which means "signed 8-bit value". But the semantics only care about the bits, so it makes perfect sense to use 0xF0 as an argument. Indeed, src/crypto/sha512/sha512block_amd64.s:524 does exactly this: `VPBLENDD $0xF0, Y2, Y4, Y4`.
GAS also happily accepts both positive and negative immediate values.

I propose a backwards-compatible change that introduces a new opclass, Yimmb, which is a union of the {Yi0, Yi1, Yi8, Yu8, Yu7} opclasses via ycover. From a user's point of view, this change makes it possible to express an immediate argument in both signed and unsigned form. As a first step, it reduces yvex_xyi3 to 2 ytab entries. A further step that may be proposed later: allowing more instructions to accept the Yimmb opclass instead of Yi8/Yu8.

Pros: smaller ytab tables (e.g. yvex_xyi3). Even if asm6.go is going to be automatically generated at some point in the future, the exact dates are unknown, so this simple change may make sense for now.

Implementation: a possible patch is provided below.

Yimmb.patch
```diff
diff --git a/src/cmd/asm/internal/asm/testdata/amd64enc.s b/src/cmd/asm/internal/asm/testdata/amd64enc.s
index ec888bc..02ccebb 100644
--- a/src/cmd/asm/internal/asm/testdata/amd64enc.s
+++ b/src/cmd/asm/internal/asm/testdata/amd64enc.s
@@ -10681,4 +10681,17 @@ TEXT asmtest(SB),DUPOK|NOSPLIT,$0
 	//TODO: XSAVES64 (R11) // 490fc72b
 	//TODO: XSETBV // 0f01d1
 	XTEST // 0f01d6
+	// Note: code below is written by hand,
+	// so it seems like a good idea to move
+	// them into separate file.
+	// Maybe test runner should permit include
+	// directives (right now it will not
+	// work as intended).
+	//
+	// * Check that immb opclass works as expected.
+	VPSHUFD $-7, (R11), Y15 // c4417d703bf9
+	VPSHUFD $-128, (R11), Y15 // c4417d703b80
+	VPSHUFD $128, (R11), Y15 // c4417d703b80
+	VPSHUFD $255, (R11), Y15 // c4417d703bff
+	VPSHUFD $0, (R11), Y1 // c4c17d700b00
 	RET
diff --git a/src/cmd/internal/obj/x86/asm6.go b/src/cmd/internal/obj/x86/asm6.go
index bcf9318..33e3d8a 100644
--- a/src/cmd/internal/obj/x86/asm6.go
+++ b/src/cmd/internal/obj/x86/asm6.go
@@ -92,11 +92,12 @@ type Movtab struct {
 const (
 	Yxxx = iota
 	Ynone
-	Yi0 // $0
-	Yi1 // $1
-	Yi8 // $x, x fits in int8
-	Yu8 // $x, x fits in uint8
-	Yu7 // $x, x in 0..127 (fits in both int8 and uint8)
+	Yi0   // $0
+	Yi1   // $1
+	Yi8   // $x, x fits in int8
+	Yu8   // $x, x fits in uint8
+	Yu7   // $x, x in 0..127 (fits in both int8 and uint8)
+	Yimmb // $x, x <- {Yi0, Yi1, Yi8, Yu8, Yu7}
 	Ys32
 	Yi32
 	Yi64
@@ -811,10 +812,8 @@ var yvex_ri3 = []ytab{
 }
 
 var yvex_xyi3 = []ytab{
-	{Yu8, Yxm, Yxr, Zvex_i_rm_r, 2},
-	{Yu8, Yym, Yyr, Zvex_i_rm_r, 2},
-	{Yi8, Yxm, Yxr, Zvex_i_rm_r, 2},
-	{Yi8, Yym, Yyr, Zvex_i_rm_r, 2},
+	{Yimmb, Yxm, Yxr, Zvex_i_rm_r, 2},
+	{Yimmb, Yym, Yyr, Zvex_i_rm_r, 2},
 }
 
 var yvex_yyi4 = []ytab{ //TODO don't hide 4 op, some version have xmm version
@@ -1672,7 +1671,7 @@ var optab =
 	{AVPBROADCASTB, yvex_vpbroadcast, Pvex, [23]uint8{VEX_128_66_0F38_W0, 0x78, VEX_256_66_0F38_W0, 0x78}},
 	{AVPTEST, yvex_xy2, Pvex, [23]uint8{VEX_128_66_0F38_WIG, 0x17, VEX_256_66_0F38_WIG, 0x17}},
 	{AVPSHUFB, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F38_WIG, 0x00, VEX_256_66_0F38_WIG, 0x00}},
-	{AVPSHUFD, yvex_xyi3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70, VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70}},
+	{AVPSHUFD, yvex_xyi3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70}},
 	{AVPOR, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xeb, VEX_256_66_0F_WIG, 0xeb}},
 	{AVPADDQ, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xd4, VEX_256_66_0F_WIG, 0xd4}},
 	{AVPADDD, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xfe, VEX_256_66_0F_WIG, 0xfe}},
@@ -2003,17 +2002,25 @@ func instinit(ctxt *obj.Link) {
 	ycover[Yi1*Ymax+Yu8] = 1
 	ycover[Yu7*Ymax+Yu8] = 1
 
+	ycover[Yi0*Ymax+Yimmb] = 1
+	ycover[Yi1*Ymax+Yimmb] = 1
+	ycover[Yu7*Ymax+Yimmb] = 1
+	ycover[Yu8*Ymax+Yimmb] = 1
+	ycover[Yi8*Ymax+Yimmb] = 1
+
 	ycover[Yi0*Ymax+Ys32] = 1
 	ycover[Yi1*Ymax+Ys32] = 1
 	ycover[Yu7*Ymax+Ys32] = 1
 	ycover[Yu8*Ymax+Ys32] = 1
 	ycover[Yi8*Ymax+Ys32] = 1
+	ycover[Yimmb*Ymax+Ys32] = 1
 
 	ycover[Yi0*Ymax+Yi32] = 1
 	ycover[Yi1*Ymax+Yi32] = 1
 	ycover[Yu7*Ymax+Yi32] = 1
 	ycover[Yu8*Ymax+Yi32] = 1
 	ycover[Yi8*Ymax+Yi32] = 1
+	ycover[Yimmb*Ymax+Yi32] = 1
 	ycover[Ys32*Ymax+Yi32] = 1
 
 	ycover[Yi0*Ymax+Yi64] = 1
@@ -2021,6 +2028,7 @@ func instinit(ctxt *obj.Link) {
 	ycover[Yu7*Ymax+Yi64] = 1
 	ycover[Yu8*Ymax+Yi64] = 1
 	ycover[Yi8*Ymax+Yi64] = 1
+	ycover[Yimmb*Ymax+Yi64] = 1
 	ycover[Ys32*Ymax+Yi64] = 1
 	ycover[Yi32*Ymax+Yi64] = 1
```