Closed: quasilyte closed this issue 7 years ago.
I want to emphasize: it does not break any existing assembly code; it only relaxes the immediate-argument encoding restrictions (accepting both the two's complement/negative and the positive spelling instead of just one of them).
Change https://golang.org/cl/57570 mentions this issue: cmd/internal/obj/x86: new operand class for x86 asm backend
I tried pretty hard in the instruction tables to distinguish the ones that interpret the argument as signed vs unsigned. If one is wrong I'd like to fix that. Do you have an example of one that's wrong?
If VPBLENDD only cares about bits (and 0x80 is a bit and not the sign), then it should be defined to take uimm8.
Your comment:

> Well, src/crypto/sha512/sha512block_amd64.s:524 does exactly this: `VPBLENDD $0xF0, Y2, Y4, Y4`

makes me think that VPBLENDD is already correct, or else it wouldn't assemble.
@rsc, VPBLENDD is a bit special due to the current amd64 asm handling of 4-operand instructions: 4-operand instructions do not check the immediate against any particular oclass (Yi8, Yu8, ...).

When instructions are generated from, for example, x86.csv (in x86spec format), we need to make a decision between Yi8 and Yu8. From the spec entry we only know that the operand has kind imm8, which can map to either Yi8 or Yu8. It is easier to mark all such arguments with the proposed Yimmb, which permits both Yi8 and Yu8. As a side effect, the resulting behavior is also more compatible with widely used assemblers.

To make it clear: the current 4-operand instruction semantics for the immediate argument are already the same as the Yimmb oclass; it is just less explicit, and feels more like a bug than a feature.
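The "union oclass" idea can be sketched with a tiny model of asm6.go's ycover table (this is an illustration of the mechanism, not the real code; the real table is a flat `[Ymax*Ymax]` slice):

```go
package main

import "fmt"

// Operand class constants, mirroring the small-immediate classes
// discussed in this thread. Ymax bounds the table.
const (
	Yi0   = iota // $0
	Yi1          // $1
	Yi8          // fits in int8
	Yu8          // fits in uint8
	Yu7          // 0..127, fits both
	Yimmb        // proposed union of all of the above
	Ymax
)

// ycover[from][to] reports whether an operand classified as `from`
// is acceptable where the instruction table demands `to`.
var ycover [Ymax][Ymax]bool

func main() {
	// Marking every 8-bit immediate class as covered by Yimmb is all
	// that is needed for one ytab entry to accept both spellings.
	for _, c := range []int{Yi0, Yi1, Yi8, Yu8, Yu7} {
		ycover[c][Yimmb] = true
	}
	fmt.Println(ycover[Yi8][Yimmb], ycover[Yu8][Yimmb]) // true true
}
```

With this in place, a single `{Yimmb, ...}` ytab row matches both `$-128` (classified Yi8) and `$128` (classified Yu8).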
I don't want to be sloppy here. Assembly is hard enough to get right. I want to be precise. Either it's a signed value or an unsigned value, and it's OK for the assembler to insist that you know which one.
I understand that other assemblers take a different approach, and that's fine, but it's not our approach. Our approach is to be picky, to try to help avoid bugs.
@rsc, I see. Maybe we can make the x86spec format express signed/unsigned immediate arguments? If x86.csv carries this information, code generation based on it can be simplified.
> I tried pretty hard in the instruction tables to distinguish the ones that interpret the argument as signed vs unsigned. If one is wrong I'd like to fix that. Do you have an example of one that's wrong?
@rsc, I may be wrong, but it seems like yshl should have the Yu8 oclass instead of Yi32:
https://github.com/golang/go/blob/master/src/cmd/internal/obj/x86/asm6.go#L401
```go
var yshl = []ytab{
	{Yi1, Ynone, Yml, Zo_m, 2},
	{Yi32, Ynone, Yml, Zibo_m, 2}, // <- here
	{Ycl, Ynone, Yml, Zo_m, 2},
	{Ycx, Ynone, Yml, Zo_m, 2},
}
```
If Yi32 is the right thing there, the amount of my confusion will be enormous. :)
I can check other instructions one-by-one later.
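To make the yshl complaint concrete: the immediate field of a shift instruction is a single encoded byte, so a class as wide as Yi32 admits constants (say, `$300`) that cannot be encoded faithfully. A sketch of how an immediate gets bucketed into these classes, with ranges matching the oclass comments in asm6.go (the function `classifyImm` itself is illustrative, not the real classifier):

```go
package main

import "fmt"

// classifyImm buckets a constant into the narrowest matching Y* class,
// following the ranges documented next to the constants in asm6.go.
func classifyImm(v int64) string {
	switch {
	case v == 0:
		return "Yi0" // $0
	case v == 1:
		return "Yi1" // $1
	case 0 <= v && v <= 127:
		return "Yu7" // fits both int8 and uint8
	case -128 <= v && v <= 127:
		return "Yi8" // fits int8 only
	case 0 <= v && v <= 255:
		return "Yu8" // fits uint8 only
	case -1<<31 <= v && v <= 1<<31-1:
		return "Yi32"
	}
	return "Yi64"
}

func main() {
	fmt.Println(classifyImm(31), classifyImm(200), classifyImm(-7), classifyImm(300))
	// Yu7 Yu8 Yi8 Yi32
}
```

A ytab row demanding Yu8 accepts Yu7 and Yu8 values; a row demanding Yi32 also accepts `$300`, which silently cannot fit the shift's imm8 slot.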
x86.csv does have this information: it says imm8 for signed or imm8u for unsigned. (Sorry for the late reply.)
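Given the imm8/imm8u convention described above, a table generator can pick the oclass mechanically. A hypothetical sketch (the function name and fallback are made up for illustration; this is not the actual x86spec-based generator code):

```go
package main

import "fmt"

// oclassFor maps an x86.csv operand spelling to an assembler operand
// class, assuming the convention that imm8 means signed and imm8u
// means unsigned.
func oclassFor(spec string) string {
	switch spec {
	case "imm8":
		return "Yi8"
	case "imm8u":
		return "Yu8"
	}
	return "Yi32" // illustrative fallback for wider immediates
}

func main() {
	fmt.Println(oclassFor("imm8"), oclassFor("imm8u")) // Yi8 Yu8
}
```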
The yshl table can be corrected to Yu8, I believe. That code is much older than Go and is in some places sloppier than I would prefer.
Change https://golang.org/cl/62190 mentions this issue: cmd/asm: restrict x86 shift ops to 8bit args
Going to close this. All immediate operands will remain explicitly signed/unsigned.
Many amd64 instructions take an immediate operand without automatic sign extension. When we only care about the binary representation, both "$128" and "$-128" denote the same value.
In the Intel manual, VPBLENDD has the imm8 operand class, which means "signed 8-bit value". But the semantics only care about the bits, so it makes perfect sense to use 0xF0 as an argument. Indeed, src/crypto/sha512/sha512block_amd64.s:524 does exactly this: `VPBLENDD $0xF0, Y2, Y4, Y4`.
GAS also happily accepts both positive and negative immediate values.

I propose a backwards-compatible change that introduces a new opclass, Yimmb, which is a union of the {Yi0, Yi1, Yi8, Yu8, Yu7} opclasses via ycover. From a user's point of view, this change makes it possible to express an immediate argument in both signed and unsigned form. As a first step, it reduces yvex_xyi3 to 2 ytab entries. A further step that may be proposed later: allowing more instructions to accept the Yimmb opclass instead of Yi8/Yu8.

Pros: smaller ytab tables (e.g. yvex_xyi3). Even if asm6.go is going to be automatically generated at some point in the future, the exact dates are unknown, so this simple change may make sense for now.

Implementation: a possible patch is provided below.

Yimmb.patch
```diff
diff --git a/src/cmd/asm/internal/asm/testdata/amd64enc.s b/src/cmd/asm/internal/asm/testdata/amd64enc.s
index ec888bc..02ccebb 100644
--- a/src/cmd/asm/internal/asm/testdata/amd64enc.s
+++ b/src/cmd/asm/internal/asm/testdata/amd64enc.s
@@ -10681,4 +10681,17 @@ TEXT asmtest(SB),DUPOK|NOSPLIT,$0
 	//TODO: XSAVES64 (R11) // 490fc72b
 	//TODO: XSETBV // 0f01d1
 	XTEST // 0f01d6
+	// Note: code below is written by hand,
+	// so it seems like a good idea to move
+	// them into separate file.
+	// Maybe test runner should permit include
+	// directives (right now it will not
+	// work as intended).
+	//
+	// * Check that immb opclass works as expected.
+	VPSHUFD $-7, (R11), Y15 // c4417d703bf9
+	VPSHUFD $-128, (R11), Y15 // c4417d703b80
+	VPSHUFD $128, (R11), Y15 // c4417d703b80
+	VPSHUFD $255, (R11), Y15 // c4417d703bff
+	VPSHUFD $0, (R11), Y1 // c4c17d700b00
 	RET
diff --git a/src/cmd/internal/obj/x86/asm6.go b/src/cmd/internal/obj/x86/asm6.go
index bcf9318..33e3d8a 100644
--- a/src/cmd/internal/obj/x86/asm6.go
+++ b/src/cmd/internal/obj/x86/asm6.go
@@ -92,11 +92,12 @@ type Movtab struct {
 const (
 	Yxxx = iota
 	Ynone
-	Yi0 // $0
-	Yi1 // $1
-	Yi8 // $x, x fits in int8
-	Yu8 // $x, x fits in uint8
-	Yu7 // $x, x in 0..127 (fits in both int8 and uint8)
+	Yi0   // $0
+	Yi1   // $1
+	Yi8   // $x, x fits in int8
+	Yu8   // $x, x fits in uint8
+	Yu7   // $x, x in 0..127 (fits in both int8 and uint8)
+	Yimmb // $x, x <- {Yi0, Yi1, Yi8, Yu8, Yu7}
 	Ys32
 	Yi32
 	Yi64
@@ -811,10 +812,8 @@ var yvex_ri3 = []ytab{
 }
 
 var yvex_xyi3 = []ytab{
-	{Yu8, Yxm, Yxr, Zvex_i_rm_r, 2},
-	{Yu8, Yym, Yyr, Zvex_i_rm_r, 2},
-	{Yi8, Yxm, Yxr, Zvex_i_rm_r, 2},
-	{Yi8, Yym, Yyr, Zvex_i_rm_r, 2},
+	{Yimmb, Yxm, Yxr, Zvex_i_rm_r, 2},
+	{Yimmb, Yym, Yyr, Zvex_i_rm_r, 2},
 }
 
 var yvex_yyi4 = []ytab{ //TODO don't hide 4 op, some version have xmm version
@@ -1672,7 +1671,7 @@ var optab =
 	{AVPBROADCASTB, yvex_vpbroadcast, Pvex, [23]uint8{VEX_128_66_0F38_W0, 0x78, VEX_256_66_0F38_W0, 0x78}},
 	{AVPTEST, yvex_xy2, Pvex, [23]uint8{VEX_128_66_0F38_WIG, 0x17, VEX_256_66_0F38_WIG, 0x17}},
 	{AVPSHUFB, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F38_WIG, 0x00, VEX_256_66_0F38_WIG, 0x00}},
-	{AVPSHUFD, yvex_xyi3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70, VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70}},
+	{AVPSHUFD, yvex_xyi3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0x70, VEX_256_66_0F_WIG, 0x70}},
 	{AVPOR, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xeb, VEX_256_66_0F_WIG, 0xeb}},
 	{AVPADDQ, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xd4, VEX_256_66_0F_WIG, 0xd4}},
 	{AVPADDD, yvex_xy3, Pvex, [23]uint8{VEX_128_66_0F_WIG, 0xfe, VEX_256_66_0F_WIG, 0xfe}},
@@ -2003,17 +2002,25 @@ func instinit(ctxt *obj.Link) {
 	ycover[Yi1*Ymax+Yu8] = 1
 	ycover[Yu7*Ymax+Yu8] = 1
 
+	ycover[Yi0*Ymax+Yimmb] = 1
+	ycover[Yi1*Ymax+Yimmb] = 1
+	ycover[Yu7*Ymax+Yimmb] = 1
+	ycover[Yu8*Ymax+Yimmb] = 1
+	ycover[Yi8*Ymax+Yimmb] = 1
+
 	ycover[Yi0*Ymax+Ys32] = 1
 	ycover[Yi1*Ymax+Ys32] = 1
 	ycover[Yu7*Ymax+Ys32] = 1
 	ycover[Yu8*Ymax+Ys32] = 1
 	ycover[Yi8*Ymax+Ys32] = 1
+	ycover[Yimmb*Ymax+Ys32] = 1
 
 	ycover[Yi0*Ymax+Yi32] = 1
 	ycover[Yi1*Ymax+Yi32] = 1
 	ycover[Yu7*Ymax+Yi32] = 1
 	ycover[Yu8*Ymax+Yi32] = 1
 	ycover[Yi8*Ymax+Yi32] = 1
+	ycover[Yimmb*Ymax+Yi32] = 1
 	ycover[Ys32*Ymax+Yi32] = 1
 
 	ycover[Yi0*Ymax+Yi64] = 1
@@ -2021,6 +2028,7 @@ func instinit(ctxt *obj.Link) {
 	ycover[Yu7*Ymax+Yi64] = 1
 	ycover[Yu8*Ymax+Yi64] = 1
 	ycover[Yi8*Ymax+Yi64] = 1
+	ycover[Yimmb*Ymax+Yi64] = 1
 	ycover[Ys32*Ymax+Yi64] = 1
 	ycover[Yi32*Ymax+Yi64] = 1
```