Fix instructions, e.g., vcvtps2pd, that have wrong operand sizes

rocallahan commented 4 years ago

https://www.felixcloutier.com/x86/cvtps2pd says:

128-bit Legacy SSE version: The source operand is an XMM register or 64- bit memory location.

DR master has: https://github.com/DynamoRIO/dynamorio/blob/d275a9aee4e2260cda0b7ce3f5e47f6f1b668320/core/arch/x86/decode_table.c

#define Wps TYPE_W, OPSZ_16
...
    {OP_cvtps2pd, 0x0f5a10, "cvtps2pd", Vpd, xx, Wps, xx, xx, mrm, x, END_LIST},

The Intel docs are clearly correct here, I'm observing an application reading from the 8 bytes at the end of a memory mapping.

derekbruening commented 4 years ago

Hmm, so cvtps2dq converts 4 floats, but cvtps2pd and cvtps2pi both convert only 2 floats. So this applies to cvtps2pi too.

Are the v versions wrong too? vcvtps2pd should be Wps and not Wvs?

Unfortunately the binutils tests never include a memory source arg and don't test operand sizes in any case:

third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 0d 78 56 34 12 cvtps2pd 0x12345678, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 0d 78 56 34 12 cvtps2pd 0x12345678, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a c8             cvtps2pd %xmm0, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 21             cvtps2pd (%ecx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 21             cvtps2pd (%ecx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 0d 78 56 34 12 cvtps2pd <rel> 0x000000002235b4a6, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 0d 78 56 34 12 cvtps2pd <rel> 0x000000002235b70e, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 21             cvtps2pd (%rcx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 21             cvtps2pd (%rcx), %xmm4

derekbruening commented 4 years ago

It looks like vcvtps2pd needs new types: its source for AVX is 8_of_16 or 16 (2 or 4 floats), and for AVX-512 is 8_of_16, 16, or 32. At a quick glance I don't see existing types like that.

derekbruening commented 4 years ago

If there are only one or two like that we could just split the table again to avoid the complex size type.

derekbruening commented 4 years ago

While the 8_of_16 vs 16 for an xmm register or memory operand can't easily be tested by comparing just disassembly in Intel syntax, 16 vs 32 should show up in a test: xmm vs ymm. vcvtps2pd with VEX.L set is supposed to still be an xmm source, but we have it as Wvs which would be a ymm source for VEX.L: so do none of the tests hit that case? I see tests that have xmm and ymm: are the ymm all AVX-512?

rocallahan commented 4 years ago

So this applies to cvtps2pi too.

Yep, I'll fix that.

I don't really understand what's involved with adding a new complex size type or "splitting the table" :-(.

are the ymm all AVX-512?

If I understand correctly, all AVX-512 variants use EVEX, i.e. 0x62, so none of the binutils tests for vcvtps2pd are AVX512:

 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4
 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4
 c5 fc 5a 21          vcvtps2pd (%rcx), %ymm4
 c5 f8 5a f4          vcvtps2pd %xmm4, %xmm6
 c5 f8 5a 21          vcvtps2pd (%rcx), %xmm4
 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4
 c5 fc 5a 21          vcvtps2pd (%rcx), %ymm4
 c5 fc 5a 21          vcvtps2pd (%rcx), %ymm4
 c5 f8 5a f4          vcvtps2pd %xmm4, %xmm6
 c5 f8 5a 21          vcvtps2pd (%rcx), %xmm4
 c5 f8 5a 21          vcvtps2pd (%rcx), %xmm4

rocallahan commented 4 years ago

So

 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4

I think corresponds to VEX.256.0F.WIG 5A /r VCVTPS2PD ymm1, xmm2/m128, so the source operand definitely should be %xmm4.

rocallahan commented 4 years ago

objdump 2.31.51 gives

    1a10:       c5 fc 5a e4             vcvtps2pd %xmm4,%ymm4

So I don't know why the tests are passing.

rocallahan commented 4 years ago

Oh, I had assumed the DR tests were actually running binutils to check the disassembly. But it looks like they don't; I guess you generated the .expect files once using binutils and have been comparing against them ever since? Then I'd assume binutils had a bug here and it got fixed at some point.

derekbruening commented 4 years ago

Interesting. Thank you for the analysis. @hgreving2304 might have more insight into how the binutils expect file ended up like this.

derekbruening commented 4 years ago

Leaving open for fixing vcvtps2pd, including fixing the binutils tests.

johnfxgalea commented 3 years ago

There are other instructions, such as vcvtps2ph, with wrong operand sizes that need to be fixed. To avoid spamming the issue page with similar problems, I am generalizing this issue page to accommodate all of such cases.

khuey commented 3 years ago

So, looking through the list of conversion instructions at https://www.officedaytime.com/simd512e/

Scalar to scalar

[x] vpmovqd takes Wh_e (half size operands)
[x] vpmovsqd takes Wh_e
[x] vpmovusqd takes Wh_e
[x] vpmovqw takes Wi_e (quarter size operands)
[x] vpmovsqw takes Wi_e
[x] vpmovusqw takes Wi_e
[x] vpmovqb takes Wj_e (eighth size operands)
[x] vpmovsqb takes Wj_e
[x] vpmovusqb takes Wj_e
[ ] pmovsxdq takes Wdq (should this be Wq_dq?)
[x] VEX vpmovsxdq takes Wh_x (half size operands with VEX)
[x] EVEX pmovsxdq takes Wh_e (half size operands with EVEX)
[ ] pmovzxdq takes Wdq (same as above?)
[x] VEX vpmovzxdq takes Wh_x
[x] EVEX vpmovzxdq takes Wh_e
[x] packssdw has same size operands
[x] packusdw has same size operands
[x] vpmovdw takes Wh_e
[x] vpmovsdw takes Wh_e
[x] vpmovusdw takes Wh_e
[x] vpmovdb takes Wi_e
[x] vpmovsdb takes Wi_e
[x] vpmovusdb takes Wi_e
[ ] pmovsxwq takes Wdq (should this be Wd_dq?)
[x] VEX vpmovsxwq takes Wi_x (quarter size operands with VEX)
[x] EVEX vpmovsxwq takes Wi_e (quarter size operands with EVEX)
[ ] pmovzxwq takes Wdq (should this be Wd_dq?)
[x] VEX vpmovzxwq takes Wi_x
[x] EVEX vpmovzxwq takes Wi_e
[ ] pmovsxwd takes Wdq (should this be Wq_dq?)
[x] VEX vpmovsxwd takes Wh_x
[x] EVEX vpmovsxwd takes Wh_e
[ ] pmovzxwd takes Wdq (should this be Wq_dq?)
[x] VEX vpmovzxwd takes Wh_x
[x] EVEX vpmovzxwd takes Wh_e
[x] packsswb has same size operands
[x] packuswb has same size operands
[x] vpmovwb takes Wh_e
[x] vpmovswb takes Wh_e
[x] vpmovuswb takes Wh_e
[ ] pmovsxbq takes Wdq (should this be Ww_dq?)
[x] VEX vpmovsxbq takes Wj_x (eighth size operands with VEX)
[x] EVEX vpmovsxbq takes Wj_e (eighth size operands with EVEX)
[ ] pmovzxbq takes Wdq (should this be Ww_dq?)
[x] VEX vpmovzxbq takes Wj_x (eighth size operands with VEX)
[x] EVEX vpmovzxbq takes Wj_e (eighth size operands with EVEX)
[ ] pmovsxbd takes Wdq (should this be Wd_dq?)
[x] VEX vpmovsxbd takes Wi_x (eighth size operands with VEX)
[x] EVEX vpmovsxbd takes Wi_e (eighth size operands with EVEX)
[ ] pmovzxbd takes Wdq (should this be Wd_dq?)
[x] VEX vpmovzxbd takes Wi_x (eighth size operands with VEX)
[x] EVEX vpmovzxbd takes Wi_e (eighth size operands with EVEX)
[ ] pmovsxbw takes Wdq (should this be Wq_dq?)
[x] VEX vpmovsxbw takes Wh_x (eighth size operands with VEX)
[x] EVEX vpmovsxbw takes Wh_e (eighth size operands with EVEX)
[ ] pmovzxbw takes Wdq (should this be Wq_dq?)
[x] VEX vpmovzxbw takes Wh_x (eighth size operands with VEX)
[x] EVEX vpmovzxbw takes Wh_e (eighth size operands with EVEX)

Scalar to float

[x] cvtsi2sd is Vsd (half of an XMM register)
[x] vcvtsi2sd is Vdq with an Hsd source
[x] vcvtqq2pd has same size operands
[x] vcvtuqq2pd has same size operands
[x] cvtsi2ss is Vss (a quarter of an XMM register)
[x] vcvtsi2ss is Vdq with a H12_dq source
[ ] cvtdq2pd has a Wq_dq source, but a Vpd destination (should this be Vsd?)
[ ] VEX vcvtdq2pd this one is weird, the source register is always an XMM (whether it's half or all is controlled by VEX.L). Is that really covered by Wvq_dq?
[x] EVEX vcvtdq2pd has same size operands
[x] vcvtudq2pd is We (should be Wh_e)
[x] cvtdq2ps has same size operands
[x] VEX vcvtdq2ps has same size operands
[x] EVEX vcvtdq2ps has same size operands
[x] vcvtudq2ps has same size operands

Float to scalar

[x] cvtsd2si is Wsd (4 or 8 with REX)
[x] VEX vcvtsd2si is Wsd
[x] EVEX vcvtsd2si is Wsd
[x] cvttsd2si is Wsd
[x] VEX vcvttsd2si is Wsd
[x] EVEX vcvttsd2si is Wsd
[x] vcvtpd2qq has same size operands
[x] vcvttpd2qq has same size operands
[x] vcvtpd2uqq has same size operands
[x] vcvttpd2uqq has same size operands
[x] cvtpd2dq has same size operands
[x] VEX vcvtpd2dq has same size operands
[x] EVEX vcvtpd2dq has same size operands
[x] cvttpd2dq has same size operands
[x] VEX vcvttpd2dq has same size operands
[x] EVEX vcvttpd2dq has same size operands
[x] vcvtpd2udq has same size operands
[x] vcvttpd2udq has same size operands
[x] cvtss2si is Wss (4 of XMM)
[x] VEX vcvtss2si is Wss
[x] EVEX vcvtss2si is Wss
[x] cvttss2si is Wss
[x] VEX vcvttss2si is Wss
[x] EVEX vcvttss2si is Wss
[ ] vcvtps2qq is Wes (should be Wh_e)
[ ] vcvttps2qq is Wes (should be Wh_e)
[ ] vcvtps2uqq is Wes (should be Wh_e)
[ ] vcvttps2uqq is Wes (should be Wh_e)
[x] cvtps2dq has same size operands
[x] VEX vcvtps2dq has same size operands
[x] EVEX vcvtps2dq has same size operands
[x] cvttps2dq has same size operands
[x] VEX vcvttps2dq has same size operands
[x] EVEX vcvttps2dq has same size operands
[x] vcvtps2udq has same size operands
[x] vcvttps2udq has same size operands

Float to float

[x] cvtpd2ps has same size operands
[x] VEX vcvtpd2ps has same size operands
[x] EVEX vcvtpd2ps has same size operands
[x] cvtps2pd is Wq_dq
[x] VEX vcvtps2pd is Wvs (should be half size see also vcvtdq2pd)
[x] EVEX vcvtps2pd is Wes should be Wh_e
[x] VEX vcvtps2ph is Wh_x
[x] EVEX vcvtps2ph is Wh_e
[x] VEX vcvtph2ps is Wh_x
[x] EVEX vcvtph2ps is Wh_e

DynamoRIO / dynamorio

Fix instructions, e.g., vcvtps2pd, that have wrong operand sizes #4297