DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform
Other
2.62k stars 556 forks source link

Fix instructions, e.g., vcvtps2pd, that have wrong operand sizes #4297

Open rocallahan opened 4 years ago

rocallahan commented 4 years ago

https://www.felixcloutier.com/x86/cvtps2pd says:

128-bit Legacy SSE version: The source operand is an XMM register or 64- bit memory location.

DR master has: https://github.com/DynamoRIO/dynamorio/blob/d275a9aee4e2260cda0b7ce3f5e47f6f1b668320/core/arch/x86/decode_table.c

#define Wps TYPE_W, OPSZ_16
...
    {OP_cvtps2pd, 0x0f5a10, "cvtps2pd", Vpd, xx, Wps, xx, xx, mrm, x, END_LIST},

The Intel docs are clearly correct here, I'm observing an application reading from the 8 bytes at the end of a memory mapping.

derekbruening commented 4 years ago

Hmm, so cvtps2dq converts 4 floats, but cvtps2pd and cvtps2pi both convert only 2 floats. So this applies to cvtps2pi too.

Are the v versions wrong too? vcvtps2pd should be Wps and not Wvs?

Unfortunately the binutils tests never include a memory source arg and don't test operand sizes in any case:

third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 0d 78 56 34 12 cvtps2pd 0x12345678, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 0d 78 56 34 12 cvtps2pd 0x12345678, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a c8             cvtps2pd %xmm0, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 21             cvtps2pd (%ecx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 21             cvtps2pd (%ecx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 0d 78 56 34 12 cvtps2pd <rel> 0x000000002235b4a6, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 0d 78 56 34 12 cvtps2pd <rel> 0x000000002235b70e, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 21             cvtps2pd (%rcx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a f4             cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 21             cvtps2pd (%rcx), %xmm4
derekbruening commented 4 years ago

It looks like vcvtps2pd needs new types: its source for AVX is 8_of_16 or 16 (2 or 4 floats), and for AVX-512 is 8_of_16, 16, or 32. At a quick glance I don't see existing types like that.

derekbruening commented 4 years ago

If there are only one or two like that we could just split the table again to avoid the complex size type.

derekbruening commented 4 years ago

While the 8_of_16 vs 16 for an xmm register or memory operand can't easily be tested by comparing just disassembly in Intel syntax, 16 vs 32 should show up in a test: xmm vs ymm. vcvtps2pd with VEX.L set is supposed to still be an xmm source, but we have it as Wvs which would be a ymm source for VEX.L: so do none of the tests hit that case? I see tests that have xmm and ymm: are the ymm all AVX-512?

rocallahan commented 4 years ago

So this applies to cvtps2pi too.

Yep, I'll fix that.

I don't really understand what's involved with adding a new complex size type or "splitting the table" :-(.

are the ymm all AVX-512?

If I understand correctly, all AVX-512 variants use EVEX, i.e. 0x62, so none of the binutils tests for vcvtps2pd are AVX512:

 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4
 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4
 c5 fc 5a 21          vcvtps2pd (%rcx), %ymm4
 c5 f8 5a f4          vcvtps2pd %xmm4, %xmm6
 c5 f8 5a 21          vcvtps2pd (%rcx), %xmm4
 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4
 c5 fc 5a 21          vcvtps2pd (%rcx), %ymm4
 c5 fc 5a 21          vcvtps2pd (%rcx), %ymm4
 c5 f8 5a f4          vcvtps2pd %xmm4, %xmm6
 c5 f8 5a 21          vcvtps2pd (%rcx), %xmm4
 c5 f8 5a 21          vcvtps2pd (%rcx), %xmm4
rocallahan commented 4 years ago

So

 c5 fc 5a e4          vcvtps2pd %ymm4, %ymm4

I think corresponds to VEX.256.0F.WIG 5A /r VCVTPS2PD ymm1, xmm2/m128, so the source operand definitely should be %xmm4.

rocallahan commented 4 years ago

objdump 2.31.51 gives

    1a10:       c5 fc 5a e4             vcvtps2pd %xmm4,%ymm4

So I don't know why the tests are passing.

rocallahan commented 4 years ago

Oh, I had assumed the DR tests were actually running binutils to check the disassembly. But it looks like they don't; I guess you generated the .expect files once using binutils and have been comparing against them ever since? Then I'd assume binutils had a bug here and it got fixed at some point.

derekbruening commented 4 years ago

Interesting. Thank you for the analysis. @hgreving2304 might have more insight into how the binutils expect file ended up like this.

derekbruening commented 4 years ago

Leaving open for fixing vcvtps2pd, including fixing the binutils tests.

johnfxgalea commented 3 years ago

There are other instructions, such as vcvtps2ph, with wrong operand sizes that need to be fixed. To avoid spamming the issue page with similar problems, I am generalizing this issue page to accommodate all of such cases.

khuey commented 3 years ago

So, looking through the list of conversion instructions at https://www.officedaytime.com/simd512e/

Scalar to scalar

Scalar to float

Float to scalar

Float to float