Open rocallahan opened 4 years ago
Hmm, so cvtps2dq
converts 4 floats, but cvtps2pd
and cvtps2pi
both convert only 2 floats. So this applies to cvtps2pi
too.
Are the v
versions wrong too? vcvtps2pd
should be Wps
and not Wvs
?
Unfortunately the binutils tests never include a memory source arg and don't test operand sizes in any case:
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 0d 78 56 34 12 cvtps2pd 0x12345678, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 0d 78 56 34 12 cvtps2pd 0x12345678, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a c8 cvtps2pd %xmm0, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a f4 cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 21 cvtps2pd (%ecx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a f4 cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86.expect: 0f 5a 21 cvtps2pd (%ecx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 0d 78 56 34 12 cvtps2pd <rel> 0x000000002235b4a6, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 0d 78 56 34 12 cvtps2pd <rel> 0x000000002235b70e, %xmm1
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a f4 cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 21 cvtps2pd (%rcx), %xmm4
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a f4 cvtps2pd %xmm4, %xmm6
third_party/binutils/test_decenc/drdecode_decenc_x86_64.expect: 0f 5a 21 cvtps2pd (%rcx), %xmm4
It looks like vcvtps2pd
needs new types: its source for AVX is 8_of_16 or 16 (2 or 4 floats), and for AVX-512 is 8_of_16, 16, or 32. At a quick glance I don't see existing types like that.
If there are only one or two like that we could just split the table again to avoid the complex size type.
While the 8_of_16 vs 16 for an xmm register or memory operand can't easily be tested by comparing just disassembly in Intel syntax, 16 vs 32 should show up in a test: xmm vs ymm. vcvtps2pd
with VEX.L
set is supposed to still be an xmm source, but we have it as Wvs
which would be a ymm source for VEX.L
: so do none of the tests hit that case? I see tests that have xmm and ymm: are the ymm all AVX-512?
So this applies to cvtps2pi too.
Yep, I'll fix that.
I don't really understand what's involved with adding a new complex size type or "splitting the table" :-(.
are the ymm all AVX-512?
If I understand correctly, all AVX-512 variants use EVEX, i.e. 0x62, so none of the binutils tests for vcvtps2pd
are AVX512:
c5 fc 5a e4 vcvtps2pd %ymm4, %ymm4
c5 fc 5a e4 vcvtps2pd %ymm4, %ymm4
c5 fc 5a 21 vcvtps2pd (%rcx), %ymm4
c5 f8 5a f4 vcvtps2pd %xmm4, %xmm6
c5 f8 5a 21 vcvtps2pd (%rcx), %xmm4
c5 fc 5a e4 vcvtps2pd %ymm4, %ymm4
c5 fc 5a 21 vcvtps2pd (%rcx), %ymm4
c5 fc 5a 21 vcvtps2pd (%rcx), %ymm4
c5 f8 5a f4 vcvtps2pd %xmm4, %xmm6
c5 f8 5a 21 vcvtps2pd (%rcx), %xmm4
c5 f8 5a 21 vcvtps2pd (%rcx), %xmm4
So
c5 fc 5a e4 vcvtps2pd %ymm4, %ymm4
I think corresponds to VEX.256.0F.WIG 5A /r VCVTPS2PD ymm1, xmm2/m128
, so the source operand definitely should be %xmm4
.
objdump
2.31.51 gives
1a10: c5 fc 5a e4 vcvtps2pd %xmm4,%ymm4
So I don't know why the tests are passing.
Oh, I had assumed the DR tests were actually running binutils to check the disassembly. But it looks like they don't; I guess you generated the .expect
files once using binutils and have been comparing against them ever since? Then I'd assume binutils had a bug here and it got fixed at some point.
Interesting. Thank you for the analysis. @hgreving2304 might have more insight into how the binutils expect file ended up like this.
Leaving open for fixing vcvtps2pd, including fixing the binutils tests.
There are other instructions, such as vcvtps2ph, with wrong operand sizes that need to be fixed. To avoid spamming the issue page with similar problems, I am generalizing this issue page to accommodate all of such cases.
So, looking through the list of conversion instructions at https://www.officedaytime.com/simd512e/
Scalar to scalar
Scalar to float
Float to scalar
Float to float
https://www.felixcloutier.com/x86/cvtps2pd says:
DR master has: https://github.com/DynamoRIO/dynamorio/blob/d275a9aee4e2260cda0b7ce3f5e47f6f1b668320/core/arch/x86/decode_table.c
The Intel docs are clearly correct here, I'm observing an application reading from the 8 bytes at the end of a memory mapping.