DynamoRIO / dynamorio

Dynamic Instrumentation Tool Platform

Add support for Intel AVX512. #1312

Open derekbruening opened 9 years ago

derekbruening commented 9 years ago

From bruen...@google.com on November 05, 2013 13:13:46

We'll need to support these future ISA extensions.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1312

derekbruening commented 7 years ago

Some of these are available in Skylake and we need to add support for them.

Part of the group is xsave{s,c} + clflushopt opcodes and cpuid features.

Plus, we should add FEATURE_XSAVEOPT and use it in save_xmm(). The xsave features are enumerated by cpuid with eax=0x0d, ecx=1. We can split this into separate issues if necessary. It will of course be easiest for someone with a Skylake processor to implement all of this.
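As a rough illustration of the cpuid query mentioned above, here is a minimal C sketch that decodes the EAX output of cpuid leaf 0x0d, sub-leaf 1. The type and function names (`xsave_features_t`, `decode_xsave_features`, `query_xsave_features`) are hypothetical and not DR's; the bit positions are as I understand them from the Intel SDM. The query wrapper assumes a GCC- or Clang-style `<cpuid.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the XSAVE-related feature bits. */
typedef struct {
    bool xsaveopt; /* CPUID.(EAX=0DH,ECX=1):EAX[0] */
    bool xsavec;   /* CPUID.(EAX=0DH,ECX=1):EAX[1] */
    bool xgetbv1;  /* CPUID.(EAX=0DH,ECX=1):EAX[2], XGETBV with ECX=1 */
    bool xsaves;   /* CPUID.(EAX=0DH,ECX=1):EAX[3], XSAVES/XRSTORS */
} xsave_features_t;

/* Pure decoding step: turns the raw EAX value into named flags. */
static inline xsave_features_t
decode_xsave_features(uint32_t eax)
{
    xsave_features_t f;
    f.xsaveopt = (eax & (1u << 0)) != 0;
    f.xsavec = (eax & (1u << 1)) != 0;
    f.xgetbv1 = (eax & (1u << 2)) != 0;
    f.xsaves = (eax & (1u << 3)) != 0;
    return f;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#    include <cpuid.h>
/* Runs cpuid with eax=0x0d, ecx=1. Returns false if the leaf is unsupported. */
static inline bool
query_xsave_features(xsave_features_t *out)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x0d, 1, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_xsave_features(eax);
    return true;
}
#endif
```

Keeping the decoding separate from the cpuid call makes the bit layout testable on machines that lack these features.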

rocallahan commented 6 years ago

> Plus, should add FEATURE_XSAVEOPT and use it in save_xmm().

FWIW Intel recommends against using XSAVEOPT in userspace. "the processor might determine that an execution of XSAVEOPT by one user application corresponds to an earlier execution of XRSTOR by a different application. For this reason, Intel recommends the application software not use the XSAVEOPT instruction."

derekbruening commented 5 years ago

Xref stack size discussion in #3489. At some point we may want to separate out the simd registers from the mcontext so we can avoid paying for their space on memory-constrained platforms and keep our signal stack size small.
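To give a feel for the space at stake, here is a small C sketch of the size arithmetic. The struct names and layout are hypothetical and are not DR's actual mcontext. It assumes the full AVX-512 register state of 32 zmm registers of 64 bytes each plus 8 opmask registers of 64 bits each, which comes to 32 * 64 + 8 * 8 = 2112 bytes.

```c
#include <stddef.h>
#include <stdint.h>

enum { NUM_SIMD_REGS = 32, ZMM_BYTES = 64, NUM_OPMASK_REGS = 8 };

typedef struct {
    uint8_t bytes[ZMM_BYTES];
} zmm_reg_t;

/* Hypothetical: SIMD state split out into its own block. */
typedef struct {
    zmm_reg_t simd[NUM_SIMD_REGS];    /* 32 * 64 = 2048 bytes */
    uint64_t opmask[NUM_OPMASK_REGS]; /* 8 * 8 = 64 bytes */
} simd_state_t;

/* Hypothetical: a slim context that only points at the SIMD block, so
 * code that never touches SIMD state does not pay for it on its stack. */
typedef struct {
    uint64_t gpr[16];
    uint64_t flags;
    uint64_t pc;
    simd_state_t *simd; /* NULL when the SIMD state is not captured */
} slim_context_t;
```

With this split, the slim context is a small fraction of the 2112-byte SIMD block, which is the kind of saving the signal-stack discussion is after.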

hgreving2304 commented 5 years ago

The following AVX-512 EVEX opcodes (for VEX opcodes, see #3558) have to be added to the decoder/encoder:

AVX-512, AVX promoted

OP_vmovss (Done) OP_vmovsd (Done) OP_vmovups (Done) OP_vmovupd (Done) OP_vmovlps (Done) OP_vmovsldup (Done) OP_vmovlpd (Done) OP_vmovddup (Done) OP_vunpcklps (Done) OP_vunpcklpd (Done) OP_vunpckhps (Done) OP_vunpckhpd (Done) OP_vmovhps (Done) OP_vmovshdup (Done) OP_vmovhpd (Done) OP_vmovaps (Done) OP_vmovapd (Done) OP_vcvtsi2ss (Done) OP_vcvtsi2sd (Done) OP_vmovntps (Done) OP_vmovntpd (Done) OP_vcvttss2si (Done) OP_vcvttsd2si (Done) OP_vcvtss2si (Done) OP_vcvtsd2si (Done) OP_vucomiss (Done) OP_vucomisd (Done) OP_vcomiss (Done) OP_vcomisd (Done) OP_vmovmskps (No evex form) OP_vmovmskpd (No evex form) OP_vsqrtps (Done) OP_vsqrtss (Done) OP_vsqrtpd (Done) OP_vsqrtsd (Done) OP_vrsqrtps (No evex form) OP_vrsqrtss (No evex form) OP_vrcpps OP_vrcpss OP_vandps (Done) OP_vandpd (Done) OP_vandnps (Done) OP_vandnpd (Done) OP_vorps (Done) OP_vorpd (Done) OP_vxorps (Done) OP_vxorpd (Done) OP_vaddps (Done) OP_vaddss (Done) OP_vaddpd (Done) OP_vaddsd (Done) OP_vmulps (Done) OP_vmulss (Done) OP_vmulpd (Done) OP_vmulsd (Done) OP_vcvtps2pd (Done) OP_vcvtss2sd (Done) OP_vcvtpd2ps (Done) OP_vcvtsd2ss (Done) OP_vcvtdq2ps (Done) OP_vcvttps2dq (Done) OP_vcvtps2dq (Done) OP_vsubps (Done) OP_vsubss (Done) OP_vsubpd (Done) OP_vsubsd (Done) OP_vminps OP_vminss OP_vminpd OP_vminsd OP_vdivps OP_vdivss OP_vdivpd OP_vdivsd OP_vmaxps OP_vmaxss OP_vmaxpd OP_vmaxsd OP_vpunpcklbw OP_vpunpcklwd OP_vpunpckldq OP_vpacksswb OP_vpcmpgtb OP_vpcmpgtw OP_vpcmpgtd OP_vpackuswb OP_vpunpckhbw OP_vpunpckhwd OP_vpunpckhdq OP_vpackssdw OP_vpunpcklqdq OP_vpunpckhqdq OP_vmovd OP_vpshufhw OP_vpshufd OP_vpshuflw OP_vpcmpeqb OP_vpcmpeqw OP_vpcmpeqd OP_vmovq OP_vcmpps OP_vcmpss OP_vcmppd OP_vcmpsd OP_vpinsrw OP_vpextrw OP_vshufps OP_vshufpd OP_vpsrlw OP_vpsrld OP_vpsrlq OP_vpmullw (Done) OP_vpmovmskb OP_vpsubusb (Done) OP_vpsubusw (Done) OP_vpminub OP_vpandd (Done) OP_vpandq (Done) OP_vpaddusb (Done) OP_vpaddusw (Done) OP_vpmaxub OP_vpandnd (Done) OP_vpandnq (Done) OP_vpavgb OP_vpsraw OP_vpsrad OP_vpavgw OP_vpmulhuw 
(Done) OP_vpmulhw (Done) OP_vcvtdq2pd (Done) OP_vcvttpd2dq (Done) OP_vcvtpd2dq (Done) OP_vmovntdq OP_vpsubsb (Done) OP_vpsubsw (Done) OP_vpminsw OP_vpord (Done) OP_vporq (Done) OP_vpaddsb (Done) OP_vpaddsw (Done) OP_vpmaxsw OP_vpxord (Done) OP_vpxorq (Done) OP_vpsllw OP_vpslld OP_vpsllq OP_vpmuludq (Done) OP_vpmaddwd (Done) OP_vpsadbw OP_vmaskmovdqu OP_vpsubb (Done) OP_vpsubw (Done) OP_vpsubd (Done) OP_vpsubq (Done) OP_vpaddb (Done) OP_vpaddw (Done) OP_vpaddd (Done) OP_vpaddq (Done) OP_vpsrldq OP_vpslldq OP_vhaddpd (No evex form) OP_vhaddps (No evex form) OP_vhsubpd (No evex form) OP_vhsubps (No evex form) OP_vaddsubps (No evex form) OP_vaddsubpd (No evex form) OP_vlddqu OP_vpshufb OP_vphaddw (No evex form) OP_vphaddd (No evex form) OP_vphaddsw (No evex form) OP_vpmaddubsw (Done) OP_vphsubw (No evex form) OP_vphsubd (No evex form) OP_vphsubsw (No evex form) OP_vpsignb OP_vpsignw OP_vpsignd OP_vpmulhrsw (Done) OP_vpabsb OP_vpabsw OP_vpabsd OP_vpalignr OP_vpblendvb OP_vblendvps OP_vblendvpd OP_vptest OP_vpmovsxbw OP_vpmovsxbd OP_vpmovsxbq OP_vpmovsxwd OP_vpmovsxwq OP_vpmovsxdq OP_vpmuldq (Done) OP_vpcmpeqq OP_vmovntdqa OP_vpackusdw OP_vpmovzxbw OP_vpmovzxbd OP_vpmovzxbq OP_vpmovzxwd OP_vpmovzxwq OP_vpmovzxdq OP_vpcmpgtq OP_vpminsb OP_vpminsd OP_vpminuw OP_vpminud OP_vpmaxsb OP_vpmaxsd OP_vpmaxuw OP_vpmaxud OP_vpmulld (Done) OP_vphminposuw OP_vaesimc OP_vaesenc OP_vaesenclast OP_vaesdec OP_vaesdeclast OP_vpextrb OP_vpextrd OP_vextractps OP_vroundps OP_vroundpd OP_vroundss OP_vroundsd OP_vblendps OP_vblendpd OP_vpblendw OP_vpinsrb OP_vinsertps OP_vpinsrd OP_vdpps OP_vdppd OP_vmpsadbw OP_vpcmpestrm OP_vpcmpestri OP_vpcmpistrm OP_vpcmpistri OP_vpclmulqdq (No evex form) OP_vaeskeygenassist OP_vtestps OP_vtestpd OP_vzeroupper OP_vzeroall OP_vldmxcsr OP_vstmxcsr OP_vbroadcastss OP_vbroadcastsd OP_vbroadcastf128 OP_vmaskmovps OP_vmaskmovpd OP_vpermilps OP_vpermilpd OP_vperm2f128 OP_vinsertf128 OP_vextractf128 OP_vcvtph2ps (Done) OP_vcvtps2ph (Done) OP_vfmadd132ps (WIP) 
OP_vfmadd132pd (WIP) OP_vfmadd213ps (WIP) OP_vfmadd213pd (WIP) OP_vfmadd231ps (WIP) OP_vfmadd231pd (WIP) OP_vfmadd132ss (WIP) OP_vfmadd132sd (WIP) OP_vfmadd213ss (WIP) OP_vfmadd213sd (WIP) OP_vfmadd231ss (WIP) OP_vfmadd231sd (WIP) OP_vfmaddsub132ps (WIP) OP_vfmaddsub132pd (WIP) OP_vfmaddsub213ps (WIP) OP_vfmaddsub213pd (WIP) OP_vfmaddsub231ps (WIP) OP_vfmaddsub231pd (WIP) OP_vfmsubadd132ps (WIP) OP_vfmsubadd132pd (WIP) OP_vfmsubadd213ps (WIP) OP_vfmsubadd213pd (WIP) OP_vfmsubadd231ps (WIP) OP_vfmsubadd231pd (WIP) OP_vfmsub132ps (WIP) OP_vfmsub132pd (WIP) OP_vfmsub213ps (WIP) OP_vfmsub213pd (WIP) OP_vfmsub231ps (WIP) OP_vfmsub231pd (WIP) OP_vfmsub132ss (WIP) OP_vfmsub132sd (WIP) OP_vfmsub213ss (WIP) OP_vfmsub213sd (WIP) OP_vfmsub231ss (WIP) OP_vfmsub231sd (WIP) OP_vfnmadd132ps (WIP) OP_vfnmadd132pd (WIP) OP_vfnmadd213ps (WIP) OP_vfnmadd213pd (WIP) OP_vfnmadd231ps (WIP) OP_vfnmadd231pd (WIP) OP_vfnmadd132ss (WIP) OP_vfnmadd132sd (WIP) OP_vfnmadd213ss (WIP) OP_vfnmadd213sd (WIP) OP_vfnmadd231ss (WIP) OP_vfnmadd231sd (WIP) OP_vfnmsub132ps (WIP) OP_vfnmsub132pd (WIP) OP_vfnmsub213ps (WIP) OP_vfnmsub213pd (WIP) OP_vfnmsub231ps (WIP) OP_vfnmsub231pd (WIP) OP_vfnmsub132ss (WIP) OP_vfnmsub132sd (WIP) OP_vfnmsub213ss (WIP) OP_vfnmsub213sd (WIP) OP_vfnmsub231ss (WIP) OP_vfnmsub231sd (WIP)

AVX-512, AVX2 promoted

OP_vpgatherdd OP_vpgatherdq OP_vpgatherqd OP_vpgatherqq OP_vgatherdps OP_vgatherdpd OP_vgatherqps OP_vgatherqpd OP_vbroadcasti128 OP_vinserti128 OP_vextracti128 OP_vpmaskmovd OP_vpmaskmovq OP_vperm2i128 OP_vpermd OP_vpermps OP_vpermq OP_vpermpd OP_vpblendd OP_vpsllvd OP_vpsllvq OP_vpsravd OP_vpsrlvd OP_vpsrlvq OP_vpbroadcastb OP_vpbroadcastw OP_vpbroadcastd OP_vpbroadcastq

AVX-512, new opcodes

OP_valignd OP_valignq OP_vblendmpd OP_vblendmps OP_vcompresspd OP_vcompressps OP_vcvtpd2udq (Done) OP_vcvttpd2udq (Done) OP_vcvtps2udq (Done) OP_vcvttps2udq (Done) OP_vcvtqq2pd (Done) OP_vcvtqq2ps (Done) OP_vcvtsd2usi (Done) OP_vcvttsd2usi (Done) OP_vcvtss2usi (Done) OP_vcvttss2usi (Done) OP_vcvtudq2pd (Done) OP_vcvtudq2ps (Done) OP_vcvtusi2sd (Done) OP_vcvtusi2ss (Done) OP_vexpandpd OP_vexpandps OP_vextractf32x4 OP_vextractf64x4 OP_vextracti32x4 OP_vextracti64x4 OP_vfixupimmpd OP_vfixupimmps OP_vfixupimmsd OP_vfixupimmss OP_vgetexppd OP_vgetexpps OP_vgetexpsd OP_vgetexpss OP_vgetmantpd OP_vgetmantps OP_vgetmantsd OP_vgetmantss OP_vinsertf32x4 OP_vinsertf64x4 OP_vmovdqa32 (Done) OP_vmovdqa64 (Done) OP_vmovdqu8 (Done) OP_vmovdqu16 (Done) OP_vmovdqu32 (Done) OP_vmovdqu64 (Done) OP_vpblendmd OP_vpblendmq OP_vpbroadcastd OP_vpbroadcastq OP_vpcmpd OP_vpcmpud OP_vpcmpq OP_vpcmpuq OP_vpcompressq OP_vpcompressd OP_vpermi2d OP_vpermi2q OP_vpermi2pd OP_vpermi2ps OP_vpermt2d OP_vpermt2q OP_vpermt2pd OP_vpermt2ps OP_vpexpandd OP_vpexpandq OP_vpmaxsq OP_vpmaxud OP_vpmaxuq OP_vpminsq OP_vpminud OP_vpminuq OP_vpmovsqb OP_vpmovusqb OP_vpmovsqw OP_vpmovusqw OP_vpmovsqd OP_vpmovusqd OP_vpmovsdb OP_vpmovusdb OP_vpmovsdw OP_vpmovusdw OP_vprold OP_vprolq OP_vprolvd OP_vprolvq OP_vprord OP_vprorq OP_vprorvd OP_vprorvq OP_vpscatterdd OP_vpscatterdq OP_vpscatterqd OP_vpscatterqq OP_vpsraq OP_vpsravq OP_vptestnmd OP_vptestnmq OP_vpternlogd OP_vpternlogq OP_vptestmd OP_vptestmq OP_vrcp14pd OP_vrcp14ps OP_vrcp14sd OP_vrcp14ss OP_vrndscalepd OP_vrndscaleps OP_vrndscalesd OP_vrndscaless OP_vrsqrt14pd OP_vrsqrt14ps OP_vrsqrt14sd OP_vrsqrt14ss OP_vscalefpd OP_vscalefps OP_vscalefsd OP_vscalefss OP_vscatterdps OP_vscatterdpd OP_vscatterqps OP_vscatterqpd OP_vshuff32x4 OP_vshuff64x2 OP_vshufi32x4 OP_vshufi64x2

AVX-512, new opcodes AVX-512DQ

OP_vcvttpd2qq (Done) OP_vcvtpd2qq (Done) OP_vcvttpd2uqq (Done) OP_vcvtpd2uqq (Done) OP_vcvttps2qq (Done) OP_vcvtps2qq (Done) OP_vcvttps2uqq (Done) OP_vcvtps2uqq (Done) OP_vcvtuqq2pd (Done) OP_vcvtuqq2ps (Done) OP_vextractf64x2 OP_vextracti64x2 OP_vfpclasspd OP_vfpclassps OP_vfpclasssd OP_vfpclassss OP_vinsertf64x2 OP_vinserti64x2 OP_vpmovm2d OP_vpmovm2q OP_vpmovd2m OP_vpmovq2m OP_vpmullq (Done) OP_vrangepd OP_vrangeps OP_vrangesd OP_vrangess OP_vreducepd OP_vreduceps OP_vreducesd OP_vreducess

AVX-512, new opcodes AVX-512BW

OP_vdbpsadbw OP_vmovdqu8 OP_vmovdqu16 OP_vpblendmb OP_vpblendmw OP_vpbroadcastb OP_vpbroadcastw OP_vpcmpb OP_vpcmpub OP_vpcmpw OP_vpcmpuw OP_vpermw OP_vpermi2b OP_vpermi2w OP_vpmovm2b OP_vpmovm2w OP_vpmovb2m OP_vpmovw2m OP_vpmovswb OP_vpmovuswb OP_vpsllvw OP_vpsravw OP_vpsrlvw OP_vptestnmb OP_vptestnmw OP_vptestmb OP_vptestmw

AVX-512, new opcodes AVX-512CD

OP_vpbroadcastm OP_vpconflictd OP_vpconflictq OP_vplzcntd OP_vplzcntq

AVX-512, new opcodes AVX-512ER

OP_vexp2pd OP_vexp2ps OP_vexp2sd OP_vexp2ss OP_vrcp28pd OP_vrcp28ps OP_vrcp28sd OP_vrcp28ss OP_vrsqrt28pd OP_vrsqrt28ps OP_vrsqrt28sd OP_vrsqrt28ss

AVX-512, new opcodes AVX-512PF

OP_vgatherpf0dpd OP_vgatherpf0dps OP_vgatherpf0qpd OP_vgatherpf0qps OP_vgatherpf1dpd OP_vgatherpf1dps OP_vgatherpf1qpd OP_vgatherpf1qps OP_vscatterpf0dpd OP_vscatterpf0dps OP_vscatterpf0qpd OP_vscatterpf0qps OP_vscatterpf1dpd OP_vscatterpf1dps OP_vscatterpf1qpd OP_vscatterpf1qps

derekbruening commented 5 years ago

Pasting in notes that never made it here:

*** TODO add AMD ASF Advanced Synchronization Facility

http://en.wikipedia.org/wiki/Advanced_Synchronization_Facility still in proposal stage as of Oct 2013

ASF provides the capability to start, end and abort transactional execution and to mark cache lines for protected memory access in transactional code regions. It contains four new instructions—SPECULATE, COMMIT, ABORT and RELEASE—and turns the otherwise invalid LOCK-prefixed MOVx, PREFETCH and PREFETCHW instructions into valid ones inside transactional code regions. Up to 256 levels of nested transactional code regions is supported.

*** TODO #1965: x86 decoder: BNDMOV decoded as NOP

*** TODO add xsave{s,c} + clflushopt opcodes and cpuid features

Plus add FEATURE_XSAVEOPT and use it in save_xmm(). These features use eax=0x0d, ecx=1.

** TODO update drmemtrace type_is_prefetch() (and add new TRACETYPE??) for avx512

Have DR provide instr_is_prefetch()?

VGATHERPF1DPD: vectorized prefetch Others
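As a sketch of what such a predicate could cover for the AVX-512PF family, the following C helper classifies an opcode by its mnemonic. It is purely illustrative: `opname_is_avx512_prefetch` is a hypothetical name, and matching on strings is an assumption made here for brevity. A real implementation would presumably switch on opcode constants instead.

```c
#include <stdbool.h>
#include <string.h>

/* True if s begins with prefix. */
static inline bool
has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: recognizes the AVX-512PF gather/scatter prefetch
 * mnemonics (vgatherpf{0,1}{d,q}p{s,d} and vscatterpf{0,1}{d,q}p{s,d}).
 * Plain gathers and scatters such as vgatherdpd are not prefetches. */
static inline bool
opname_is_avx512_prefetch(const char *opname)
{
    return has_prefix(opname, "vgatherpf0") || has_prefix(opname, "vgatherpf1") ||
        has_prefix(opname, "vscatterpf0") || has_prefix(opname, "vscatterpf1");
}
```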

derekbruening commented 5 years ago

Although a number of opcodes are being added, they are still all incomplete due to things like the scaled displacement of mask registers:

DR says:

0x000055a3261f4f0e  62 e2 f5 47 40 41 37 vpmullq {%k7} %zmm17 0x37(%rcx)[64byte] -> %zmm16

Other decoders say:

$ echo 0x62 0xe2 0xf5 0x47 0x40 0x41 0x37 0x90 0x90 | /usr/bin/llvm-mc -arch x86-64 --disassemble 
    .text
    vpmullq 3520(%rcx), %zmm17, %zmm16 {%k7}
    nop
    nop

$ echo 0x62 0xe2 0xf5 0x47 0x40 0x41 0x37 0x90 0x90 | /extsw/pkgs/disasm/capstone/build/capstone -x86 -
+0x0000  62e2f547404137   vpmullq   zmm16 {k7}, zmm1, zmmword ptr [ecx + 0xdc0]
+0x0007  90   nop   
+0x0008  90   nop   

I want to put an explicit note here, because I would expect these opcodes that are being added to be completely finished with each PR adding them.
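For reference, the mismatch above comes down to EVEX compressed displacement (disp8*N): the one-byte displacement 0x37 has to be multiplied by a factor N, and for this full-vector 512-bit form without broadcast N is 64, giving 0x37 * 64 = 3520 (0xdc0). The C sketch below decodes the relevant fields of the third EVEX payload byte (0x47 in the example) and applies the scaling. The names are hypothetical, and it only covers the full-vector tuple type, where N is the element size under broadcast and the vector length otherwise.

```c
#include <stdint.h>

/* Fields of the third EVEX payload byte (P2). */
typedef struct {
    unsigned aaa;       /* opmask register number, bits 2:0 */
    unsigned vec_bytes; /* vector length from L'L (bits 6:5): 16, 32 or 64;
                         * 0 for the reserved encoding */
    int broadcast;      /* EVEX.b, bit 4 */
    int zeroing;        /* EVEX.z, bit 7 */
} evex_p2_t;

static inline evex_p2_t
decode_evex_p2(uint8_t p2)
{
    evex_p2_t f;
    unsigned ll = (p2 >> 5) & 0x3;
    f.aaa = p2 & 0x7;
    f.broadcast = (p2 >> 4) & 1;
    f.zeroing = (p2 >> 7) & 1;
    f.vec_bytes = (ll == 3) ? 0 : (16u << ll);
    return f;
}

/* Expands a compressed disp8 for a full-vector tuple type. elem_bytes is
 * the element size (8 for vpmullq), used only when broadcast is set. */
static inline int32_t
evex_full_vector_disp(int8_t disp8, const evex_p2_t *f, unsigned elem_bytes)
{
    unsigned n = f->broadcast ? elem_bytes : f->vec_bytes;
    return (int32_t)disp8 * (int32_t)n;
}
```

Feeding in P2 = 0x47 and disp8 = 0x37 yields mask k7, a 64-byte vector and a displacement of 3520, matching what llvm-mc and capstone print.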

JivanH commented 5 years ago

OP_vminps (Done) OP_vminss (Done) OP_vminpd (Done) OP_vminsd (Done) OP_vdivps (Done) OP_vdivss (Done) OP_vdivpd (Done) OP_vdivsd (Done) OP_vmaxps (Done) OP_vmaxss (Done) OP_vmaxpd (Done) OP_vmaxsd (Done) OP_vmovntdq (Done) OP_vmovntdqa (Done)

hgreving2304 commented 5 years ago

Ok, let's use this link: https://docs.google.com/document/d/1gDOL3mCoUCQlCyIg2iuVzxTr5ByVihhg98qD5AuXRVg/edit?usp=sharing

derekbruening commented 5 years ago

Documenting some discussion on compressed displacements and tuple types: we're treating this as an encoding detail that the IR abstracts away. The IR presents just a displacement value. Whether it's encoded with disp8 plus a special scaling factor or with disp32 shouldn't matter to most users. There's always force_full_disp for users who want a certain length.

The only thing a client would want to know is which displacements are impossible to encode, but since the scaling restriction applies only to disp8 and disp32 is always available as a fallback, that does not come into play. This is unlike fixed-width ARM, where special immediate expansion patterns make certain large immediates encodable while most large values are impossible.
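To make the encoding-side choice concrete, here is a minimal C sketch of how an encoder could decide between the compressed disp8 form and disp32. `try_compress_disp8` is a hypothetical helper, not DR's encoder code, and the scaling factor N is assumed to have been determined already from the instruction's tuple type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true and stores the encoded byte if disp can be expressed as
 * disp8 * n. Otherwise returns false, and the caller falls back to disp32,
 * which can hold any 32-bit displacement unscaled. */
static inline bool
try_compress_disp8(int32_t disp, int32_t n, int8_t *disp8_out)
{
    if (n <= 0 || disp % n != 0)
        return false; /* not a multiple of the scaling factor */
    int32_t q = disp / n;
    if (q < INT8_MIN || q > INT8_MAX)
        return false; /* quotient does not fit in one signed byte */
    *disp8_out = (int8_t)q;
    return true;
}
```

With N = 64, a displacement of 3520 compresses to the single byte 0x37, while a displacement of 0x37 itself is not a multiple of 64 and needs disp32. Either way the IR-level displacement stays encodable, which is why the IR can hide the distinction.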

hgreving2304 commented 4 years ago

Missing support in Windows:

Some places in core/win32/ntdll.c and inject.c are marked with XXX i#1312. Those should probably be FIXME or TODO. Furthermore, there is Windows-specific code that accounts for xmm6-xmm15 being callee-saved. I think we might just ignore that and context switch everything anyway, but this needs some thought and probably some code changes.

Missing support in MacOS:

Untested.

Missing support in 32-bit UNIX:

Locating and copying the state to/from signal frames to/from the xsave area needs support.

johnfxgalea commented 3 years ago

DR's IR does not model the implicit operands of vzeroupper and vzeroall. This needs to be fixed, if possible.
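To spell out which registers are implicitly written, here is a small C model of the two instructions' architectural effect as I understand it from the Intel SDM: both touch registers 0-15 in 64-bit mode (0-7 otherwise), vzeroupper clears everything above the low 128 bits, vzeroall clears the whole register, and zmm16-zmm31 are left untouched. The types and function names are hypothetical and are not part of DR.

```c
#include <stdint.h>
#include <string.h>

enum { NUM_ZMM = 32, ZMM_BYTES = 64, XMM_BYTES = 16 };

/* Toy register file: one 64-byte slot per zmm register. */
typedef struct {
    uint8_t zmm[NUM_ZMM][ZMM_BYTES];
} simd_file_t;

/* Number of registers implicitly written by vzeroupper/vzeroall. */
static inline unsigned
vzero_affected_regs(int is_64bit)
{
    return is_64bit ? 16 : 8;
}

/* vzeroupper: keeps the xmm part, zeroes bits 128 and above. */
static inline void
model_vzeroupper(simd_file_t *f, int is_64bit)
{
    for (unsigned i = 0; i < vzero_affected_regs(is_64bit); i++)
        memset(&f->zmm[i][XMM_BYTES], 0, ZMM_BYTES - XMM_BYTES);
}

/* vzeroall: zeroes the entire register. */
static inline void
model_vzeroall(simd_file_t *f, int is_64bit)
{
    for (unsigned i = 0; i < vzero_affected_regs(is_64bit); i++)
        memset(f->zmm[i], 0, ZMM_BYTES);
}
```

In IR terms this suggests up to 16 implicit destination registers per instruction, with vzeroupper also reading them since the low 128 bits are preserved.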