FEX-Emu / FEX

A fast usermode x86 and x86-64 emulator for Arm64 Linux
https://fex-emu.com
MIT License

ppc64le support #2409

Open JeremyRand opened 1 year ago

JeremyRand commented 1 year ago

It would be cool if FEX could be ported to ppc64le, e.g. Raptor POWER9 systems. Would there be potential interest in this?

Sonicadvance1 commented 1 year ago

It's an interesting idea! Sadly my knowledge of POWER dates back to the PowerMac G3 so I don't know if its feature set is capable of handling x86 emulation.

This would likely end up being low priority (like RISC-V support). We would also want something in CI if it were supported, since core emulation is fragile and easy to break.

JeremyRand commented 1 year ago

  • Does it support 4KB pages? If so, is this the common kernel config if it supports multiple sizes?

Both 4 KiB and 64 KiB page sizes are supported; different distros make different choices here, but 64 KiB is more common. It's generally not hard to build a 4 KiB kernel on a distro that packages 64 KiB; a lot of users do this for better compatibility with poorly written GPU drivers.

  • Does it support 256-bit wide vectors for AVX or is it some optional thing?

I believe the max vector width is 128-bit, but I'm not 100% sure on that.

  • also we would want something in CI if supported, since core emulation is fragile and easy to break.

I can't guarantee anything, but I suspect the Talos community would be able to donate access to a ppc64le VM for this purpose.

Unfortunately I don't have answers to the rest of your questions, but I'm pinging some people on IRC who may be able to chime in with better answers. I really appreciate the detailed, well-thought-out questions you posed.

awilfox commented 1 year ago

Most of my answers come from the ISA reference, which can be found on the OpenPOWER Foundation site. I'm quoting from version 3.0 here, which is what Power9 (the processor in my personal Talos II) implements. Also note that I am not representing IBM while making this comment.

  • Does it support 4KB pages? If so, is this the common kernel config if it supports multiple sizes?

It depends on the target environment. Server distros (RHEL, Alpine) tend towards 64K, while desktop distros (Void, Adélie, Chimera) tend towards 4K. Debian ports had both varieties some years ago, but I am not sure what they have now.
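Since the page size differs between distros, an emulator would presumably need to check it at runtime rather than assume 4 KiB. A minimal sketch using POSIX `sysconf`; the helper names `HostPageSize` and `HostHas4KPages` are hypothetical and are not FEX functions:

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// Hypothetical helper: returns the host page size in bytes, or 0 on failure.
inline uint64_t HostPageSize() {
  const long Size = sysconf(_SC_PAGESIZE);
  return Size > 0 ? static_cast<uint64_t>(Size) : 0;
}

// x86 guests use 4 KiB base pages, so a host with larger pages (e.g. 64 KiB)
// cannot directly honor guest mappings at 4 KiB granularity.
inline bool HostHas4KPages() {
  return HostPageSize() == 4096;
}
```

A caller could use `HostHas4KPages()` at startup to decide whether to refuse to run or to fall back to a slower path on 64 KiB kernels.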

  • Does it support 128-bit Compare-exchange?

Power9 (ISA 3.0) can do 128-bit atomic stores, but only with 64-bit pair values, so that's probably not what you are after. (FC=11000, "Store Twin", ISA reference pp 861)

  • Does it support unaligned atomics?

Somewhat - atomic locations must be contained within an aligned 32-byte block, but the 8 or 16 bytes may appear anywhere in the block. They just cannot cross into another block. (ISA reference pp 860, 862)
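To make the containment rule concrete, here is a small sketch of the check an emulator might perform before choosing a fast atomic path. The 32-byte block size comes from the answer above; the names `kAtomicBlockSize` and `FitsInAtomicBlock` are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Block size taken from the comment above (aligned 32-byte block).
constexpr uint64_t kAtomicBlockSize = 32;

// Returns true if the access [Addr, Addr + Size) stays inside a single
// aligned block, i.e. its first and last bytes share a block index.
// Assumes Addr + Size does not wrap around the 64-bit address space.
constexpr bool FitsInAtomicBlock(uint64_t Addr, uint64_t Size) {
  return Size != 0 && Size <= kAtomicBlockSize &&
         (Addr / kAtomicBlockSize) == ((Addr + Size - 1) / kAtomicBlockSize);
}

static_assert(FitsInAtomicBlock(0x1008, 8), "unaligned within block is fine");
static_assert(FitsInAtomicBlock(0x1010, 16), "may end exactly at block end");
static_assert(!FitsInAtomicBlock(0x1019, 8), "crosses into the next block");
```

Accesses that fail the check would need a slower fallback, much like split-lock handling on other hosts.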

  • Does it support atomic operations directly on memory locations? This is quite an improvement on ARMv8.1

Yes, Atomic Memory Operations are so named because they operate on memory (but it must not be cache-inhibited; ISA reference pp 857). You can see the GCC intrinsics for inspiration.

  • Does its latest vector extensions support everything necessary for SSE2/3/4 like ARMv8?

Not really. The compiler team at IBM added some porting aids for "original" SSE via an xmmintrin.h for ppc64el, if that is helpful at all.

  • I'm only aware of Altivec and paired singles featuresets, so no idea what the latest offers.

AltiVec (aka VMX) is available, but the newer vector extensions are called VSX. They add more instructions and registers but do not increase the width of the registers (still 128 bits).

  • Does it support 256-bit wide vectors for AVX or is it some optional thing?

No.

  • Does it handle PCIe GPU memory accesses correctly?
    • I think this typically means treating device memory as normal memory? I know a lot of platforms mess this up.

I would need further clarification to answer this. I'm able to use Radeon drivers on both big and little endian Power9 systems, so I would say the platform is capable of handling PCIe GPU memory accesses 😄

  • Does this platform support 128-bit float in hardware? Might be interesting for x87 emulation.

Yes! There is quad-float support in hardware with ISA 3.0 (Power9).

  • This would likely end up being low priority (Like RISC-V support), also we would want something in CI if supported, since core emulation is fragile and easy to break.

There is ppc64el support in Travis-CI, if that would be useful for you. I see that this repo seems to use GitHub Actions; it looks like there is an obtuse but functional way to use a Power system from that. Feel free to ping me when CI would be useful, as options and available resources for open source projects may change in that timeframe.

Sonicadvance1 commented 1 year ago

A lot of good information there! Looks like it might be viable to have a non-AVX implementation. Slightly annoying is that there doesn't seem to be a 128-bit CAS, but you can emulate it with an lqarx/stqcx. loop, similar to how ARMv8 uses reservation-based atomic loads and stores.
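The shape of that emulation can be sketched in portable C++. This is only a single-threaded toy model, not real Power code: `ReservedCell`, `LoadReserve`, and `StoreConditional` are invented stand-ins for a memory location, lqarx, and stqcx. respectively, and a real implementation would emit the instructions from the JIT.

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  uint64_t Lo, Hi;
  bool operator==(const U128& O) const { return Lo == O.Lo && Hi == O.Hi; }
};

// Toy model of one memory cell with a reservation flag.
class ReservedCell {
public:
  explicit ReservedCell(U128 Init) : Value(Init) {}

  // Stand-in for lqarx: load the value and take a reservation.
  U128 LoadReserve() { Reserved = true; return Value; }

  // Stand-in for stqcx.: store only if the reservation is still held.
  bool StoreConditional(U128 New) {
    if (!Reserved) return false;
    Value = New;
    Reserved = false;
    return true;
  }

  // Models another agent writing the cell, which kills the reservation.
  void Interfere(U128 New) { Value = New; Reserved = false; }

  U128 Peek() const { return Value; }

private:
  U128 Value;
  bool Reserved = false;
};

// Compare-and-swap built from the reserve/conditional pair. On failure,
// Expected receives the observed value, as x86 CMPXCHG16B does.
inline bool CompareExchange128(ReservedCell& Cell, U128& Expected, U128 Desired) {
  for (;;) {
    const U128 Observed = Cell.LoadReserve();
    if (!(Observed == Expected)) {
      Expected = Observed;
      return false;
    }
    if (Cell.StoreConditional(Desired)) return true;
    // Reservation was lost between load and store: retry.
  }
}
```

In this model the retry branch is never taken because nothing interferes inside the loop; on real hardware another core can break the reservation at that point, which is why the loop is needed.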

I took some additional peeks at the register arrangement on the platform; it seems like there are 32 GPRs, 32 FPRs, and 64 vector registers? Not sure how the FPRs and vectors overlap, but hopefully something like how SSE overlaps MMX, or maybe both. So static register allocation likely fits on that platform.
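A rough budget check for that claim, as a sketch. The host counts are the ones quoted in the comment above (stated there with a question mark); the guest counts are the x86-64 general-purpose and XMM registers. The figure of three reserved host GPRs (stack pointer, TOC pointer, thread pointer under the ELFv2 ABI) is an assumption for illustration, and all names are hypothetical:

```cpp
#include <cassert>

// Guest (x86-64) registers that would get a fixed host home.
constexpr int kGuestGPRs = 16;  // RAX..R15
constexpr int kGuestXMMs = 16;  // XMM0..XMM15

// Host register counts as quoted in the thread (unverified there).
constexpr int kHostGPRs = 32;
constexpr int kHostVectors = 64;

// Assumption: r1 (stack), r2 (TOC), and r13 (thread pointer) are off limits.
constexpr int kHostReservedGPRs = 3;

// Host registers left for temporaries after a 1:1 static mapping.
constexpr int FreeGPRsAfterStaticMapping() {
  return kHostGPRs - kHostReservedGPRs - kGuestGPRs;
}
constexpr int FreeVectorsAfterStaticMapping() {
  return kHostVectors - kGuestXMMs;
}

static_assert(FreeGPRsAfterStaticMapping() > 0, "guest GPRs must fit");
static_assert(FreeVectorsAfterStaticMapping() > 0, "guest XMMs must fit");
```

Under these assumptions a 1:1 mapping leaves spare registers on both sides; an actual port would also need homes for flags, the emulator state pointer, and JIT temporaries.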

So we just need a fast and lightweight code emitter for ISA 3.0 and someone with time to find all the problems with implementing it in FEX.