AndreRH / hangover

Hangover runs simple Win32 applications on arm64 Linux
GNU Lesser General Public License v2.1

Please work with Raptor CS to port this to the POWER9 (ppc64le) architecture #20

Closed volkertb closed 3 years ago

volkertb commented 5 years ago

Wow, what an amazing project! 😃 This will grow in importance as more people start using ARM-based laptops (and possibly desktop systems?).

But there is also another interesting architecture to consider for Hangover: POWER9. This architecture supports both big-endian and little-endian (ppc64le) modes. Since Raptor CS has been offering completely open POWER9-based desktop solutions, particularly their new Blackbird motherboard, which is quite affordable as far as high end desktop systems go, having the possibility to run popular Windows games on such systems would suddenly make them even more compelling to a lot of people.

Would you be willing to consider porting Hangover to POWER9/ppc64le? Perhaps you could reach out to Raptor CS for cooperation on this? Maybe they'd even be willing to donate development hardware to you for this purpose.

Thank you for considering this and regardless, great work!

madscientist159 commented 5 years ago

Raptor developer here. What would you need to make this happen? VPS access, something else?

ToxicDragon commented 5 years ago

+1 This would be very much appreciated!

Additionally, according to point 9) of your readme, this may not be that much effort:

> 9) Porting to other host architectures
>
> Porting to little endian host architectures should be fairly simple. You will have to replace a few bits of host specific assembler code, the most complicated one is the vararg forward code in dlls/include/va_helper_impl.h. The compiler will make you aware of other places through #error statements in ifdef guards.

AndreRH commented 5 years ago

> Raptor developer here. What would you need to make this happen? VPS access, something else?

time :)

This task consists of two parts:

1) Port Wine to ppc64le
2) Port Hangover to ppc64le

Hangover is relatively easy, and for the Wine part there has been a GSoC idea for years: https://wiki.winehq.org/Summer_Of_Code#Portability_-_Port_WineLib_to_a_new_architecture

As I did the ARM and ARM64 ports of Wine, I would be happy to mentor a student doing the ppc64le port, but there have been no applications for it in the last few years... Once that's done, it should only take a couple of evenings to adjust Hangover.

madscientist159 commented 5 years ago

We're quite interested in winelib on ppc64el for other reasons already; see feature request at: https://bugs.winehq.org/show_bug.cgi?id=46330

We can provide access to development resources in support of this project as needed.

madscientist159 commented 5 years ago

Since this is something I'd like to see (UT2004 at reasonable performance on POWER anyone? :wink:) I went ahead and started an initial port over the weekend:

https://wiki.raptorcs.com/wiki/Porting/Wine

While I've written a good part of the needed assembly and managed to at least get wineserver itself starting, wine is crashing when it actually tries to run a process like wineboot. gdb isn't giving me much info and winedbg just crashes (it's a pseudo-win64 app after all), so if anyone more familiar with wine internals wants to take a closer look please let me know!

EDIT: Found the problem with a function not marked as needing a long call. Just need to fix wine_call_on_stack() at this point!

madscientist159 commented 5 years ago

Port complete! :tada:

https://github.com/madscientist159/wine/commit/2ef945f9c245e53a83e61b48067dc6784259430a

I'll start working to get this upstream into wine; in the interim, perhaps someone can work on getting hangover to function with it? :wink:

madscientist159 commented 5 years ago

I've submitted the Wine patches here: https://www.winehq.org/pipermail/wine-devel/2019-February/140231.html

@AndreRH do you need anything else to adjust hangover for ppc64 support at this point?

stefand commented 5 years ago

That was fast :-) . I think you won't get far without proper SEH support though. Even though hangover has guest-side SEH handling that will take care of x86 exception frames, there are still cases where the Windows API throws internal exceptions and handles them internally. OutputDebugString is such a case, and there are similar places in user32.dll and ddraw.dll.

madscientist159 commented 5 years ago

@stefand Unfortunately I'm unlikely to have time in the near future to work on SEH. Is there any way to start the hangover work in parallel with a future SEH implementation? I suspect once we can show playable games on POWER more people will be interested in helping out.

stefand commented 5 years ago

Yeah, SEH is a complex beast.

I think - but that's not a promise - the work so far is good enough to bring hangover to a point where notepad.exe starts. For myself I am lacking time too, and I'll be away from home in the next few months, and I don't have POWER hardware unfortunately. I am very happy to provide help for anyone who is interested.

The Talos II system looks pretty beefy. Don't expect anything other than 90's games to run performantly in hangover though, unless you put a lot of effort into improving qemu's performance.

madscientist159 commented 5 years ago

> Don't expect anything other than 90's games to run performantly in hangover though, unless you put a lot of effort into improving qemu's performance.

It's a start! :wink: Something hangover should really look into is HQEMU; the performance improvements made are quite dramatic. For reference, HQEMU can boot a Windows XP install and it has performance comparable to maybe a 700MHz to 1GHz Pentium III system when running in system mode on POWER. The same XP install booted on regular QEMU on the same POWER system is totally unusable.

I'm willing to grant remote access to a dedicated POWER host if that helps. It can be shared among several developers if needed -- it's on our cloud platform so bandwidth and availability are non-issues.

stefand commented 5 years ago

HQEMU sounds interesting, but appears to be a one-off research project that is no longer maintained. From the abstract at https://dl.acm.org/citation.cfm?doid=2259016.2259030 I gather that they replaced qemu's TCG with LLVM to get better code quality, and made this thing multithreaded to counter LLVM's rather slow speed.

If my reading is right I think the most valuable part about it is knowing what can be achieved with a better translator. Hooking up LLVM somehow (either by making qemu use it, or replacing qemu entirely) was an idea I had in my mind, although I'd handle LLVM's slowness by caching translated code instead of being smart about multicores.
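
The caching idea can be pictured as a lookup table from guest code address to already-translated host code, so the slow translator runs only on a miss. This is a minimal in-memory sketch, not QEMU or HQEMU code: `tb_cache`, `tb_get`, `translate_fn` and `fake_translate` are hypothetical names, and a cache that survives across runs would additionally need to serialize the translated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TB_CACHE_SLOTS 1024u  /* power of two, so a mask can replace a modulo */

typedef struct {
    uint64_t guest_pc;   /* guest address the block was translated from */
    void    *host_code;  /* translated host code, NULL while the slot is empty */
} tb_entry;

typedef struct {
    tb_entry      slots[TB_CACHE_SLOTS];
    unsigned long misses;  /* how often the slow translator had to run */
} tb_cache;

/* Stand-in type for an expensive translator (for example an LLVM backend). */
typedef void *(*translate_fn)(uint64_t guest_pc);

/* Hypothetical dummy translator: hands out addresses in a static arena. */
static void *fake_translate(uint64_t guest_pc)
{
    static unsigned char arena[64];
    return &arena[guest_pc % sizeof arena];
}

/* Return host code for guest_pc, translating only when it is not cached. */
static void *tb_get(tb_cache *c, uint64_t guest_pc, translate_fn translate)
{
    assert(c != NULL && translate != NULL);
    tb_entry *e = &c->slots[(guest_pc >> 2) & (TB_CACHE_SLOTS - 1)];
    if (e->host_code == NULL || e->guest_pc != guest_pc) {
        e->guest_pc = guest_pc;
        e->host_code = translate(guest_pc);
        c->misses++;
    }
    return e->host_code;
}
```

The direct-mapped table is chosen only for brevity; a second lookup of the same guest address returns the cached pointer without calling the translator again.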

madscientist159 commented 5 years ago

@stefand It's active enough that we reached out to them last year and they ported it over to POWER. At the very least, this means that a lot of the hard work has been done in theory...

AndreRH commented 5 years ago

@madscientist159 congrats on the initial port. I have some comments on the patches and will post them on the list. Regarding HQEMU, it doesn't run on arm64, right?

madscientist159 commented 5 years ago

@AndreRH the last copy of HQEMU I have, from late 2018, supports arm64 and ppc64el hosts exclusively. It looks like their site is down, so if you want a copy of the GPL sources just let me know.

stefand commented 5 years ago

Your mail to wine-devel about page sizes made me worried. I don't expect hangover to work as-is on 64k page size systems. I know that qemu user emulation generally supports that, but hangover doesn't have any of the VM abstraction magic that qemu-linux-user has. It uses the same address space on the guest and host, so if an application wants to VirtualAlloc something on a 4kb boundary the host has to be able to provide it. Abstracting the address space would be hell, considering in how many places pointers are passed back and forth.

(MapViewOfFile can't map at 4kb boundaries. Maybe we're lucky and Windows doesn't give applications fine-grained page granularity and we can get away with 64k pages as-is. I have big doubts though)
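
The concern can be made concrete with a small alignment check. This is a sketch under assumptions: `host_can_place()` and `host_page_size()` are hypothetical helpers, and only `sysconf(_SC_PAGESIZE)` is a real API. Because guest and host share one address space, a fixed-address mapping can only begin where the host kernel allows a mapping to begin, that is, on a host page boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* True if a mapping may start at guest_addr when the host page size is
 * host_page (which must be a power of two). */
static bool host_can_place(uint64_t guest_addr, uint64_t host_page)
{
    assert(host_page != 0 && (host_page & (host_page - 1)) == 0);
    return (guest_addr & (host_page - 1)) == 0;
}

/* Page size of the machine this code runs on. */
static uint64_t host_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    assert(sz > 0);
    return (uint64_t)sz;
}
```

With a 64k host page, an address such as 0x11000 is 4k-aligned but fails this check, which is exactly the case an application could hit if it asks for memory on a 4k boundary.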

madscientist159 commented 5 years ago

@stefand Due to several historical quirks Windows effectively uses a 64k page size. Windows NT was originally designed to be multi-architecture; it had to support some interesting hardware back in the original NT days.

In the absolute worst case we'll just need to use a 4k kernel with hangover. Let's see if we can make the 64k one work first though.

madscientist159 commented 5 years ago

@stefand I pushed some initial debugging enablement patches to my WIP repository. When you get closer to working on this, let me know and I'll take another look at fixing up the rest of the debug support.

AndreRH commented 5 years ago

I just added a branch called ppc64le (work in progress). Currently only callbacks are implemented in Hangover; comments on the code are welcome. It seemed to me that ppc64 can't do PC relative addressing, so I loaded the address into a register with 5 instructions and then loaded the register from its contained address... Note that testing was done with a separate dedicated C file on Linux which simply did a memalign() and an mprotect(); Wine wasn't involved. On the Wine side I backported your patches, @madscientist159, because Wine changed heavily after the stable version. I didn't compile-test it yet...
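
A stand-alone test of the kind described might look like the sketch below. It is not the actual test file: `alloc_code_page()` is a hypothetical helper, and `posix_memalign()` is used here in place of the older `memalign()`. Some hardened systems refuse to make heap memory executable, so the protection flags are passed in as a parameter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate one page-aligned page, copy `len` bytes of code into it, then
 * apply `prot` (for a real thunk that would be PROT_READ | PROT_EXEC).
 * Returns NULL on failure. Restore PROT_READ | PROT_WRITE before calling
 * free() on the result, because the allocator writes into freed memory. */
static void *alloc_code_page(const void *code, size_t len, int prot)
{
    long page = sysconf(_SC_PAGESIZE);
    void *mem = NULL;

    assert(code != NULL);
    if (page <= 0 || len > (size_t)page)
        return NULL;
    /* mprotect() works on whole pages, so the buffer must be page-aligned. */
    if (posix_memalign(&mem, (size_t)page, (size_t)page) != 0)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, (size_t)page, prot) != 0) {
        free(mem);  /* protection is unchanged here, so freeing is safe */
        return NULL;
    }
    return mem;
}
```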

shawnanastasio commented 5 years ago

Hey @AndreRH, I've got a comment on the ppc64le callback implementation.

> it seemed to me that ppc64 can't do PC relative addressing, so I loaded the address into a register with 5 instructions and then loaded the register from its contained address

Starting with ISA 3.0 (POWER9+), it is actually possible to do PC-relative addressing using addpcis. The extended mnemonic lnia will load the next instruction address (PC+4) into the first register operand.

This will unfortunately not work on POWER8 systems, though. If you want to keep compatibility with them, another approach is to use an unconditional branch with linkage instruction and then use the mflr instruction to copy the link register to a GPR. With this approach, you'll also need to make sure that the LR is properly restored before returning.

Either approach should work better than the gymnastics required to load a 64bit immediate value :)
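
The branch-with-linkage approach can be sketched in C with inline assembly. This is an illustration under assumptions, not the code in the ppc64le branch: `code_address()` is a hypothetical helper, the ppc64 path has to be built on a POWER host to be exercised, and the fallback exists only so the sketch compiles elsewhere with GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__powerpc64__)
/* Return the address of an instruction inside this code, without having to
 * materialize a 64-bit immediate. */
static inline uintptr_t code_address(void)
{
    uintptr_t pc;
    /* "bcl 20,31,.+4" branches to the next instruction and puts that
     * instruction's address into LR; mflr then copies LR into a GPR.
     * Listing "lr" as clobbered lets the compiler preserve LR for us. */
    __asm__ volatile("bcl 20,31,.+4\n\t"
                     "mflr %0"
                     : "=r"(pc)
                     :
                     : "lr");
    return pc;
}
#else
/* Fallback for other hosts: the return address of a non-inlined call is
 * the address just after the call site. */
__attribute__((noinline)) static uintptr_t code_address(void)
{
    return (uintptr_t)__builtin_return_address(0);
}
#endif
```

Hand-written assembly has no compiler to honor a clobber list, so there the LR has to be saved and restored explicitly, as noted above. On POWER9 the `lnia` form mentioned earlier would replace both instructions.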

AndreRH commented 5 years ago

I also thought about the LR trick, but it sounds like less performance due to pipeline stalling and less readable...

shawnanastasio commented 5 years ago

I'm not sure how big (or small) the performance impact of that technique is, but it's not uncommon to see similar techniques in the wild. I don't think readability is a big issue either, especially if comments are added explaining what's going on.

awilfox commented 5 years ago

Semi-relatedly, is there any real reason beyond "it'd be a lot of work" that it can't swizzle for big endian? I would think theoretically you could do it at the call boundary site, though I haven't actually taken a very deep look at the innards of this code.

I ask because I'd be super-excited to see this work land in Adélie Linux, but we're BE only (ppc64, ppc32, and soon aarch64_be).

stefand commented 5 years ago

With enough effort you can make big endian work, but you'd be better off switching your distro to LE (If my understanding is right the hardware you listed can do both).

My guess is that only 5% of the Windows API calls pass pointers to structs that are 32/64 incompatible, and handling those is already a gigantic pain. With BE you'll have to deal with 100% and I don't see the return on investment work out for that.
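
The difference between the two cases can be illustrated with one struct. This is a hypothetical example: `guest_msg32`, `host_msg64`, `thunk_msg()` and `swap32()` are made-up names, not Hangover types. On a little-endian host only structs whose layout differs between 32 and 64 bit need a conversion like `thunk_msg()`, whereas on a big-endian host every multi-byte field of every struct would also need a byte swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as a 32-bit Windows guest sees it: pointers are 4 bytes wide. */
typedef struct {
    uint32_t size;
    uint32_t text;      /* 32-bit guest pointer */
} guest_msg32;

/* Layout a 64-bit host library expects: pointers are 8 bytes wide. */
typedef struct {
    uint32_t    size;
    const char *text;
} host_msg64;

/* Little-endian host: copy plain fields, widen the pointer-bearing ones. */
static host_msg64 thunk_msg(const guest_msg32 *g)
{
    host_msg64 h;
    assert(g != NULL);
    h.size = g->size;
    h.text = (const char *)(uintptr_t)g->text;
    return h;
}

/* Big-endian host: each 32-bit guest field would additionally need this. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```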

awilfox commented 5 years ago

The hardware can't do both (I'm unaware of any commercial LE ppc32 boards at all; only the bleeding-edge newest ppc64 can run LE).

Thanks for the info, I appreciate it. Probably using VMs is the better bet for us.

shawnanastasio commented 4 years ago

Has any progress been made since the last update? If there is still some work to be done on the hangover side, I'd be happy to help if someone could give me an overview of what needs to be done (cc @stefand).

stefand commented 4 years ago

I haven't done anything on my side. I am in a wait-and-see mode to see what happens with Wine's Mac 32 bit work and migration to PE binaries. This has the potential to make the semi hand-written thunks in hangover obsolete.

shawnanastasio commented 4 years ago

Thanks for the update. Guess we'll have to wait and see what wine does then.

darkbasic commented 4 years ago

> Your mail to wine-devel about page sizes made me worried. I don't expect hangover to work as-is on 64k page size systems. I know that qemu user emulation generally supports that

@stefand can you please elaborate? qemu-user doesn't work for me if the ppc64le host has a 64k PAGE_SIZE while the target (usually x86_64) expects 4k.

stefand commented 4 years ago

@darkbasic, I never used it myself, I just saw this mentioned in the qemu docs and code comments. It may be one of those things that got implemented once and then bitrotted over time.

madscientist159 commented 4 years ago

@darkbasic You're largely correct about the page size. When I implemented this I was using a 4k kernel, and I don't think it's an unreasonable burden to require a 4k kernel for Hangover.

If there's interest in actually doing the implementation for this again, I can pick the work back up, just let me know. :smile:

darkbasic commented 4 years ago

qemu developers proposed enabling softmmu for linux-user. With that, arbitrary mappings can be made between host and guest and it would work with a 64k page size as well. But it's a fair amount of effort and I don't know if it will ever land. In the meantime, working support for 4k kernels would be awesome :)
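
What "arbitrary mappings between host and guest" means can be sketched with a tiny software page table. This is a toy model, not QEMU's softmmu: `soft_mmu`, `soft_map()` and `soft_translate()` are hypothetical names, and the guest address space is limited to 1 MiB to keep the table small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SHIFT 12u                        /* 4k guest pages */
#define GUEST_PAGE_SIZE  (1u << GUEST_PAGE_SHIFT)
#define GUEST_PAGES      256u                       /* 1 MiB toy guest space */

/* One host pointer per 4k guest page; NULL means the page is unmapped. */
typedef struct {
    uint8_t *host_page[GUEST_PAGES];
} soft_mmu;

/* Map one 4k guest page onto any host address, whatever its alignment. */
static int soft_map(soft_mmu *m, uint32_t guest_addr, uint8_t *host)
{
    uint32_t idx = guest_addr >> GUEST_PAGE_SHIFT;
    assert(m != NULL);
    if (idx >= GUEST_PAGES || (guest_addr & (GUEST_PAGE_SIZE - 1)) != 0)
        return -1;
    m->host_page[idx] = host;
    return 0;
}

/* Translate a guest address to a host pointer, or NULL if unmapped. */
static uint8_t *soft_translate(const soft_mmu *m, uint32_t guest_addr)
{
    uint32_t idx = guest_addr >> GUEST_PAGE_SHIFT;
    assert(m != NULL);
    if (idx >= GUEST_PAGES || m->host_page[idx] == NULL)
        return NULL;
    return m->host_page[idx] + (guest_addr & (GUEST_PAGE_SIZE - 1));
}
```

The host side no longer has to provide 4k-aligned mappings, but every guest memory access now pays for a translation like `soft_translate()`, which is part of why this is a larger change than it may look.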

lgsmith commented 4 years ago

Not that I can really help with any of this, but as a bump I'd like to provide you all with as much moral support as you can tolerate. It would be wicked to actually be able to play a modern title on a POWER9 system, and this looks like one of the only ways that may ever be possible, unless I'm confused.

But I would also like to ask a (naive) question: shouldn't there be more interest in providing support for this kind of thing? Games are just applications, and if your project can run games in a performant fashion presumably there's some kind of crossover to running virtualized x86 apps in a more performant fashion, which seems like it would be terribly interesting and commercially viable. Is that the wrong way to think about this?

Thanks again for digging into this problem.

madscientist159 commented 4 years ago

Just saw this bumped again....would getting a POWER9 box in the hands of one of the Hangover developers be an incentive to make this happen? The box would be donated, so the developer would keep it after the work is done (also provides a bit of incentive to keep it working...)

stefand commented 4 years ago

You're overestimating how much hangover can do, how much time we have available for it and how many non-x86 systems that are otherwise viable for running Windows apps there are. For one, Hangover is just a fun project for André and myself and we haven't done any real work on it in the past few months :-( . Right now an Nvidia shield is comparable to a Pentium 100 at best. No matter how powerful your power box is, it won't outperform a PC from 2000 with the current status of hangover.

Most systems that have non-x86 CPUs are either specialized servers and don't bother running x86 software anyway or they have non-PC form factors (Tablets, Phones) that make mouse and keyboard based apps not fun. ARM based Chromebooks are the most promising devices, and some Windows games might work well with touch based input. So yes, there is commercial incentive, but it isn't nearly as huge as @lgsmith suggests.

lgsmith commented 4 years ago

Fair enough. It was an honest question, in that I have a very limited understanding of how things like kvm-qemu work. The bright spot (I would have thought) would be that because graphics are somewhat separated from the cpu loads games can generate there could be some capacity to play something more complicated than just games that existed when I was in grade school. They didn't have Radeon RX 5700s in those days. But perhaps I'm missing something more fundamental about this?

I suppose most modern games, being multithreaded, would have a hard time running correctly in a single core, low frequency context no matter how fast rendering gets done.

madscientist159 commented 4 years ago

@stefand Sorry to hear that. Quick tests with straight QEMU / Wine on POWER are showing quite usable speeds, especially for apps like LTSpice (feels near-native in use). I still think there's significant potential for Hangover on the POWER platform vs. ARM.

AndreRH commented 4 years ago

Well, we once were promised a VM when I would have had time for such a task. But now I'm too busy with other things. Improving QEMU-generated assembler would benefit the chosen architecture quite a lot, so if anyone is up for that task, go ahead :)

luke-jr commented 3 years ago

Linux is apparently removing PROT_SAO support - will that impact multi-thread support? :/

madscientist159 commented 3 years ago

Nah, it looks like it was an AIX leftover that isn't really used anywhere:

https://www.spinics.net/lists/linux-api/msg42112.html

luke-jr commented 3 years ago

My understanding is that qemu doesn't support multithreaded x86 emulation on POWER, and PROT_SAO would be needed to do so efficiently?

madscientist159 commented 3 years ago

Do you have a link to any information on why that would be? If I can get details I can look more closely at it.

shawnanastasio commented 3 years ago

PROT_SAO specifies that a given memory region should use x86-style Strong Access Ordering instead of Power's more relaxed memory model. QEMU doesn't support multithreaded TCG for strong memory model architectures (e.g. X86) on weakly ordered hosts, and PROT_SAO might be the solution for that. As far as I'm aware it's not currently used anywhere in QEMU, though.
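
For illustration, requesting such a region might look like the sketch below. `map_guest_memory()` is a hypothetical helper and, as said above, nothing like it exists in QEMU; `PROT_SAO` is only defined by the system headers on powerpc, so the sketch guards on it and falls back to an ordinary mapping when the flag is missing or rejected.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map anonymous memory for guest use, asking for Strong Access Ordering
 * where the headers offer PROT_SAO. Returns NULL if no mapping succeeds. */
static void *map_guest_memory(size_t len)
{
    int prot = PROT_READ | PROT_WRITE;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = MAP_FAILED;

    assert(len > 0);
#ifdef PROT_SAO
    /* May fail if the kernel or CPU does not support the flag. */
    p = mmap(NULL, len, prot | PROT_SAO, flags, -1, 0);
#endif
    if (p == MAP_FAILED)
        p = mmap(NULL, len, prot, flags, -1, 0);  /* weakly ordered fallback */
    return p == MAP_FAILED ? NULL : p;
}
```

A caller that depends on the ordering guarantee would need to know which of the two paths succeeded; the sketch leaves that out for brevity.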

madscientist159 commented 3 years ago

Is there a technical reason for that QEMU TCG limitation, or is it mostly a convenience / ease of implementation item?

shawnanastasio commented 3 years ago

As far as I'm aware it's purely a technical limitation. Other than emitting superfluous memory barrier instructions all over the place in the generated Power code, I don't think there's any way QEMU could accurately match x86's semantics without something like PROT_SAO.
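
What "barriers all over the place" would mean for generated code can be sketched with C11 atomics. This is a deliberately conservative illustration, not what QEMU emits: `guest_word`, `guest_store()` and `guest_load()` are hypothetical names, and a full fence after every access is stronger (and slower) than x86 ordering strictly needs. No claim is made here about the minimal set of barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* A guest memory cell as the emulator sees it. */
typedef _Atomic uint32_t guest_word;

/* Emulated guest store. The fence stands for the barrier instruction a
 * translator would have to emit on a weakly ordered host. */
static void guest_store(guest_word *addr, uint32_t value)
{
    assert(addr != NULL);
    atomic_store_explicit(addr, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Emulated guest load, again followed by a full fence. */
static uint32_t guest_load(guest_word *addr)
{
    uint32_t v;
    assert(addr != NULL);
    v = atomic_load_explicit(addr, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return v;
}
```

With hardware-enforced ordering on the mapped region, both helpers could in principle shrink to a plain load and store, which is the appeal of something like PROT_SAO.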

madscientist159 commented 3 years ago

Might be worth raising that concern on the kernel mailing list. While I suspect it's too late to fix anything for POWER10 in that regard, at least keeping the feature in earlier (and possibly later?) POWER versions could be useful.

shawnanastasio commented 3 years ago

> Might be worth raising that concern on the kernel mailing list.

Done: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216611.html

volkertb commented 3 years ago

> > Might be worth raising that concern on the kernel mailing list.
>
> Done: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216611.html

It's encouraging to see how your proposal to restore support for PROT_SAO started a thoughtful discussion, with an apparent general willingness to figure out a reasonable solution.

However, am I right in reading that the reason for the initial removal of this feature from the Linux kernel is that POWER10 won't support it anyway? Wouldn't that still necessitate the "emitting of superfluous memory barrier instructions" if Hangover is ever to be made to work with POWER10 and beyond as well as POWER9?

luke-jr commented 3 years ago

Poor performance on POWER10 doesn't seem like a good reason to perform poorly on POWER9. Especially in light of POWER10 being less free than POWER9.

"and beyond" depends on whether future processors add it back or not.

AndreRH commented 3 years ago

Hi,

We finally have this support in Hangover now :)

Though there are some caveats:

How you can help:

Big thanks to @madscientist159 for providing PPC64 access