AndreRH / hangover

Hangover runs simple Win32 applications on arm64 Linux
GNU Lesser General Public License v2.1

box86 and GL4ES instead of qemu #81

Closed. Heasterian closed this issue 9 months ago

Heasterian commented 3 years ago

Hey!

Would it be possible to use box86 instead of qemu? It should be a faster option. GL4ES would be a nice enhancement too.

stefand commented 3 years ago

This is the first time I've heard of box86. It sounds interesting indeed.

Looking at the box86 site there's an obvious catch though:

> Box86 lets you run x86 Linux programs (such as games) on non-x86 Linux, like ARM (host system needs to be 32bit little-endian).

Hangover right now only works on 64 bit hosts. Changing this is possible, but requires some work (see issue #3).

Did you do any benchmarking of box86 vs qemu, in particular of CPU emulation performance? What makes qemu-linux-user slow is that it runs the entire Linux userland inside qemu. Hangover doesn't do that, so in many ways the big advantage of box86 is already part of hangover. How does it compare under a CPU-heavy load that doesn't call libraries, e.g. calculating the shasum of a multi-GB file?

Heasterian commented 3 years ago

The only benchmarks I have at the moment are gaming benchmarks from YouTube (https://youtu.be/7he-KoiSe_U). I should be buying a Jetson Nano this week, so I could benchmark it for you.

stefand commented 3 years ago

Unfortunately, that video doesn't have any useful data. The games are ancient, and there's no framerate data (except for the one game that's stuck at 60 fps by vsync).

My main development machine was an NVIDIA Shield. Any performance numbers I get on it won't be useful for comparing to an RPI4.

What's interesting at this point is comparing two solutions (e.g. qemu, box86) for the same problem (running x86 instructions on arm). Separately, and orthogonally to that, the question of syscall emulation vs. thunking libraries needs to be evaluated for performance (qemu-linux-user uses syscall emulation; hangover and box86 use library thunks).

stefand commented 3 years ago

A full game always runs into both. It has x86 code and calls the host system for e.g. 3D and sound. For a complete picture we also need tests that are heavy on the extremes in addition to "in-the-middle" games. E.g. shasum is heavy on CPU emulation needs whereas something like a webserver heavily interacts with the OS.
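The CPU-heavy end of that comparison could be timed with a small harness like the one below. This is only a sketch: the binary name `./sha1sum-i386`, the input file `big.bin`, and the exact way each emulator is invoked are placeholders assumed for illustration, not commands taken from this thread.

```python
import statistics
import subprocess
import time

# Placeholder commands (assumptions): adjust the emulator, binary and
# input file paths to whatever exists on the machine under test.
EXAMPLE_CANDIDATES = {
    "qemu": ["qemu-i386-static", "./sha1sum-i386", "big.bin"],
    "box86": ["box86", "./sha1sum-i386", "big.bin"],
}


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing run raise instead of being timed silently.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def compare(candidates, runs=3):
    """Map each label to the median wall-clock time of its command."""
    return {
        label: statistics.median(time_command(argv, runs))
        for label, argv in candidates.items()
    }
```

Calling `compare(EXAMPLE_CANDIDATES)` on the target device would return one median time per emulator. Wall-clock time includes process startup, so the input file should be large enough that hashing dominates the measurement.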

Heasterian commented 3 years ago

Right now I can only try it on Ubuntu installed in PRoot on my phone. I don't think that would give you any useful data. I'll probably benchmark it next week on the Nano.

AndreRH commented 3 years ago

@stefand I bet you heard about it from me: https://github.com/ptitSeb/box86/issues/4#issuecomment-474598788

Heasterian commented 3 years ago

I could benchmark box86 vs qemu and box86+wine vs hangover. I can post results here if you want.

stefand commented 3 years ago

A simple comparison of shasum on a huge file in box86 vs qemu would be a good start. Just make sure that the shasum binary you run in box86 actually does the work itself and doesn't call a library (gnutls? openssl?) through a thunk.
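One rough way to check that condition is to list the shared libraries the binary declares as dependencies and look for crypto libraries among them. The sketch below parses the output of `readelf -d`; the list of library name prefixes is an illustrative assumption and not exhaustive, and the check only sees direct `NEEDED` entries, not libraries loaded at runtime via `dlopen`.

```python
import re
import subprocess

# Matches dynamic-section lines such as:
#  0x00000001 (NEEDED)    Shared library: [libc.so.6]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[(?P<name>[^\]]+)\]")

# Name prefixes hinting that hashing may be delegated to a native library.
# Illustrative assumption only; extend as needed.
CRYPTO_PREFIXES = ("libcrypto", "libssl", "libgnutls", "libgcrypt", "libnettle")


def parse_needed(readelf_output):
    """Extract the NEEDED library names from `readelf -d` output."""
    return [m.group("name") for m in _NEEDED.finditer(readelf_output)]


def crypto_dependencies(libraries):
    """Return the libraries whose names start with a known crypto prefix."""
    return [lib for lib in libraries if lib.startswith(CRYPTO_PREFIXES)]


def needed_libraries(binary_path):
    """Run `readelf -d` on a binary and return its direct dependencies."""
    result = subprocess.run(
        ["readelf", "-d", binary_path],
        check=True, capture_output=True, text=True,
    )
    return parse_needed(result.stdout)
```

If `crypto_dependencies(needed_libraries(path))` comes back empty for the i386 binary, that is a hint (not proof) that the hashing loop runs as emulated x86 code rather than in a thunked native library.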

bylaws commented 3 years ago

Unfortunately, you won't be able to benchmark on the Nano due to its lack of a 32-bit userspace.

Heasterian commented 3 years ago

You just need to compile box86 with gcc:armhf and g++:armhf, and it works fine on the Nano without video output. I only need to find a good way to launch the i386 sha1sum from the terminal.

pjh64 commented 3 years ago

I ran some benchmarks comparing box86 and qemu-i386-static. The programs I used were prime95 (FTL + Trial Factoring) and 7z b (benchmark). The device I used is an RPI4.

bench.txt

For box86, the use of native libs (instead of emulating them) can't reasonably be avoided, as this (along with its dynarec) is basically its underlying approach.

stefand commented 3 years ago

Those are pretty good results compared to qemu. Out of curiosity, what's the performance of a native arm binary on the same hardware?

The trial factoring numbers look weird, but the others speak for themselves. And I'd expect the heavy lifting to be done in the x86 binary and not in lib calls, so the thunking box86 does shouldn't matter.

AndreRH commented 3 years ago

Is that box86 with or without dynarec enabled?

pjh64 commented 3 years ago

@stefand: I did a short benchmark of 7z with the native binary. As for prime95, there doesn't seem to be an aarch64 or arm binary around. The weird performance gap that occurs when trial factoring with factors around or above 64 bits in length is also reproducible with other approaches to running x86 code on arm. Every solution I know of shows its best figures for factor lengths below 64 bits.

7znative.txt

@AndreRH: It had dynarec enabled. In interpreter-only mode, the 7z ratings are roughly 10 times lower.

ghost commented 3 years ago

We make custom distros (twisteros.com) to make it suitable for arm64 builds. Yes, we add multiarch to allow that, but there are other implementations, like AppImages.

For performance numbers, check here:

https://stands.fosdem.org/stands/box86/performances/

An example on aarch64, on my channel:

https://youtu.be/BEYt5wzckvY

Another example, here on an RPi4:

https://video.fosdem.org/2021/stands/box86/

@AndreRH

We had two main problems with wine, and a collaboration here would be amazing. First, the GL drivers don't like the wrappers; there is nothing to do about that except wait for vulkan or gallium9. The second one, and the one you could help with, concerns wine itself.

We use x86 wine and emulate it, including its libs. That isn't the efficient approach box86 normally takes, which is its twist: the wrapping. Wrapping wine would be pointless unless there is a wine fork that is maintained over time. Wrapping wine's libs (so that this wine fork would be x86 but use ARM wine libs) would, in my opinion, require a fork, because every change to the wine libs could affect that wrapping and break it. It would be cool to know ptitSeb's opinion here. I am only guessing at the problems it could have; I am not a dev, just his tester.

In theory, if "someone" maintained a wine fork like that, it could boost performance a lot compared to the strategy we currently use.

On aarch64 we could use the SpacingBat3 AppImage to avoid multiarch (https://github.com/SpacingBat3/box86-appimage), and for GPUs that aren't GL-capable (due to drivers), gl4es could be used.

For me, in conjunction with the upcoming box64, that's the path hangover should take. It's simpler, more efficient, and more powerful.

But enough talking. I will share this topic with ptitSeb; he will bring more ideas to the table if he thinks box86 should be something for hangover.

Tarek-Hasan commented 2 years ago

If it's any help, I'm just here to tell you guys that the Box86 project's author has a 64-bit version called Box64.

AndreRH commented 2 years ago

> If it's any help, I'm just here to tell you guys that the Box86 project's author has a 64-bit version called Box64.

Yes, didn't notice before your comment. So here are my results with it: https://github.com/AndreRH/hangover/blob/master/benchmarks/readme.md

YMMV

DarkShadow44 commented 1 year ago

FWIW, the box86 dev stated that they are working on box32, i.e. 32-bit programs on arm64. Maybe you can share ideas on how to do the wrapping?

JeremyRand commented 1 year ago

Based on the README changes in afee8d64ead86a787d3efa014d15d84a55e7e9e3, looks like using Box86 is on the roadmap.

Tarek-Hasan commented 1 year ago

Hey @AndreRH, how about an updated benchmark with FEX-Emu and Hangover-Next, so we can see the performance improvements from the new methods?

Heasterian commented 1 year ago

@Tarek-Hasan In the README you can see that FEX and Box32 are on the TODO list, so they are not implemented yet.

@AndreRH Btw, I don't think that you need to wait for Box32 (that will handle aarch64 => i386), you can use Box64 (aarch64 => amd64) for 64-bit apps and probably try to get WoW64 using it (now it just crashes when Wine is trying to use it).

DarkShadow44 commented 1 year ago

> I don't think that you need to wait for Box32 (that will handle aarch64 => i386), you can use Box64 (aarch64 => amd64) for 64-bit apps and probably try to get WoW64 using it (now it just crashes when Wine is trying to use it).

WoW64 doesn't help here; box64 can't run 32-bit x86 code.

AndreRH commented 1 year ago

WoW64 isn't made for 64-bit emulation, only 32-bit, see https://github.com/AndreRH/hangover/discussions/134#discussioncomment-5202094

darkbasic commented 1 year ago

> Based on the README changes in https://github.com/AndreRH/hangover/commit/afee8d64ead86a787d3efa014d15d84a55e7e9e3, looks like using Box86 is on the roadmap.

That would mean losing any hope of ppc64le support. Hopefully it would still be possible to fall back to qemu.

AndreRH commented 1 year ago

> > Based on the README changes in afee8d6, looks like using Box86 is on the roadmap.
>
> That would mean losing any hope of ppc64le support. Hopefully it would still be possible to fall back to qemu.

Qemu will stay, in parallel with the other emulators once they get added.

AndreRH commented 9 months ago

I consider this fixed, as we have Box64 now :)