Question: Have you taken a look at HQEMU?

roybaer commented 3 years ago

Given that Hangover's performance is reportedly mostly limited by QEMU, I would like to ask whether you have heard of HQEMU.

To quote from HQEMU's webpage:

> HQEMU is a retargetable and multi-threaded dynamic binary translator on multicores. It integrates QEMU and LLVM as its building blocks. The translator in the enhanced QEMU acts as a fast translator with low translation overhead. The optimization-intensive LLVM optimizer running on separate threads dynamically improves code for higher performance. With the hybrid QEMU+LLVM approach, HQEMU can achieve low translation overhead and good translated code quality.
>
> HQEMU supports process-level emulation and full-system virtualization. It provides translation modes of running the QEMU translator and LLVM optimizer in one process, or running the LLVM optimizer as a stand-alone optimization server (version 0.13.0).

I have not had a chance to try it out myself but the description sounds promising.
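
The hybrid approach described in the quote (a cheap translator on the execution path, an expensive optimizer on separate threads) can be pictured with a toy two-tier code cache. This is only a rough sketch and not HQEMU's actual design: `fast_translate`, `optimize`, `TwoTierTranslator` and `HOT_THRESHOLD` are hypothetical names made up for illustration, and the "translated code" is just a tagged tuple.

```python
import queue
import threading

HOT_THRESHOLD = 3  # hypothetical: executions before a block is handed to the optimizer


def fast_translate(block):
    """Cheap tier: stand-in for a quick, low-overhead translation."""
    return ("fast", block)


def optimize(block):
    """Expensive tier: stand-in for an optimizing pass run off the hot path."""
    return ("optimized", block)


class TwoTierTranslator:
    def __init__(self):
        self.code_cache = {}   # block -> translated code
        self.exec_counts = {}  # block -> number of executions so far
        self.lock = threading.Lock()
        self.work = queue.Queue()
        self.worker = threading.Thread(target=self._optimizer_loop, daemon=True)
        self.worker.start()

    def _optimizer_loop(self):
        # Runs on its own thread, so optimization never stalls execute().
        while True:
            block = self.work.get()
            better = optimize(block)
            with self.lock:
                self.code_cache[block] = better  # swap in the improved code
            self.work.task_done()

    def execute(self, block):
        with self.lock:
            if block not in self.code_cache:
                self.code_cache[block] = fast_translate(block)
            count = self.exec_counts.get(block, 0) + 1
            self.exec_counts[block] = count
            code = self.code_cache[block]
        if count == HOT_THRESHOLD:
            self.work.put(block)  # block became hot: queue it for optimization
        return code

    def drain(self):
        """Block until all queued optimization work has finished."""
        self.work.join()
```

In this sketch a block always runs immediately with the fast translation, and only blocks that turn out to be hot pay the optimization cost, on a thread that does not block execution.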

AndreRH commented 3 years ago

See discussions at https://github.com/AndreRH/hangover/issues/77#issuecomment-665043119 and https://github.com/AndreRH/hangover/issues/20#issuecomment-467401567

roybaer commented 3 years ago

Interesting read.

In the meantime I have been able to rebase HQEMU's LLVM patches onto more recent LLVM versions with some manual intervention. The full LLVM build process succeeds for the patched versions 7, 9, 10 and 11, while version 8 fails to build for unrelated reasons. A successful build obviously does not mean that it still works, but I cannot really test it right now, because I do not have the relevant AArch64 hardware handy.

When it comes to HQEMU's additions and modifications to QEMU, it is probably easier to manually reapply them to a new QEMU.

AndreRH commented 3 years ago

Could you please try to apply the QEMU changes onto our QEMU?

roybaer commented 3 years ago

I can try, but it's going to take a while. Right now, HQEMU does not even compile with the updated patched LLVM, because of API changes. Even if we rely on LLVM 6 only, the changes to the QEMU code base still amount to 2454 insertions and 331 deletions, not counting newly added files. We'd have to see how much QEMU has changed from version 2.5 to version 5.

stefand commented 3 years ago

One conceptual problem with optimizing the generated ARM code is exception handling: it is difficult to impossible to merge two x86 instructions into one ARM instruction (or to use any other mapping that is not 1:1) without losing track of where an exception happened. If there's an exception in an ARM instruction that doesn't clearly match a single x86 instruction, QEMU can't properly report the exception location.

I don't know if HQEMU attempts to do an n:m optimization or if it attempts to do anything about signal handling in this case.
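
To make the concern concrete, here is a small illustrative sketch of a fault lookup table that maps offsets in translated host code back to guest instruction addresses. It is not how QEMU stores this information; `build_pc_map`, `guest_pc_for_fault` and all addresses are hypothetical. With a 1:1 translation every host instruction has exactly one guest address. Once two guest instructions are merged into one host instruction, the lookup has two candidates and no way to pick between them.

```python
from bisect import bisect_right


def build_pc_map(entries):
    """entries: list of (host_start_offset, [guest_pcs]) sorted by host offset."""
    starts = [start for start, _ in entries]
    guest_pcs = [tuple(pcs) for _, pcs in entries]
    return starts, guest_pcs


def guest_pc_for_fault(pc_map, host_offset):
    """Return the guest PC for a faulting host offset, or None if ambiguous."""
    starts, guest_pcs = pc_map
    idx = bisect_right(starts, host_offset) - 1
    if idx < 0:
        raise ValueError("fault outside the translated block")
    candidates = guest_pcs[idx]
    # More than one candidate means the host instruction was produced by
    # merging several guest instructions, so the exact location is lost.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical addresses: two guest instructions at 0x1000 and 0x1003.
one_to_one = build_pc_map([(0, [0x1000]), (4, [0x1003])])
merged = build_pc_map([(0, [0x1000, 0x1003])])
```

Here `guest_pc_for_fault(one_to_one, 4)` yields `0x1003`, while `guest_pc_for_fault(merged, 0)` yields `None` because the fault could belong to either guest instruction.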

roybaer commented 3 years ago

I somehow doubt that LLVM's optimizer is going to pay any attention to that. It's probably going to be the typical speed vs. accuracy trade-off. I get the impression, though, that the byte-exact location of an exception only really matters in combination with anti-debugger code. HQEMU is apparently at least good enough to run Windows XP in full system emulation mode and the speedup is very desirable.

owlshrimp commented 2 years ago

> I somehow doubt that LLVM's optimizer is going to pay any attention to that. It's probably going to be the typical speed vs. accuracy trade-off. I get the impression, though, that the byte-exact location of an exception only really matters in combination with anti-debugger code. HQEMU is apparently at least good enough to run Windows XP in full system emulation mode and the speedup is very desirable.

This could be highly problematic. A strong driving force behind WINE these days seems to be VALVe's Proton fork and its use in gaming on Linux, which has been quite technically successful. The games on their Steam platform were produced by various publishers for Windows, often several years ago. Many of them contain a large number of DRM measures over which VALVe has no control. If the emulation of x86 isn't accurate enough, particularly against anti-debugger code, then it would block the emulation of these games on non-x86 platforms.

I could see VALVe wanting to pursue this in the future (they have supposedly been working on a Nintendo Switch competitor, but have been forced to use a less power-efficient x86 mobile chip from AMD instead of an ARM chip from NVIDIA) so some future way of mitigating this is probably worth consideration.

Perhaps in the future regular checkpointing could be employed, with more instruction-accurate emulation selected to roll forwards from the last checkpoint in the event of a (rare) exception?

AndreRH / hangover

Question: Have you taken a look at HQEMU? #95