AndreRH / hangover

Hangover runs simple Win32 applications on arm64 Linux
GNU Lesser General Public License v2.1

Vkd3d and DXVK support #28

Open awarus opened 5 years ago

awarus commented 5 years ago

Is it possible to run applications in Hangover using DXVK or vkd3d (Vulkan, not OpenGL ES)? And what should I do to run them?

stefand commented 5 years ago

We have not put any effort into this. Conceptually it should work. Option 1 is to write a thunking library for Vulkan. Option 2 (presumably more performant) is to build an arm64 build of the dxvk libs and use the existing d3d10, d3d11 and dxgi wrapper libraries to do the d3d11 x86 -> d3d11 arm64 translation.

You're on your own though. Neither way is trivial, and you are likely to run into unforeseen problems.

stefand commented 5 years ago

Oh, and don't expect any d3d11 game to run at playable performance at this point.

awarus commented 5 years ago

Thank you for the answer. And what about vkd3d (d3d12 -> Vulkan)? Does it work?

stefand commented 5 years ago

No

mpbagot commented 4 years ago

From what I understand, when running Hangover straight from the build directories, the host wine will simply call into any DLLs placed into the corresponding subdirectory in build/wine-host/dlls/.

Building DXVK for aarch64 does not seem too difficult. I've forked the repo and made some small tweaks to get the DLLs to build, but I haven't tested them yet. If you're interested, @awarus, feel free to clone and build what I've done, copy the resulting DLLs over the default ones, and give it a go.

stefand commented 4 years ago

Correct, if you put aarch64 winelib d3d11, d3d10core, d3d10, d3d10_1 and dxgi DLLs into the Wine tree it should in theory work. No guarantee though. The wrapper may also depend on some properties of QueryInterface and Get/SetPrivateData that aren't working as expected in dxvk, but that would be a bug in either dxvk or hangover, and fixable.

An aarch64 PE build will probably not work because of the x18 register for the TEB. You'd have to make sure the aarch64-w64-whatever-gcc compiler does not inline calls to NtCurrentTeb().

cwabbott0 commented 3 years ago

I've started to take this on. Building DXVK as a winelib required a bunch of hacks/fixes in both winegcc and DXVK. I think that to do it "properly" I'll have to teach meson about winegcc to adjust a few things like the object and shared object default endings.

The only really DXVK specific thing I've run into so far concerns how devices are created. Hangover copies the wine d3d11/dxgi implementation here, which implements the whole internal "layer" mechanism (without actually implementing any layers). From what I could piece together, it goes something like:

DXVK doesn't bother doing any of that business, and just implements D3D11CoreCreateDevice and D3D10CoreCreateDevice directly, not implementing DXGID3D10CreateDevice at all. Hangover calls the host's DXGID3D10CreateDevice, which results in things going boom.

My current cheap hack is to call DXVK's D3D11CoreCreateDevice in DXGID3D10CreateDevice, which should work because you can query the D3D10 interface on the D3D11 device. However, this won't work if wine's dxgi.dll is used on the host (this will eventually be required for d3d12 support). The other way around, making D3D11CoreCreateDevice call DXGID3D10CreateDevice and then having that actually create the device in DXVK, also won't work in that case. Would it be possible to have hangover wrap D3D11CoreCreateDevice and D3D10CoreCreateDevice directly and drop the whole layer thing? Otherwise I'd have to actually implement all this stuff in DXVK (yuck).

stefand commented 3 years ago

I am not a fan of dropping the layer stuff because it is required to pass the Wine tests afair. If you manage to detect at runtime which dxgi you have (e.g. by trying to load / call the dxgi layer stuff and see it fail) it'd be fine with me.
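
The runtime check suggested above could look roughly like the following C sketch. It is only an illustration: `pick_create_path`, the `export_lookup_fn` callback and the two fake lookups are hypothetical names, not hangover or DXVK code. A real wrapper would presumably resolve exports with `GetProcAddress` or `dlsym`, and the fakes merely model the export sets described in this thread (Wine exposing the layer entry point, DXVK exposing only the core entry points).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns the address of `symbol` in `dll`, or NULL. */
typedef void *(*export_lookup_fn)(const char *dll, const char *symbol);

enum create_path {
    CREATE_VIA_DXGI_LAYERS,     /* host dxgi implements the layer mechanism */
    CREATE_VIA_CORE_ENTRYPOINT, /* host only offers D3D11CoreCreateDevice */
    CREATE_UNSUPPORTED
};

/* Probe the host libraries and decide how the wrapper should create devices. */
enum create_path pick_create_path(export_lookup_fn lookup)
{
    assert(lookup != NULL);
    if (lookup("dxgi", "DXGID3D10CreateDevice") != NULL)
        return CREATE_VIA_DXGI_LAYERS;
    if (lookup("d3d11", "D3D11CoreCreateDevice") != NULL)
        return CREATE_VIA_CORE_ENTRYPOINT;
    return CREATE_UNSUPPORTED;
}

static int dummy_export; /* stand-in address for "export found" */

/* Assumed model of a Wine host: both entry points are present. */
void *fake_wine_lookup(const char *dll, const char *symbol)
{
    (void)dll;
    if (strcmp(symbol, "DXGID3D10CreateDevice") == 0 ||
        strcmp(symbol, "D3D11CoreCreateDevice") == 0)
        return &dummy_export;
    return NULL;
}

/* Assumed model of a DXVK host: no DXGID3D10CreateDevice export. */
void *fake_dxvk_lookup(const char *dll, const char *symbol)
{
    (void)dll;
    if (strcmp(symbol, "D3D11CoreCreateDevice") == 0)
        return &dummy_export;
    return NULL;
}
```

Probing for the export rather than checking a version string keeps the decision tied to the capability that matters. A host that exports the symbol but fails when it is called would still need a fallback at call time.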

There's the other thing I keep posting on all other bug reports: I think the need for hangover's wrapper libraries will go away relatively soon (I think sometime in 2021) as Wine's PE/.so split is progressing nicely. As soon as I have some spare time I'll try to integrate qemu into the Wine syscall interface and ntdll loader, which should obsolete 95% of hangover's current code.

wined3d doesn't have a PE/.so split yet, and it is unclear how exactly it will work. dxvk doesn't have it either. Running an x86 PE build and letting winevulkan do the guest-host thunking would work, but I expect it to have very poor performance. Tests and benchmarks are needed to find the right setup, but my current expectation is that having the CSMT worker thread (and dxvk's equivalent) as the boundary might be the optimal approach - the guest side writes to the ringbuffer / pipe and never calls out of x86 space, and the host side reads from it and never bothers about the emulator at all.

This will need some changes to wined3d, which is tricky enough on its own. It will mess up things a lot in dxvk as it is built externally. For vkd3d it will be even more difficult because it doesn't even have an asynchronous command stream.

cwabbott0 commented 3 years ago

Would it be possible to still have the layer stuff but then call D3D11CoreCreateDevice and D3D10CoreCreateDevice on the host?

If we need to implement the layer stuff in DXVK, one annoying thing is that the DXVK HUD can show the API in use (D3D9, 10, or 11). It does this via an internal API-level member of the device, which defaults to 11 but is set to 10 in D3D10CoreCreateDevice by querying an internal interface. I think that won't work with the way hangover currently does things, because there's no way to know in DXGID3D10CreateDevice which entrypoint was called.

Is the idea of the PE/.so split to replace winelibs like wined3d and winevulkan with two libraries and some sort of syscall interface between them? Where can I look further to see how it works? Unfortunately, adding an asynchronous command stream for dx12 and vulkan probably isn't really an option, because the entire point of both APIs is to be "low-level," so the path from the app to actual GPU command recording is supposed to be as short as possible. Apps are supposed to offload command recording to separate threads themselves. Adding an extra step of marshalling and then unmarshalling the syscall parameters would also seem to go against that philosophy, and I don't see how you'd avoid adding extra overhead in the "native" wine use case compared to the current approach that directly calls the native functions from winevulkan, so there might be a performance impact on x86 for something like that. I guess the exact impact depends on how it's implemented though.

bylaws commented 1 year ago

I'm interested in working on this. I left a message on IRC, but to me, compiling dxvk and all wine PEs as ilp32 ARM and having a thunk layer from x86->arm that works without any WoW intervention seems like a somewhat optimal approach. (Any DLLs dxvk depends on would have their ARM ilp32 variants resolved, and any the game depends on would have their x86 variants resolved, with thunk DLLs having some special attribute to mark them as unique. I think this would end up as something analogous to CHPE.) This way all the work winevulkan does for 32->64 conversion won't need to be duplicated, and overhead from the JIT can be kept to an absolute minimum.

stefand commented 1 year ago

That's the plan for 64 bit x86 on arm64. Jacek Caban is working on arm64ec support for mingw and clang, then Wine will fill in the rest and most of Wine can be compiled as arm64ec DLLs.

For 32 bit on 64 bit it'll be more difficult. Address space and all, things like mapping vulkan buffers below 4GB etc.
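
The address-space constraint can be stated as a simple range check. The C sketch below is hypothetical (`fits_below_4g` and `to_guest_ptr` are not Wine or hangover functions). It only illustrates why a host mapping such as a Vulkan buffer has to lie entirely below 4 GB before a 32-bit guest can address it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the range [host_addr, host_addr + size) lies entirely below 4 GB. */
bool fits_below_4g(uint64_t host_addr, uint64_t size)
{
    const uint64_t limit = UINT64_C(1) << 32;
    /* Written this way so host_addr + size cannot overflow. */
    return size <= limit && host_addr <= limit - size;
}

/* Truncate a host address to a 32-bit guest pointer; only valid if it fits. */
uint32_t to_guest_ptr(uint64_t host_addr, uint64_t size)
{
    assert(fits_below_4g(host_addr, size));
    return (uint32_t)host_addr;
}
```

A mapping that fails the check cannot simply be truncated. It would have to be placed at a low address in the first place, or reached through some form of indirection.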

bylaws commented 1 year ago

@stefand For Android at least, address space should be fine, since host import via a shim can be used or an exported hardware buffer can be mmapped. I suppose a similar ABI can be constructed for 32 bit x86 code and wine built to that, which seems simple enough. (I realised after writing my initial comment that the wine unixcall stuff already serialises all arguments into a struct, so the underlying calling convention of the Unix code doesn't matter.)
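
The "serialise all arguments into a struct" pattern mentioned above can be sketched in C as follows. This is an illustration of the idea, not Wine's actual unixlib interface: `struct add_params`, `unix_entry_fn`, `unix_funcs` and `unix_call` are invented for the example. Because the caller only passes a function code and a pointer to a parameter block, the two sides never need to agree on a register-level calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block: inputs and outputs travel in one struct. */
struct add_params {
    int32_t a;
    int32_t b;
    int32_t result; /* filled in by the Unix side */
};

/* Every Unix-side entry point has the same shape: one pointer in, status out. */
typedef int (*unix_entry_fn)(void *args);

static int unix_add(void *args)
{
    struct add_params *params = args;
    params->result = params->a + params->b;
    return 0;
}

enum unix_func_code { UNIX_ADD, UNIX_FUNC_COUNT };

/* Dispatch table indexed by function code. */
static const unix_entry_fn unix_funcs[UNIX_FUNC_COUNT] = {
    [UNIX_ADD] = unix_add,
};

/* PE-side helper: the only thing crossing the boundary is (code, args). */
int unix_call(unsigned code, void *args)
{
    assert(args != NULL);
    if (code >= UNIX_FUNC_COUNT)
        return -1; /* unknown function code */
    return unix_funcs[code](args);
}
```

For a 32-bit guest on a 64-bit host, the struct layout itself still has to be translated, since pointer and `size_t` fields differ in width. That is the 32->64 conversion work referred to earlier in the thread.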

bylaws commented 1 year ago

I left a message for Jacek on the winehackers IRC (is this the right place?). Hopefully we'll be able to figure out a nice way of doing 32-bit that doesn't involve too much overhead.