That is impressive. Once the Vulkan API matures a bit and provides devs with mature tools (instead of having to hand-code the assembly from scratch), we'll see much more widespread support for it.
How is it going? All the Vulkan drivers are passing the conformance test, so I think they're all production ready. I'm really looking forward to this feature. You guys are awesome. Thanks for your work.
It was never about the stability of the new APIs; it was about the effort to code it vs. the gain, of which there is very little.
On an off-topic note, Mednafen, a PS1 emulator, has a Vulkan backend as of December 2016.
In-depth article here https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/
refractionpcsx2, I'm not so sure about the gain being minimal. For weaker systems like the LattePanda, or even mobile devices, Vulkan has a major impact. Dolphin is evidence enough: on the Panda's limited DX12 (11_1) support, GameCube and Wii games gained at least 30% in framerate by switching to DX12 over the other available options.
Passing the conformance test is different from having a stable driver. AMD has passed the OpenGL conformance test for 2-3 years, and yet we are still waiting for a driver that can render properly without a BSOD (or whatever it is called now).
And we still don't have a free Vulkan driver or good debugging tools.
There are 2 massive differences with Dolphin: their core emulation is faster, and they likely issue more draw calls. If you want to achieve +30%, you basically need both the VU and EE threads below 70%, and GSdx limited by validation/draw call count. If you're limited by EE/VU, a faster rendering API won't buy you anything.
You can still get some bonus depending on your computer. On 2 cores, if the GS thread is faster you can reallocate the computing to the other threads, good. On 4 cores you might win a bin on turbo if you're lucky; otherwise one core will just idle more. On small boards, you can get a massive boost because you will get less throttling.
If you want faster emulation, buy a better computer ;) IMHO, optimizing for slow CPUs is a waste of time.
To complete my previous message: since the 1.4 release, the code got various speed improvements. The rendering correctness is 10 times better. For example, people said we needed DX12 because Ratchet & Clank was slow on good computers. Then I implemented a kind of mipmapping and now it is much faster. As you can see, the speed isn't about a hyped API versus an older API.
So far, with one year behind us, I can tell you that I don't regret that we didn't lose time implementing Vulkan/DX12. IMHO, we have bigger priorities, such as a 64-bit port of the not-yet-ported code.
For reference
Is it me, or is anvil a great name for a fast framework created by AMD ;)
So both AMD & Nvidia created an extra API to add ref-counting to the Vulkan structures. It was cheap to not include it in the initial spec.
Is it me, or is anvil a great name for a fast framework created by AMD ;)
Only if NVIDIA's would be called brittle.
"gregory38: If you want faster emulation, buy a better computer ;) IHMO, optimization for slow CPU is a waste of time."
For me, that's not an option with my intended uses. I'm aiming for single-board x86/x64 computers like the LattePanda and UP Board, and other mini game systems like the GPD Win and the Smach Z: boards where upgrading individual components just isn't an option. I want to see a full emulation system the size of an N64 game cartridge.
Putting that aside:
The only vaguely meaningful optimization I see there is leveraging the fact that the Smach is HSA-compliant (which means zero copy is possible, which means bla bla bla). Which is something that, for as much as gregory seems remotely interested, would require at minimum somebody to buy him the required hardware/dev board.
Until I have a delivery man at my front door, I don't have time ;)
IMHO, HMM/HSA would only be interesting at native resolution. It would allow emulating the GS memory as coherent memory (and remove plenty of sync issues). It would avoid all the texture conversions, which are really the killer for CPU/GPU perf (that's why the SW renderer is sometimes faster). Anyway, the future is first programmable blending.
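As a rough illustration of what "coherent memory" means in practice, here is what GL already offers for buffers (a minimal sketch assuming GL 4.4 / ARB_buffer_storage and a current GL context; names are illustrative, not GSdx code). HMM/HSA would extend this kind of zero-copy sharing to the whole emulated GS memory:

    // Allocate a buffer that stays mapped: CPU writes become visible to the
    // GPU without an explicit upload/conversion copy.
    const GLsizeiptr size = 16 * 1024 * 1024;
    const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);        // immutable storage
    void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags); // stays mapped forever

    // The CPU can now write vertex data straight into 'ptr' while the GPU
    // consumes it; only fencing (glFenceSync) is needed to avoid overwriting
    // data still in flight, not a copy per upload.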
@mirh AMD + OpenGL... I get nightmares just thinking about that.
@lightningterror nah, it's actually pretty good provided you use the latest Mesa. It's definitely getting there.
For OpenGL, I see GL_ARB_bindless_texture was removed. The info about it suggests it should provide a speed bump; maybe it could help AMD's failing drivers. Was the code really that broken?
it should provide != it provides
The extra complexity wasn't worth it.
However, GSdx's state isn't the same nowadays. We used to have 1-2 textures; we can now have 3-5 textures. Potentially my implementation was bad. Hopefully the extension will soon be implemented in Mesa so we will be able to understand how it works.
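For context, the extension works roughly like this (a hedged sketch, not the removed GSdx code; it assumes GL_ARB_bindless_texture and an existing texture object). Instead of binding textures to slots per draw, a 64-bit handle is made resident once and passed to the shader, removing bind-time validation from the hot path:

    // 'tex' is an existing texture object; 'uniform_location' is illustrative.
    GLuint64 handle = glGetTextureHandleARB(tex);      // get a bindless handle
    glMakeTextureHandleResidentARB(handle);            // must be resident before use
    glUniformHandleui64ARB(uniform_location, handle);  // shader samples via the handle

    // GLSL side:
    //   #extension GL_ARB_bindless_texture : require
    //   layout(bindless_sampler) uniform sampler2D my_tex;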
I'm afraid the AMD driver is a long road. IMHO, we need proper SSO support.
SSO explanation: SSO allows changing the Fragment Shader (FS) without revalidating the Vertex Shader (VS). The feature was introduced in DX9 (or maybe before)....
In our case, the FS is updated at a high frequency (every 1-5 draw calls). The VS is updated at a much lower frequency (and potentially could have been 0 if I didn't need to put in a ton of hacks to support the AMD/Intel drivers). It means that the AMD/Intel drivers do a lot of extra validation for nothing.
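For the non-GL folks, a minimal sketch of the SSO pattern (GL_ARB_separate_shader_objects; a current GL context and GLSL source strings are assumed, names illustrative):

    // Build independent per-stage programs once.
    GLuint vs   = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vs_source);
    GLuint fs_a = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_a_source);
    GLuint fs_b = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_b_source);

    GLuint pipeline;
    glGenProgramPipelines(1, &pipeline);
    glBindProgramPipeline(pipeline);
    glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, vs); // VS set once

    // Hot path: only the FS stage is swapped between draws; a good driver
    // should not revalidate the untouched VS here.
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_a);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_b);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);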
By the way, the speed issue could also be a limitation of the AMD architecture. GSdx does lots of draw calls with few primitives, while modern GPUs are designed to handle a big number of primitives in one shot. Maybe the overhead of processing a command in the GPU is bigger than the time to process the draw call, hence the stalling of the application.
From what I've read/watched (if I'm correct), how AMD and Nvidia do scheduling is quite different. Nvidia does it in the driver, whereas with AMD you need to specify which resources should go to which core, or something like that, so that leaves devs to implement it in their software, since the AMD driver is quite different from Nvidia's.
If that's true, then maybe multithreading needs to be added specifically for AMD GPUs in GSdx.
Please don't read random info from fanboys that never wrote a single line of code :) GPUs are a really complex domain.
The Nvidia driver can use multiple threads for various operations, whereas AMD's is more single-threaded (I'm pretty sure they use some MT, but definitely less). Then you have hardware scheduling, which is unrelated and became the trending hype, AKA asynchronous compute whatever... So yes, AMD gives devs more possibilities to dispatch rendering commands to different resources with different priorities. But there is no compute in GSdx, so it is a moot point. Anyway, Mesa will soon support a threaded GL driver, so we will be able to have a nice comparison.
What we need is a GL thread dispatcher. The GSdx thread would store GL commands in a queue; the GL thread would read commands from the queue and execute them. This way, while the GL thread is busy executing GL commands, the GSdx thread can prepare the next draw (vertex/texture conversion, for example).
Besides, let's not forget that the slowest and oldest (1.8GHz) C2D + Nvidia got like 3x the framerate of my 3.2GHz one + AMD. They simply have some code that trips over itself; it's not just multi-threading. EDIT: I'm not sure what the point of this is in this issue, it's not like anybody needs to be reminded about it xD
What we need is a GL thread dispatcher.
Something like this? https://github.com/NVIDIA/libglvnd
Wtf? That dispatches calls between the system and the driver, not between the game and the driver. It has nothing to do with rendering and threads.
Yes, it is unrelated. The goal of glvnd is to switch the GL driver at runtime instead of at reboot.
What we need is a GL thread dispatcher. The GSdx thread would store GL commands in a queue; the GL thread would read commands from the queue and execute them. This way, while the GL thread is busy executing GL commands, the GSdx thread can prepare the next draw (vertex/texture conversion, for example).
Aren't Nvidia the only ones that have that, with NV_command_list?
No offense, but people should stop posting random words. NV_command_list records all the state into a single blob (which can be seen as a list of commands). It is a way to achieve something closer to the Vulkan/DX12 APIs, but with OpenGL.
Here we're dealing with a basic multi-threaded approach. Instead of doing:
do gsdx stuff
exec gl cmd1
wait execution done
do gsdx stuff
exec gl cmd2
wait execution done
We do:
do gsdx stuff
Ask your buddy to exec cmd1
do gsdx stuff
Ask your buddy to exec cmd2
And buddy will do:
exec gl cmd1
wait execution done
exec gl cmd2
wait execution done
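To make the buddy idea concrete, here is a minimal C++ sketch of such a dispatcher (illustrative only, not actual GSdx code; it assumes the GL context is made current on the worker thread before the loop runs):

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>

    // The GSdx thread enqueues GL work; the "buddy" thread owns the GL
    // context and executes the commands in order.
    class GLDispatcher {
        std::queue<std::function<void()>> m_queue;
        std::mutex m_mutex;
        std::condition_variable m_cv;
        bool m_quit = false;
        std::thread m_thread{[this] { Run(); }};

        void Run() {
            // A real implementation makes the GL context current here first.
            std::unique_lock<std::mutex> lock(m_mutex);
            while (true) {
                m_cv.wait(lock, [this] { return m_quit || !m_queue.empty(); });
                if (m_quit && m_queue.empty())
                    return;
                auto cmd = std::move(m_queue.front());
                m_queue.pop();
                lock.unlock();
                cmd(); // execute the GL command while the producer keeps working
                lock.lock();
            }
        }

    public:
        void Enqueue(std::function<void()> cmd) {
            {
                std::lock_guard<std::mutex> g(m_mutex);
                m_queue.push(std::move(cmd));
            }
            m_cv.notify_one();
        }

        ~GLDispatcher() {
            {
                std::lock_guard<std::mutex> g(m_mutex);
                m_quit = true;
            }
            m_cv.notify_one();
            m_thread.join(); // drain remaining commands, then stop
        }
    };

Usage would be something like dispatcher.Enqueue([=] { /* glDrawArrays(...) etc. */ });. A production version would likely use a lock-free ring buffer instead of a mutexed std::queue, which is roughly what threaded GL driver layers do internally.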
Note: Mesa threading isn't yet compatible with PCSX2, and it won't be ready for the soon-to-be-released version.
FWIW, I have some patches to improve Mesa threading. It really gives me a nice speed boost (on Blood Will Tell) even on my Haswell at 4GHz. Unfortunately I found some bad stuff in Mesa, so it will crash after 5-15 minutes of gameplay...
Patches to Mesa or patches to pcsx2?
If PCSX2 is going to rid itself of DX9, I think it would be better to just rid it of DX entirely and use Vulkan, since it's usable on Linux and Windows, plus any card that supports DX11 or 12 is sure enough to support Vulkan, and it would narrow everything down to one backend. I'm no dev, but this is just my opinion.
That's not a viable option right now. AMD users are already forced to use the DX backend due to driver issues that AMD hasn't resolved. In addition to that, the time it would take to implement Vulkan versus what we would get back in performance benefits isn't worth it.
Well, that's what I was thinking about. Vulkan would get around AMD's dodgy GL drivers, and it seems pointless to keep DX around (afterwards) if a Vulkan backend ever gets made. If DX9 is going to be dropped in the future regardless, even if that's far off, and we are left with GL and DX11, why not just slowly phase out DX11 as Vulkan develops? Any card that supports DX11 can support Vulkan AFAIK, and there would be no point in sustaining a Windows-only backend anymore.
Also, contrary to whatever scare they have over at Dolphin (possibly because plugins perfectly modularize stuff? I dunno), we have no "X renderer is a burden to Y renderer" problem. Anyway, everything is up to whichever devs want to tackle the challenge.
Funny thing: if CL gets merged into Vulkan in the future, we could say we technically already have a Vulkan renderer. EDIT: @gregory38 you should resend your patches, I guess?
Various DX11 cards won't support Vulkan. Besides, eventually both the DX11 and OpenGL renderers will/might die. But Vulkan won't solve texture cache management. And we need advanced blending; I'm not sure it is exposed in Vulkan, as it requires at least a Maxwell GPU on Nvidia's side. By the way, this extension will reduce the number of draw calls and increase the load on the GPU, so the Vulkan gain will become smaller.
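For reference, a rough sketch of what advanced blending looks like on the GL side (assuming GL_KHR_blend_equation_advanced; illustrative, not GSdx code). Because the blend runs as part of the pipeline, overlapping blended primitives no longer have to be split into separate draws with framebuffer copies in between:

    // Fragment shader (GLSL) declares advanced-blend support:
    //   #extension GL_KHR_blend_equation_advanced : require
    //   layout(blend_support_multiply) out;
    //   out vec4 color;

    // C++ side: select an advanced equation and draw.
    glEnable(GL_BLEND);
    glBlendEquation(GL_MULTIPLY_KHR); // blend equation beyond add/sub/min/max
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    glBlendBarrierKHR();              // order dependent overlapping draws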
What do you plan to move GSdx to if and when those go? I just hope that, whatever happens, my RX 480 will handle it. I'll upgrade to a GTX 1080 in a few years though, maybe.
I don't have any plan. My GPU is an "old" Kepler. I won't upgrade soon, as I want a sub-75W but powerful enough GPU with free driver support.
I don't know the AMD status, nor Intel's. I think recent GPUs should be OK, but I really don't know.
AMD has some problems with Vulkan as well (cough, blending). Also, it's good to have several APIs available; some might have issues, so it's good to have an alternative. Take Intel, for example: DX11 has issues on Kaby Lake, OpenGL is a mess, and you might want to use DX9.
It sucks being an AMD user right now >.< I just got a BSOD with Silent Hill 4 in OGL, the only SH game I haven't beaten, and DX11 has an entire layer of atmosphere missing. I've heard Nvidia has issues too, but I'm not aware of how bad.
> Buy an AMD card
> Nuke Windows and say hi to Tux
> Install open sauce driver
> Profit
...Anyway, please, really, it's all up to whatever fancy a willing dev may have. And I don't know of anybody with either the time or the will to begin with. So please, let's stop the wishful-thinking chatter.
Vulkan, please. Replacing the existing OpenGL renderer, which I hear is much slower than the existing D3D renderer, with a single Vulkan renderer would help PCSX2 out a lot. While you could focus on the OpenGL renderer for both Linux and Windows, it might be easier to just pave over the old renderers with a single API and focus on that instead. Less code to maintain.
A new GUI would also really help to modernize it!
After many years, the Metal Gear Solid 2 intro scene still lags; a modern implementation would be welcome, especially if it resolved the issue.
What is needed to create a Vulkan renderer? Why not crowdfund this project?
We would need more developers/manpower. Crowdfunding is still a possibility in the future.
Would need competent Vulkan implementations across the card vendors, as Gregory has pointed out.
Vulkan != a magic solution to performance issues. We would be better off with more people working on core GSdx issues than with a working Vulkan backend.
If the OGL renderer is much slower than the D3D one and nobody can fix the performance disparity, VLK is an option. It depends on what contributors are good at doing, I guess. Once you have VLK going, you don't have to worry about specific driver bugs like with OGL, so it seems easier to maintain in the long run, just more work up front. The RPCS3 devs sure seem to love it: https://rpcs3.net/blog/2018/01/23/rpcs3-2017-wrap-up-a-stunning-year-of-progress/
X-Y=/=Z
You might be oblivious to this, but OpenGL issues that occur in AMD drivers also often affect Vulkan too in some way.
@Swiftpaw OGL is as fast as D3D (well, OGL has better vertex streaming capabilities). However, AMD's proprietary OGL implementation is bad, and nothing prevents AMD from releasing a broken Vulkan implementation too. It would be sad to spend weeks of work on a working solution only for AMD users...
GSdx's main speed issue isn't the rendering API overhead, but the emulation of the GS itself, which doesn't map well to modern GPUs. See my previous post for an example of what can be done to really improve the emulation.
AMD's OGL performance is probably only slower on Windows. I'm pretty sure Mesa is faster by now.
I wonder if there is a chance of a DX12 backend in PCSX2; it gives a huge performance increase.
https://github.com/dolphin-emu/dolphin/pull/3364 https://forums.dolphin-emu.org/Thread-unofficial-dolphin-dx12-backend?page=5