That is impressive. Once the Vulkan API matures a bit and provides devs with mature tools (instead of having to hand-code the assembly from scratch), we'll see much more widespread support for it.
How is it going? All the Vulkan drivers are passing the conformance test, so I think they're all production ready. I'm really looking forward to this feature. You guys are awesome. Thanks for your work.
It was never about the stability of the new APIs; it was about the effort to code it vs. the gain, of which there is very little.
On an off-topic note, Mednafen, a PS1 emulator, has a Vulkan backend as of December 2016.
In-depth article here https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/
refractionpcsx2, I'm not so sure about the gain being minimal. For weaker systems like the LattePanda, or even mobile devices, Vulkan has a major impact. Dolphin is evidence enough: on the Panda's limited DX12 (11_1) support, GameCube and Wii games gained at least 30% in framerate by switching to DX12 over the other available options.
Passing the conformance test is different from having a stable driver. AMD has passed the OpenGL conformance test for 2-3 years, and yet we are still waiting for a driver that can render properly without a BSOD (or whatever it is called now).
And we still don't have a free Vulkan driver or good debugging tools.
There are 2 massive differences with Dolphin: their core emulation is faster, and they likely issue more draw calls. If you want to achieve +30%, you basically need both the VU and EE threads below 70%, and GSdx limited by validation/draw call count. If you're limited by EE/VU, a faster rendering API won't buy you anything.
You can still get some bonus depending on your computer. On 2 cores, if the GS thread is faster you can reallocate the computing to the other threads, good. On 4 cores you might win a bin on turbo if you're lucky; otherwise one core will just idle more. On small boards, you can get a massive boost because you will get less throttling.
If you want faster emulation, buy a better computer ;) IMHO, optimizing for slow CPUs is a waste of time.
To complete my previous message: since the 1.4 release, the code got various speed improvements. The rendering correctness is 10 times better. For example, people said we needed DX12 because Ratchet & Clank was slow on good computers. Then I implemented a kind of mipmapping and now it is much faster. As you can see, the speed isn't about a hyped API versus an older API.
So far, with one year behind us, I can tell you that I don't regret that we didn't lose time implementing Vulkan/DX12. IMHO, we have bigger priorities, such as a 64-bit port of the not-yet-ported code.
For reference
Is it me, or is anvil a great name for a fast framework created by AMD ;)
So both AMD & Nvidia created an extra API to add ref-counting to the Vulkan structures. It was cheap to not include it in the initial spec.
Is it me, or is anvil a great name for a fast framework created by AMD ;)
Only if NVIDIA's would be called brittle.
"gregory38: If you want faster emulation, buy a better computer ;) IHMO, optimization for slow CPU is a waste of time."
For me, that's not an option with my intended uses. I'm aiming for single-board x86/x64 computers like the LattePanda and UP Board, and other mini game systems like the GPD Win and the Smach Z: boards where upgrading individual components just isn't an option. I want to see a full emulation system the size of an N64 game cartridge.
Putting that aside:
The only vaguely meaningful optimization I see there is leveraging the fact that the Smach is HSA-compliant (which means zero copy is possible, which means bla bla bla). Which is something that, for as much as gregory seems remotely interested, would require at minimum somebody to buy him the required hardware/dev board.
Until I have a delivery man at my front door, I don't have time ;)
IMHO, HMM/HSA would only be interesting at native resolution. It would allow emulating the GS memory as coherent memory (and remove plenty of sync issues). It would avoid all the texture conversions, which are really the killer for CPU/GPU perf (that's why the SW renderer is sometimes faster). Anyway, the future is first programmable blending.
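As a rough illustration of what "coherent memory" means in practice, here is what GL already offers for buffers (a minimal sketch assuming GL 4.4 / ARB_buffer_storage and a current GL context; names are illustrative, not GSdx code). HMM/HSA would extend this kind of zero-copy sharing to the whole emulated GS memory:

    // Allocate a buffer that stays mapped: CPU writes become visible to the
    // GPU without an explicit upload/conversion copy.
    const GLsizeiptr size = 16 * 1024 * 1024;
    const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);        // immutable storage
    void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags); // stays mapped forever

    // The CPU can now write vertex data straight into 'ptr' while the GPU
    // consumes it; only fencing (glFenceSync) is needed to avoid overwriting
    // data still in flight, not a copy per upload.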
@mirh AMD + OpenGL... I get nightmares just thinking about that.
@lightningterror nah, it's actually pretty good provided you use the latest Mesa. It's definitely getting there.
For OpenGL, I see GL_ARB_bindless_texture was removed. The info about it suggests it should provide a speed bump; maybe it could help AMD's failing drivers. Was the code really that broken?
it should provide != it provides
The extra complexity wasn't worth it.
However, GSdx's state isn't the same nowadays. We used to have 1-2 textures; we can now have 3-5 textures. Potentially my implementation was bad. Hopefully the extension will soon be implemented in Mesa so we will be able to understand how it works.
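For context, the extension works roughly like this (a hedged sketch, not the removed GSdx code; it assumes GL_ARB_bindless_texture and an existing texture object). Instead of binding textures to slots per draw, a 64-bit handle is made resident once and passed to the shader, removing bind-time validation from the hot path:

    // 'tex' is an existing texture object; 'uniform_location' is illustrative.
    GLuint64 handle = glGetTextureHandleARB(tex);      // get a bindless handle
    glMakeTextureHandleResidentARB(handle);            // must be resident before use
    glUniformHandleui64ARB(uniform_location, handle);  // shader samples via the handle

    // GLSL side:
    //   #extension GL_ARB_bindless_texture : require
    //   layout(bindless_sampler) uniform sampler2D my_tex;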
I'm afraid the AMD driver is a long road. IMHO, we need proper SSO support.
SSO explanation: SSO allows changing the Fragment Shader (FS) without revalidating the Vertex Shader (VS). The feature was introduced in DX9 (or maybe before)....
In our case, the FS is updated at a high frequency (every 1-5 draw calls). The VS is updated at a much lower frequency (and potentially could have been 0 if I didn't need to put in a ton of hacks to support the AMD/Intel drivers). It means that the AMD/Intel drivers do a lot of extra validation for nothing.
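For the non-GL folks, a minimal sketch of the SSO pattern (GL_ARB_separate_shader_objects; a current GL context and GLSL source strings are assumed, names illustrative):

    // Build independent per-stage programs once.
    GLuint vs   = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vs_source);
    GLuint fs_a = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_a_source);
    GLuint fs_b = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_b_source);

    GLuint pipeline;
    glGenProgramPipelines(1, &pipeline);
    glBindProgramPipeline(pipeline);
    glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, vs); // VS set once

    // Hot path: only the FS stage is swapped between draws; a good driver
    // should not revalidate the untouched VS here.
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_a);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_b);
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);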
By the way, the speed issue could also be a limitation of the AMD architecture. GSdx does lots of draw calls with few primitives, while modern GPUs are designed to handle a big number of primitives in one shot. Maybe the overhead of processing a command in the GPU is bigger than the time to process the draw call, hence the stalling of the application.
From what I've read/watched (if I'm correct), how AMD and Nvidia do scheduling is quite different. Nvidia does it in the driver, whereas with AMD you need to specify which resources should go to which core, or something like that, so that leaves devs to implement it in their software, since the AMD driver is quite different from Nvidia's.
If that's true, then maybe multithreading needs to be added specifically for AMD GPUs in GSdx.
Please don't read random info from fanboys that never wrote a single line of code :) GPUs are a really complex domain.
The Nvidia driver can use multiple threads for various operations, whereas AMD's is more single-threaded (I'm pretty sure they use some MT, but definitely less). Then you have hardware scheduling, which is unrelated and became the trending hype, AKA asynchronous compute whatever... So yes, AMD gives devs more possibilities to dispatch rendering commands to different resources with different priorities. But there is no compute in GSdx, so it is a moot point. Anyway, Mesa will soon support a threaded GL driver, so we will be able to have a nice comparison.
What we need is a GL thread dispatcher. The GSdx thread would store GL commands in a queue; the GL thread would read commands from the queue and execute them. This way, while the GL thread is busy executing GL commands, the GSdx thread can prepare the next draw (vertex/texture conversion, for example).
Besides, let's not forget that the slowest and oldest (1.8GHz) C2D + Nvidia got like 3x the framerate of my 3.2GHz one + AMD. They simply have some code that trips over itself; it's not just multi-threading. EDIT: I'm not sure what the point of this is in this issue, it's not like anybody needs to be reminded about it xD
What we need is a GL thread dispatcher.
Something like this? https://github.com/NVIDIA/libglvnd
Wtf? That dispatches calls between the system and the driver, not between the game and the driver. It has nothing to do with rendering and threads.
Yes, it is unrelated. The goal of glvnd is to switch the GL driver at runtime instead of at reboot.
What we need is a GL thread dispatcher. The GSdx thread would store GL commands in a queue; the GL thread would read commands from the queue and execute them. This way, while the GL thread is busy executing GL commands, the GSdx thread can prepare the next draw (vertex/texture conversion, for example).
Aren't Nvidia the only ones that have that, with NV_command_list?
No offense, but people should stop posting random words. NV_command_list records all the state into a single blob (which can be seen as a list of commands). It is a way to achieve something closer to the Vulkan/DX12 APIs, but with OpenGL.
Here we're dealing with a basic multi-threaded approach. Instead of doing:
do gsdx stuff
exec gl cmd1
wait execution done
do gsdx stuff
exec gl cmd2
wait execution done
We do:
do gsdx stuff
Ask your buddy to exec cmd1
do gsdx stuff
Ask your buddy to exec cmd2
And buddy will do:
exec gl cmd1
wait execution done
exec gl cmd2
wait execution done
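To make the buddy idea concrete, here is a minimal C++ sketch of such a dispatcher (illustrative only, not actual GSdx code; it assumes the GL context is made current on the worker thread before the loop runs):

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>

    // The GSdx thread enqueues GL work; the "buddy" thread owns the GL
    // context and executes the commands in order.
    class GLDispatcher {
        std::queue<std::function<void()>> m_queue;
        std::mutex m_mutex;
        std::condition_variable m_cv;
        bool m_quit = false;
        std::thread m_thread{[this] { Run(); }};

        void Run() {
            // A real implementation makes the GL context current here first.
            std::unique_lock<std::mutex> lock(m_mutex);
            while (true) {
                m_cv.wait(lock, [this] { return m_quit || !m_queue.empty(); });
                if (m_quit && m_queue.empty())
                    return;
                auto cmd = std::move(m_queue.front());
                m_queue.pop();
                lock.unlock();
                cmd(); // execute the GL command while the producer keeps working
                lock.lock();
            }
        }

    public:
        void Enqueue(std::function<void()> cmd) {
            {
                std::lock_guard<std::mutex> g(m_mutex);
                m_queue.push(std::move(cmd));
            }
            m_cv.notify_one();
        }

        ~GLDispatcher() {
            {
                std::lock_guard<std::mutex> g(m_mutex);
                m_quit = true;
            }
            m_cv.notify_one();
            m_thread.join(); // drain remaining commands, then stop
        }
    };

Usage would be something like dispatcher.Enqueue([=] { /* glDrawArrays(...) etc. */ });. A production version would likely use a lock-free ring buffer instead of a mutexed std::queue, which is roughly what threaded GL driver layers do internally.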
Note: Mesa threading isn't yet compatible with PCSX2, and it won't be ready for the soon-to-be-released version.
FWIW, I have some patches to improve Mesa threading. It really gives me a nice speed boost (on Blood Will Tell) even on my Haswell at 4GHz. Unfortunately I found some bad stuff in Mesa, so it will crash after 5-15 minutes of gameplay...
Patches to Mesa or patches to pcsx2?
If PCSX2 is going to rid itself of DX9, I think it would be better to just rid it of DX entirely and use Vulkan, since it's usable on Linux and Windows, plus any card that supports DX11 or 12 is sure enough to support Vulkan, and it would narrow everything down to one backend. I'm no dev, but this is just my opinion.
That's not a viable option right now. AMD users are already forced to use the DX backend due to driver issues that AMD hasn't resolved. In addition to that, the time it would take to implement Vulkan versus what we would get back in performance benefits isn't worth it.
Well, that's what I was thinking about. Vulkan would get around AMD's dodgy GL drivers, and it seems pointless to keep DX around (afterwards) if a Vulkan backend ever gets made. If DX9 is going to be dropped in the future regardless, even if that's far off, and we are left with GL and DX11, why not just slowly phase out DX11 as Vulkan develops? Any card that supports DX11 can support Vulkan AFAIK, and there would be no point in sustaining a Windows-only backend anymore.
Also, contrary to whatever scare they have over at Dolphin (possibly because plugins perfectly modularize stuff? I dunno), we have no "X renderer is a burden to Y renderer" problem. Anyway, everything is up to whichever devs want to tackle the challenge.
Funny thing: if CL gets merged into Vulkan in the future, we could say we technically already have a Vulkan renderer. EDIT: @gregory38 you should resend your patches, I guess?
Various DX11 cards won't support Vulkan. Besides, eventually both the DX11 and OpenGL renderers will/might die. But Vulkan won't solve texture cache management. And we need advanced blending; I'm not sure it is exposed in Vulkan, as it requires at least a Maxwell GPU on Nvidia's side. By the way, this extension will reduce the number of draw calls and increase the load on the GPU, so the Vulkan gain will become smaller.
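For reference, a rough sketch of what advanced blending looks like on the GL side (assuming GL_KHR_blend_equation_advanced; illustrative, not GSdx code). Because the blend runs as part of the pipeline, overlapping blended primitives no longer have to be split into separate draws with framebuffer copies in between:

    // Fragment shader (GLSL) declares advanced-blend support:
    //   #extension GL_KHR_blend_equation_advanced : require
    //   layout(blend_support_multiply) out;
    //   out vec4 color;

    // C++ side: select an advanced equation and draw.
    glEnable(GL_BLEND);
    glBlendEquation(GL_MULTIPLY_KHR); // blend equation beyond add/sub/min/max
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);
    glBlendBarrierKHR();              // order dependent overlapping draws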
What do you plan to move GSdx to if and when those go? I just hope that, whatever happens, my RX 480 will handle it. I'll upgrade to a GTX 1080 in a few years though, maybe.
I don't have any plan. My GPU is an "old" Kepler. I won't upgrade soon, as I want a sub-75W but powerful enough GPU with free driver support.
I don't know the AMD status, nor Intel's. I think recent GPUs should be OK, but I really don't know.
AMD has some problems with Vulkan as well (cough, blending). Also, it's good to have several APIs available; some might have issues, so it's good to have an alternative. Take Intel, for example: DX11 has issues on Kaby Lake, OpenGL is a mess, and you might want to use DX9.
It sucks being an AMD user right now >.< I just got a BSOD with Silent Hill 4 in OGL, the only SH game I haven't beaten, and DX11 has an entire layer of atmosphere missing. I've heard Nvidia has issues too, but I'm not aware of how bad.
> Buy an AMD card
> Nuke Windows and say hi to Tux
> Install open sauce driver
> Profit
...Anyway, please, really, it's all up to whatever fancy a willing dev may have. And I don't know of anybody with either the time or the will to begin with. So please, let's stop the wishful-thinking chatter.
Vulkan, please. Replacing the existing OpenGL renderer, which I hear is much slower than the existing D3D renderer, with a single Vulkan renderer would help PCSX2 out a lot. While you could focus on the OpenGL renderer for both Linux and Windows, it might be easier to just pave over the old renderers with a single API and focus on that instead. Less code to maintain.
A new GUI would also really help to modernize it!
After many years, the Metal Gear Solid 2 intro scene still lags; a modern implementation would be welcome, especially if it resolved the issue.
What is needed to create a Vulkan renderer? Why not crowdfund this project?
We would need more developers/manpower. Crowdfunding is still a possibility in the future.
Would need competent Vulkan implementations across the card vendors, as Gregory has pointed out.
Vulkan != a magic solution to performance issues. We would be better off with more people working on core GSdx issues than with a working Vulkan backend.
If the OGL renderer is much slower than the D3D one and nobody can fix the performance disparity, VLK is an option. It depends on what contributors are good at doing, I guess. Once you have VLK going, you don't have to worry about specific driver bugs like with OGL, so it seems easier to maintain in the long run, just more work up front. The RPCS3 devs sure seem to love it: https://rpcs3.net/blog/2018/01/23/rpcs3-2017-wrap-up-a-stunning-year-of-progress/
X-Y=/=Z
You might be oblivious to this, but OpenGL issues that occur in AMD drivers also often affect Vulkan too in some way.
@Swiftpaw OGL is as fast as D3D (well, OGL has better vertex streaming capabilities). However, AMD's proprietary OGL implementation is bad, and nothing prevents AMD from releasing a broken Vulkan implementation too. It would be sad to spend weeks of work on a working solution only for AMD users...
GSdx's main speed issue isn't the rendering API overhead, but the emulation of the GS itself, which doesn't map well to modern GPUs. See my previous post for an example of what can be done to really improve the emulation.
AMD's OGL performance is probably only slower on Windows. I'm pretty sure Mesa is faster by now.
I wonder if there is a chance of a DX12 backend in PCSX2; it gives a huge performance increase.
https://github.com/dolphin-emu/dolphin/pull/3364 https://forums.dolphin-emu.org/Thread-unofficial-dolphin-dx12-backend?page=5