gonetz / GLideN64

A new generation, open-source graphics plugin for N64 emulators.

Native Resolution Factor setting causes huge performance hit #2630

Open mrfixit2001 opened 2 years ago

mrfixit2001 commented 2 years ago

I would really like to use this core, as it's clearly the most up-to-date, but on low-powered GPUs I just can't. The Gles2N64 and Rice cores run dramatically faster and seemingly use both upscaled and multisampled textures.

I opened a previous issue pointing out that gles2 multisampling is broken on my platform, and I wanted to also point out that I'm unable to use any native resolution factor except 1 without causing a HUGE performance impact. When I set it to 0 or 2+ then rendering is unplayable. The only playable setting is 1, and the textures at that point are so blocky that it just looks awful.

There must be some optimization in the resolution factor algorithm somewhere that can be done to improve performance, right? Even if it's allowing a user to set the desired values in the config instead of asking the plugin to calculate them? Sure, a bit hacky, but I'll gladly do that to improve the way the plugin performs.

dankcushions commented 2 years ago

I would really like to use this core, as it's clearly the most up-to-date, but on low-powered GPUs I just can't. The Gles2N64 and Rice cores run dramatically faster and seemingly use both upscaled and multisampled textures.

very inaccurate, not very compatible. they run better because they do much less.

When I set it to 0 or 2+ then rendering is unplayable.

when set to 0 i believe it uses your ScreenWidth and ScreenHeight for whatever resolution you want. if your device can handle 1x native (~320x240) but not 2x (~640x480), then a custom resolution somewhere in between might still be playable, and (somewhat) more detailed. mind you, i would question the desire to run at high resolutions on gles2 devices. they're just not going to be able to cope with an accurate plugin like this at anything more than the bare minimum.
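To make the "somewhere in between" idea concrete, here is a minimal Python sketch that lists a few 4:3 resolutions between 1x and 2x native, along with their pixel count relative to native. The 320x240 base is the approximate figure quoted above, the step count is arbitrary, and the function name is hypothetical; this is not how the plugin itself picks a resolution.

```python
def intermediate_resolutions(base_w=320, base_h=240, max_factor=2.0, steps=4):
    """List resolutions between 1x and max_factor with their relative pixel cost.

    base_w/base_h are an assumed native size, not values read from the plugin.
    """
    results = []
    for i in range(steps + 1):
        # Evenly spaced scale factors from 1.0 up to max_factor.
        factor = 1.0 + (max_factor - 1.0) * i / steps
        width = round(base_w * factor)
        height = round(base_h * factor)
        # Pixel count relative to native: a rough proxy for fragment work.
        relative_cost = (width * height) / (base_w * base_h)
        results.append((width, height, relative_cost))
    return results


if __name__ == "__main__":
    for width, height, cost in intermediate_resolutions():
        print(f"{width}x{height}: {cost:.2f}x the native pixel count")
```

A candidate such as 480x360 shades a bit over twice the native pixel count instead of four times, which is why an in-between value may be worth trying through ScreenWidth and ScreenHeight.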

but there are a lot of optimisations you can do to gliden64 on weak gpus - eg, disable hybrid filter, try without frame buffer emulation, legacy blending, EnableInaccurateTextureCoordinates, etc.

mrfixit2001 commented 2 years ago

Thanks for the quick reply! :)

Unfortunately I have tried all those optimizations and tinkered with all the other available settings. None of them allow the game to be playable.

I do absolutely realize this plugin "does more" than the older ones with more fixes and such, but if the older plugin can emulate the same game without issues, then those enhancements aren't necessary. My device can handle 640x480 rendering for the older plugins, and for comparison I should point out it can also emulate most PSP games in ppsspp at 2x resolution.

Just to clarify - I'm using GLES2 on this device.

Hopefully there's some optimization that can be done here.

mudlord commented 2 years ago

There could be some optimizations for mobile devices. Some lessons could indeed be learnt from PPSSPP and other projects, which are exceptionally well optimized for mobile devices.

Some mobile chips, like Mali ones for instance, require that all FBOs be cleared before use, since for some reason GL state can then be shared internally by Android. That comes out of the ARM Mali GL guidelines. There are other things written up on GLES2 best practices for these devices, as well as by other vendors.

fzurita commented 2 years ago

I've done profiling on many low-end devices and most of the time is spent on shader processing, so that would be the place to optimize to get the most bang for the buck.

Probably the best performance you are going to get on mobile devices is to use legacy blending, which uses fewer shaders, and the inaccurate texture coordinates option. @mrfixit2001 which device do you have?

LuismaSP89 commented 2 years ago

I have a PC stronger than yours and gliden64 also runs a bit choppy at non-default resolutions. I think this can be improved by using a vulkan backend. N64 emulation is now the only one without an up-to-date vulkan plugin, since pcsx2 got one some days ago, ported from the duckstation emulator by stenzek. All the other emulators (duckstation, ppsspp, rpcs3, pcsx2, dolphin, yuzu, cemu, etc.) are working great with vulkan.

There is already an open PR about this, but we first need to find a person or group of people interested in it, since it means a lot of work.

oddMLan commented 2 years ago

Related? #1284

CadetSparklez commented 2 years ago

I have a PC stronger than yours and gliden64 also runs a bit choppy at non-default resolutions. I think this can be improved by using a vulkan backend. N64 emulation is now the only one without an up-to-date vulkan plugin, since pcsx2 got one some days ago, ported from the duckstation emulator by stenzek. All the other emulators (duckstation, ppsspp, rpcs3, pcsx2, dolphin, yuzu, cemu, etc.) are working great with vulkan.

There is already an open PR about this, but we first need to find a person or group of people interested in it, since it means a lot of work.

theres parallel but its slower and looks worse. plus its a lot harder to get gains from vulkan the further back you go as most of its benefits are from spreading across cpu cores- and its harder to multithread things when they never had multi core to begin with etc.

plus vulkan versions often have missing features (dolphin) or bad compatibility (m64p)

this is a huge oversimplification

:)

LuismaSP89 commented 2 years ago

I have a PC stronger than yours and gliden64 also runs a bit choppy at non-default resolutions. I think this can be improved by using a vulkan backend. N64 emulation is now the only one without an up-to-date vulkan plugin, since pcsx2 got one some days ago, ported from the duckstation emulator by stenzek. All the other emulators (duckstation, ppsspp, rpcs3, pcsx2, dolphin, yuzu, cemu, etc.) are working great with vulkan. There is already an open PR about this, but we first need to find a person or group of people interested in it, since it means a lot of work.

theres parallel but its slower and looks worse. plus its a lot harder to get gains from vulkan the further back you go as most of its benefits are from spreading across cpu cores- and its harder to multithread things when they never had multi core to begin with etc.

plus vulkan versions often have missing features (dolphin) or bad compatibility (m64p)

this is a huge oversimplification

:)

Yes, I know it's difficult to implement, but even so, a vulkan backend is always on the wish list for every emulator. The thing is that the vulkan backend is a speed booster; you only need to compare the vulkan and opengl backends in pcsx2, dolphin, duckstation, rpcs3, ppsspp, yuzu, etc.

Sorry, but you mention that dolphin has missing features when using vulkan. What are these features? The vulkan backend has been the main graphics backend for some years now, and the one the devs recommend most.

mrfixit2001 commented 2 years ago

Just as a follow-up... My current settings can be seen here: https://dpaste.com/BK69DM5YF

I believe I've set everything as "minimal" as possible to optimize for speed, and with these settings games do run quite well but look absolutely awful.

If I set the native resolution to False (UseNativeResolutionFactor = False) then graphics quality improves but even SM64 can't run at full speed.

My current test device is a Rockchip RK3328, which uses a Mali 450 and GLES2.

Again - using both the Gles2Rice and Gles2N64 cores - games like SM64 and MK64 run smoothly and beautifully and WITH multisampling. The massive downside of these cores is they are unmaintained and have bugs with some games - for example Mario Golf and Rogue Squadron are completely unplayable due to missing textures (but run fast haha).

Older versions of GLideN64 did perform slightly faster, even when used as libretro cores. While I absolutely understand this core has been refactored a ton of times since then, with new features and fixes added, the point needs to be made that this low-powered hardware is clearly capable of not just emulating many games with the other cores, but emulating them with multisampling enabled.

For this open issue, Vulkan doesn't really come into play. And while I agree that would be awesome, it doesn't really apply to this request for help :)

Happy to do any additional testing on settings and patches.

fzurita commented 2 years ago

Did you try disabling frame buffer emulation? That would eliminate a lot of blit buffer copies that don't exist on the older plugins. Also, make sure you don't have overscan removal enabled since that introduces additional copies.

mrfixit2001 commented 2 years ago

Did you try disabling frame buffer emulation? That would eliminate a lot of blit buffer copies that don't exist on the older plugins. Also, make sure you don't have overscan removal enabled since that introduces additional copies.

Thanks for the quick reply! Yes I did. There is a very small performance increase but it's barely noticeable. SM64 still doesn't run at full speed with UseNativeResolutionFactor and EnableFBEmulation both set to false. Additionally, with framebuffer emulation disabled SM64 isn't sized correctly: its height is too big and it renders halfway off the screen.

fzurita commented 2 years ago

On such a slow device, all I can think of would be to get rid of all shader-based blending. It would basically make it glN64 at that point.

mrfixit2001 commented 2 years ago

@fzurita sounds like an interesting experiment! Can you advise how I would test that?

mrfixit2001 commented 2 years ago

As an aside - I've monitored the system utilization via htop when running the core with both settings. As you can see, the board is FAR from being maxed out in any of its specs. Which likely only leaves GPU utilization.

[Screenshot: htop output]

ghost commented 2 years ago

When I set it to 0 or 2+ then rendering is unplayable.

This is normal with weak GPUs. It is due to a bottleneck in fragment processing. When rendering at 2x scale, a triangle covers 4x as many pixels. As a result, the fragment shader has to run 4 times as often as at native resolution. As you raise the resolution, the workload increases quadratically. Even good GPUs would struggle if you were allowed to raise the resolution as much as you wanted.

There are three solutions to such a bottleneck.
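As a rough illustration of that scaling, here is a minimal Python sketch. It assumes a 320x240 native framebuffer (the approximate figure mentioned earlier in the thread) and that fragment shader invocations are proportional to the number of covered pixels, ignoring overdraw. The function name is made up for illustration and is not part of GLideN64.

```python
# Assumed approximate N64 native framebuffer size (not read from the plugin).
NATIVE_WIDTH = 320
NATIVE_HEIGHT = 240


def fragment_workload(factor: int) -> int:
    """Return the pixels shaded in one full-screen pass at a resolution factor."""
    # Both dimensions scale by the factor, so the pixel count scales by factor**2.
    return (NATIVE_WIDTH * factor) * (NATIVE_HEIGHT * factor)


if __name__ == "__main__":
    base = fragment_workload(1)
    for factor in (1, 2, 3, 4):
        pixels = fragment_workload(factor)
        print(f"{factor}x: {pixels} pixels, {pixels // base}x the native workload")
```

Under these assumptions the multiplier is the square of the factor, so going from 1x to 2x is already a fourfold jump in fragment work.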

About the third possibility, I must say that magical "optimizations" that make everything work fast don't exist. Normally an optimization will give only a small boost, unless the previous code had flaws or the optimization is actually a hack that skips emulating something.

dankcushions commented 2 years ago

@mrfixit2001 CPU and memory usage are effectively irrelevant as it's the GPU and/or bandwidth that will almost always be the bottleneck on GLES devices.

the point needs to be made that this low-powered hardware is clearly capable of not just emulating many games with the other cores, but emulating them with multisampling enabled.

again, the fact that they have low compatibility and accuracy, but better performance, is connected. i am sure there is always room for optimization (for example the GLSL optimizer that was in glupen64 i think deserves another look), but even 1x native res would remain aspirational for a lot of these devices. there's just too much to do in a modern n64 video plugin. why not just use glide64mk2?

mrfixit2001 commented 2 years ago

Great feedback all around, thank you. It makes sense, ofc, that the GPU bandwidth is the concern. Just making sure to provide all the details I have for full visibility of the issue.

@standard-two-simplex - why does this bottleneck not exist in the other older cores?

@dankcushions - maybe I should ask this a different way... if a game like SM64 can be emulated extremely well using old cores without all the added features and fixes, why isn't there a codepath in the newer version that can run without them? For example, referencing standard-two-simplex's response regarding fragment processing - if this bottleneck didn't exist in older cores, then it would greatly improve performance to be able to use that older codepath and avoid the bottleneck.

mrfixit2001 commented 2 years ago

For added clarity - PSP and Dreamcast emulation run faster at the same resolution than this core does :(

ghost commented 2 years ago

why does this bottleneck not exist in the other older cores?

Because they don't have a fragment shader or have a very simple one that skips emulating things or does it with hacks.

mrfixit2001 commented 2 years ago

Thanks again for the speedy reply... @standard-two-simplex : rice creates a fragment shader here: https://github.com/mupen64plus/mupen64plus-video-rice/blob/master/src/OGLCombiner.cpp#L258

gles2n64 creates its fragment shader here: https://github.com/ricrpi/mupen64plus-video-gles2rice/blob/master/src/OGLES2FragmentShaders.cpp

I'm confident they are far simpler than GLideN64's... but they work fine for a LOT of games, and without the bottleneck.

fzurita commented 2 years ago

For gles2n64 (glN64), most everything happens in the fixed function pipeline instead of shaders. GLideN64 did use glN64 as a starting point I believe, so you could theoretically restore some of that fixed function pipeline blending and I'm sure other stuff to GLideN64. But... at that point, you will lose all the compatibility that GLideN64 added and you might as well use glN64.

mrfixit2001 commented 2 years ago

But... at that point, you will lose all the compatibility that GLideN64 added

Does this imply that all of the compatibility enhancements that GLideN64 has added are basically inside the fragment shader? :\

fzurita commented 2 years ago

Well, some of the framebuffer handling that GLideN64 does could be ported back to glN64. That could improve compatibility.

You would lose I think mainly blending accuracy and texture coordinate based accuracy. At least that's what I would naively say, lol.

I don't think you would be able to take any of the LOD logic over to glN64 either, since a lot of that happens in the shader.

mrfixit2001 commented 2 years ago

Rather than porting anything backwards, I would much rather see a "legacy shader" option to accompany the legacy blending option so that we could disable the bottleneck on this upstream core.

dankcushions commented 2 years ago

why isn't there a codepath that can be used without them in the newer version?

maintenance burden. frankly it's a (pleasant) surprise that gles2 support is still provided at all.

For added clarity - PSP and Dreamcast emulation run faster at the same resolution than this core does :(

much simpler systems to emulate in terms of graphics. their GPUs are much closer to current tech than n64’s proto graphics accelerator, which requires complex shader work to replicate.