gonetz / GLideN64

A new generation, open-source graphics plugin for N64 emulators.

Super lag in HLE mode, at least with r600/radeonsi Mesa drivers #1561

Closed: Jj0YzL5nvJ closed this issue 6 years ago

Jj0YzL5nvJ commented 7 years ago

Since the implementation of a625225323c902b614ed9601143df3bc51550fc4, the plugin exhibits some kind of delay with the radeon Mesa driver (r600g).

Xubuntu 16.04.3 LTS

glxinfo | grep OpenGL

OpenGL vendor string: X.Org
OpenGL renderer string: AMD JUNIPER (DRM 2.43.0 / 4.4.0-93-generic, LLVM 6.0.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 17.3.0-devel - padoka PPA
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.3.0-devel - padoka PPA
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 17.3.0-devel - padoka PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

CPU consumption is no higher than in previous versions, and compiling with -DCRC_OPT=On changes almost nothing in terms of FPS. Examples:

Running SM64 until the moment the white star appears:

Commit dddb3ae1f71afa85ef782d5f5edb9661a1b1b5bd
Build: cmake -DMUPENPLUSAPI=On ../../src/
Run: LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420

libGL: FPS = 0.5
libGL: FPS = 40.0
libGL: FPS = 40.0
libGL: FPS = 51.0
libGL: FPS = 60.0
libGL: FPS = 4.6
libGL: FPS = 54.0
libGL: FPS = 59.0
libGL: FPS = 59.0
libGL: FPS = 60.0
libGL: FPS = 60.0
libGL: FPS = 60.0
libGL: FPS = 60.0

Commit a625225323c902b614ed9601143df3bc51550fc4
Build: cmake -DMUPENPLUSAPI=On ../../src/
Run: LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420

libGL: FPS = 0.5
libGL: FPS = 20.6
libGL: FPS = 15.0
libGL: FPS = 15.2
libGL: FPS = 15.0
libGL: FPS = 15.0
libGL: FPS = 15.2
libGL: FPS = 16.4
libGL: FPS = 16.4
libGL: FPS = 16.6
libGL: FPS = 16.4
libGL: FPS = 16.4
libGL: FPS = 4.1
libGL: FPS = 18.1
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.5
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.2
libGL: FPS = 10.7
libGL: FPS = 8.5
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.1
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.0
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.0
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.2
libGL: FPS = 11.0
libGL: FPS = 8.9

Commit bbc7131655a78ae887cee481f0a67674385fc2d2
Build: cmake -DCRC_OPT=On -DMUPENPLUSAPI=On ../../src/
Run: LIBGL_SHOW_FPS=1

libGL: FPS = 0.3
libGL: FPS = 21.8
libGL: FPS = 16.3
libGL: FPS = 16.4
libGL: FPS = 16.5
libGL: FPS = 16.6
libGL: FPS = 16.3
libGL: FPS = 16.6
libGL: FPS = 16.5
libGL: FPS = 16.4
libGL: FPS = 16.4
libGL: FPS = 6.9
libGL: FPS = 18.4
libGL: FPS = 8.3
libGL: FPS = 8.4
libGL: FPS = 8.4
libGL: FPS = 8.3
libGL: FPS = 8.4
libGL: FPS = 8.2
libGL: FPS = 8.4
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.4
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 10.7
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.1
libGL: FPS = 8.2
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.1
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 8.3
libGL: FPS = 9.0
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 8.2
libGL: FPS = 9.3
libGL: FPS = 8.2
libGL: FPS = 8.3
libGL: FPS = 9.2
libGL: FPS = 10.6

If anyone knows how to put spoilers, let me know how.

loganmc10 commented 7 years ago

Can you post the full output of glxinfo? Maybe put it on pastebin or upload a txt file or something so it's not a huge post.

Jj0YzL5nvJ commented 7 years ago

http://sprunge.us/DehV

Jj0YzL5nvJ commented 7 years ago

I've been doing more testing, and apparently VBO just makes the problem even more evident, so the problem resides elsewhere.

Just as a note, GPU usage is almost nil during the lag periods and CPU usage does not exceed 33%. Any of the mesa-utils demos uses more GPU... (tested with radeontop). But if LIBGL_ALWAYS_SOFTWARE=1 is used, CPU usage is absurdly high (89% average).

loganmc10 commented 7 years ago

Do you have anything non-default set in the config? If so, can you post your mupen64plus.cfg?

Jj0YzL5nvJ commented 7 years ago

http://sprunge.us/cAII

I modify VideoPlugin and RspPlugin constantly and delete everything related to Video-GLideN64 every time I change versions. In theory I'm always using defaults; the only custom configuration I remember making is in Input-SDL-Control1. I have been finding multiple bugs in mupen64plus itself, so I'm going to have to test with old versions too.

Edit: I even tested with the modesetting driver and with r600g using EXA acceleration (DRI2); all the same, nothing changed.

Jj0YzL5nvJ commented 7 years ago

I found the origin! 313741d8276f0e018ba0c137892d67794468cc28

In my tests, that commit causes heavy lag in: Harvest Moon 64, Perfect Dark, Paper Mario.

And partial lag (only in certain scenes) in: Space Station Silicon Valley, Super Smash Bros., Bomberman 64, Donkey Kong 64, Kirby 64: The Crystal Shards.

Later commits increase the lag in other games, but a625225323c902b614ed9601143df3bc51550fc4 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.

loganmc10 commented 7 years ago

It seems like your machine has an issue with GL_ARB_buffer_storage.

Using the latest master, can you try forcing this variable to false:

https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L58-L59

Just get rid of that whole statement and replace it with bufferStorage = false
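For reference, the edit amounts to something like this (a sketch; the original statement is the one quoted in full later in this thread):

    // src/Graphics/OpenGLContext/opengl_GLInfo.cpp -- sketch of the suggested edit.
    // Removed:
    //   bufferStorage = (!isGLESX && (numericVersion >= 44)) ||
    //           Utils::isExtensionSupported(*this, "GL_ARB_buffer_storage") ||
    //           Utils::isExtensionSupported(*this, "GL_EXT_buffer_storage");
    // Replaced with an unconditional override for testing:
    bufferStorage = false;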

Jj0YzL5nvJ commented 7 years ago

Nope, that's not it. With that change the lag is slightly worse (between 0.1 and 1.2 FPS less in SM64).

Jj0YzL5nvJ commented 7 years ago

I think I found something interesting. See this first: https://imgur.com/a/rEMEu

In some parts of the Carrington Institute, the lag disappears completely (No. 1, No. 6 and No. 7). Usually these are places with poor lighting, or spots very close to a wall, lit or not. But there are some exceptions (No. 2, No. 4 and No. 5) where the FPS drops relatively little compared to most of the other lights (No. 3).

Images No. 7 and No. 8 show exactly the same spot; the only difference is that No. 8 was taken after activating Hi-Res. With Hi-Res enabled, the FPS stays very close to the No. 8 value even in places where it was previously perfect.

Save state: https://0x0.st/RpV.zip
Current cfg: http://sprunge.us/Pdia
gliden64.log: http://sprunge.us/MNCH

P.S.: The gliden64.log is only generated when GLideN64 is compiled with Clang...

Jj0YzL5nvJ commented 6 years ago

I've been "playing a lot", updating and recompiling many dependencies in my PC... After doing test after test, in different games and messing with the configuration file. I found some mitigators for the symptoms, unfortunately none such universal solution.

The first is to use cxd4-sse2 in "HLE mode" instead of the HLE plugin. The second is to use DisableFBInfo = False and/or EnableCopyColorToRDRAM = 0 (#1559); see the sketch below. The last is to use LIBGL_ALWAYS_SOFTWARE=1, but in many games this backfires.
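For clarity, the second workaround amounts to settings like these in the configuration file (a sketch; the section name is an assumption based on the Video-GLideN64 entries mentioned elsewhere in this thread):

    # mupen64plus.cfg -- sketch of the relevant settings (section name assumed)
    [Video-GLideN64]
    DisableFBInfo = False
    EnableCopyColorToRDRAM = 0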

Examples: https://imgur.com/a/HAqJZ

Edit: Added more images differentiating LLE and HLE in cxd4-sse2. It's as if GLideN64 forces an LLE mode onto "mupen64plus-rsp-hle.so".

Jj0YzL5nvJ commented 6 years ago

I've been testing angrylion-plus... and I noticed that in my previous post I completely confused the HLE and LLE configurations of cxd4-sse2 (thanks to this). So the image descriptions are inverted: cxd4-sse2 (HLE) is in reality cxd4-sse2 (LLE) and vice versa.

My lag problem in HLE persists... only now I know that cxd4 is also affected. Even angrylion-plus runs faster for me than GLideN64 in HLE mode (DisplayListToGraphicsPlugin = True). I'm going to stick with bc00985f33c8b1e2e63cebabe53d01f9cf3708a6, the last commit that works well for me. u.u

fzurita commented 6 years ago

So the one that made things slow for you is this?

313741d8276f0e018ba0c137892d67794468cc28

Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like AMD Mesa drivers don't like buffer storage either.

Edit: While you are at it, can you test this branch? https://github.com/fzurita/GLideN64/tree/threaded_GLideN64

I'm curious how threaded GLideN64 performs with AMD hardware.

loganmc10 commented 6 years ago

Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like AMD Mesa drivers don't like buffer storage either.

I had him try disabling buffer storage (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326972673); it didn't seem to make a difference.

fzurita commented 6 years ago

Let's double check by turning off "Copy color buffer to RDRAM".

Jj0YzL5nvJ commented 6 years ago

The second is to use DisableFBInfo = False and/or EnableCopyColorToRDRAM = 0.

Disabling those two things is the only thing that helped me a bit. But I don't remember trying to disable them together with the modification suggested by loganmc10 (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326972673); I'll try again.

Edit: While you are at it, can you test this branch? https://github.com/fzurita/GLideN64/tree/threaded_GLideN64

Okay, I don't have much to do anyway. I'll try it when I'm at home.

Jj0YzL5nvJ commented 6 years ago

No. I don't see much difference between disabling bufferStorage and using EnableCopyColorToRDRAM = 0 in the cfg file, or using both; at most there is an average difference of 1 FPS, 2 VI/S and 3% in the counters (in Banjo-Kazooie). At least with HLE.

@fzurita, it's the same story with the threaded_GLideN64 branch. I can test LLE tomorrow.

fzurita commented 6 years ago

So this is very odd. The commit that made things slow for you is https://github.com/gonetz/GLideN64/commit/313741d8276f0e018ba0c137892d67794468cc28 but that code is only invoked when EnableCopyColorToRDRAM is not zero.

I'm not sure what is going on.

Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?

Jj0YzL5nvJ commented 6 years ago

I've been testing in my free time (which is very little) and made some discoveries. Unfortunately, nothing directly related to my problem... apparently.

Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?

I really doubt it. I don't use any GLideN64*.ini files. Or more precisely, I don't know where to put them to make them work with mupen64plus. And I can see the effects after editing my configuration file.

Regarding 313741d8276f0e018ba0c137892d67794468cc28, see this first: https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326910677. In addition to the ROMs already mentioned, I also tested these others without detecting lag problems: Banjo-Kazooie, GoldenEye 007, The Legend of Zelda (OoT and MM), Mario Kart 64, Resident Evil 2, Super Mario 64, both Castlevanias and Conker's Bad Fur Day. Something worth mentioning is that at the time I only tested the intros, without interaction.

Recently I tried again, and in the case of Perfect Dark the lag only appears on "the first boot", during the logo animation (RARE, Nintendo, N64 and Perfect Dark); on the second run the same logos move smoothly, and so does the gameplay. Interestingly, if I keep START pressed during the game's boot, the lag never occurs after leaving the Controller Pak menu.

I know very little about programming (Turbo C, Turbo Pascal, Visual FoxPro, Delphi, etc.) and I'm more used to writing maintenance scripts (Batch, VBScript, AHK, etc.) for Windows. This seems very AMD-specific, so I don't know whether the following is worth referring to; sorry if not.

I split the VRAM into 8×8 blocks. All hazards and dependencies are tracked at this level. I chose 8×8 simply because it fits neatly into 64 threads on blits (wavefront size on AMD), and the smallest texture window is 8 pixels large, nice little coincidence 🙂

https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/

In my last test I saw lag and freezes in bc00985f33c8b1e2e63cebabe53d01f9cf3708a6 when GLideN64 tried to fill more than 542M of VRAM. After that, VRAM usage drops a few MB and starts to fill again. In 313741d8276f0e018ba0c137892d67794468cc28 this never occurs, but it is very difficult to fill more than 120M of VRAM; it's as if the code spends more time cleaning VRAM than using it. I had to destroy many things in Perfect Dark to manage to fill the VRAM and outpace the VRAM-erasing code. But again, it never uses more than 542M, or 60% of VRAM (1024M max).

Edit: I forgot to mention that I found specific places in Banjo-Kazooie where the lag is more intense in LLE than in HLE (with https://github.com/gonetz/GLideN64/commit/deaf61299f12168b775d7e2d448b9eca149c0e7e and 3cf7377 threaded_GLideN64). So this is not HLE-specific, but it is much more noticeable there.

Jj0YzL5nvJ commented 6 years ago

@fzurita, I get this using your 'further_reduce_shader_logic' branch: https://0x0.st/sX2R.log. In my case, https://github.com/gonetz/GLideN64/issues/1665#issuecomment-351760214 doesn't make any difference.

Jj0YzL5nvJ commented 6 years ago

I have found the true origins (I believe): 313741d8276f0e018ba0c137892d67794468cc28 and 3aa365d24a6bd5f49e1205ea6338cddd7092e6f4.

As I mentioned before, 313741d8276f0e018ba0c137892d67794468cc28 causes lag (and glitches) for me at boot time and in the first scenes of Perfect Dark and other games (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326910677), but not in gameplay.

On the other hand, 3aa365d24a6bd5f49e1205ea6338cddd7092e6f4 fixes the glitches caused by 313741d8276f0e018ba0c137892d67794468cc28, but the lag becomes permanent in Perfect Dark in HLE with Hi-Res enabled. With Hi-Res disabled, the FPS only becomes unstable. So the problem was very difficult to find, because enabling Hi-Res offers no benefit in emulation.

And I quote myself:

Later commits increase the lag in other games, but a625225323c902b614ed9601143df3bc51550fc4 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.

fzurita commented 6 years ago

Do you have color buffer to RDRAM enabled? If you don't have it enabled, the first commit shouldn't make a difference.

Jj0YzL5nvJ commented 6 years ago

I tested with the default value EnableCopyColorToRDRAM = 2. Yeah, with EnableCopyColorToRDRAM = 0, neither 313741d8276f0e018ba0c137892d67794468cc28 nor 3aa365d24a6bd5f49e1205ea6338cddd7092e6f4 manifests lag.

Now I have to find the commit that makes the lag manifest with EnableCopyColorToRDRAM = 0, I suppose. u.u

fzurita commented 6 years ago

Well, before those commits, EnableCopyColorToRDRAM was always disabled for GLES 2.0 devices.

Edit: whoops, you have an AMD device. Too many issues, and I got things confused. It seems like buffer storage causes slowdowns on AMD, at least with that specific driver.

Jj0YzL5nvJ commented 6 years ago

Ouroboros! Back to the beginning T.T

Starting from a625225323c902b614ed9601143df3bc51550fc4, the lag persists even with EnableCopyColorToRDRAM = 0. And it also affects LLE, but with less severity than HLE.

Hmm... I'm on vacation, I will try to create a dual-boot installation with Windows and another distribution for testing. ...Just thinking about it, I lose all desire to do it.

fzurita commented 6 years ago

Same thing. That commit added more buffer storage usage. We enable buffer storage automatically when the driver reports that it supports it. We could make a special case against that specific driver to disable buffer storage utilization.
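A minimal sketch of what such a special case could look like (hypothetical: the renderer-string check and helper below are my illustration, not existing GLideN64 code):

    #include <cstring>

    // Hypothetical driver quirk check: match Mesa's r600/radeonsi renderers
    // from the strings the GL context reports.
    static bool isMesaAmdRenderer(const char* renderer, const char* version)
    {
        if (renderer == nullptr || version == nullptr)
            return false;
        return std::strstr(version, "Mesa") != nullptr &&
               (std::strstr(renderer, "AMD") != nullptr ||
                std::strstr(renderer, "Radeon") != nullptr);
    }

    // ...then, after the existing capability detection:
    // if (isMesaAmdRenderer(reinterpret_cast<const char*>(glGetString(GL_RENDERER)),
    //                       reinterpret_cast<const char*>(glGetString(GL_VERSION))))
    //     bufferStorage = false;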

Jj0YzL5nvJ commented 6 years ago

Let's summarize the few workarounds that can be applied to get GLideN64 working on r600/radeonsi Mesa drivers, from better to worse...

That's all.

This is going to take some time ...I hope.

fzurita commented 6 years ago

Can you try one more thing? In here: https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L61-L62

Replace:

    bufferStorage = (!isGLESX && (numericVersion >= 44)) || Utils::isExtensionSupported(*this, "GL_ARB_buffer_storage") ||
            Utils::isExtensionSupported(*this, "GL_EXT_buffer_storage");

with

    bufferStorage = false;

Also, this: https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_ContextImpl.cpp#L62-L67

Replace those lines with this:

    m_graphicsDrawer.reset(new UnbufferedDrawer(m_glInfo, m_cachedFunctions->getCachedVertexAttribArray()));

Jj0YzL5nvJ commented 6 years ago

@fzurita Eureka! That does the trick! I need to use MESA_GL_VERSION_OVERRIDE=3.3COMPAT, but that doesn't matter. Now we have a real workaround.

Thanks so much, really ^^

AaronBPaden commented 6 years ago

Note: on a slightly newer GCN card, this helps for me but is only a partial fix.

In the Ocarina of Time intro, when Link first appears riding Epona, performance without the patch drops from ~50 FPS to about 15. It's pretty consistently within 1 FPS of 15 in this case.

With the patch, performance drops from ~60 FPS to somewhere between 20 and 40. It's less consistent here, and there are places (generally when there are no characters on screen) where it runs at full speed, but I'd say the average with characters on screen is about 30 FPS. That's probably closer to a real N64 than 60 FPS, but the problem is the constant stuttering. :P

Also with the patch, I have to use 3.3COMPAT or I get a black screen. I was going to make an apitrace, but it seems that tool doesn't work with compatibility profiles, unless I'm missing something.

There are probably multiple issues with GLideN64 and Mesa on older AMD hardware, including this one.

fzurita commented 6 years ago

It seems that the Mesa driver has poor support for some features. Even after disabling buffer storage and VBOs, there is still something it doesn't like.

AaronBPaden commented 6 years ago

GLideN64 is definitely hitting a slow path in Mesa that doesn't exist in the Windows drivers, where the plugin works fine. At the same time, I think GLideN64 must be doing something unusual (not unexpected for an emulator), because I'm not seeing these symptoms in any other GL application.

loganmc10 commented 6 years ago

The unusual thing GLideN64 does with VBOs is called Buffer Object Streaming (https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming).

It works well on my Intel Mesa laptop, but I don't have any AMD devices to test.

Mesa only officially supports the core profile, and the core profile requires the use of VBOs, which is why this is needed.
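For context, the orphaning flavor of buffer object streaming described on that wiki page looks roughly like this (a generic illustration, not GLideN64's actual drawer code; it assumes the vertex attribute pointers for the VBO are already set up):

    // Stream one batch of vertices through a fixed-size VBO by orphaning:
    // re-specifying the storage lets the driver hand back fresh memory
    // instead of stalling on draws that still read the old contents.
    void streamBatch(GLuint vbo, GLsizeiptr bufSize, const void* vertexData,
                     GLsizeiptr vertexBytes, GLsizei vertexCount)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, bufSize, nullptr, GL_STREAM_DRAW); // orphan
        glBufferSubData(GL_ARRAY_BUFFER, 0, vertexBytes, vertexData);    // upload
        glDrawArrays(GL_TRIANGLES, 0, vertexCount);                      // draw
    }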

Jj0YzL5nvJ commented 6 years ago

I've been quickly testing some games. So far I've spotted three lag points: one in Bomberman 64 and two in GoldenEye 007. I'm going to bisect with "the patch" applied to identify the other "bad commit". When I finish my work, of course.

Lithium64 commented 6 years ago

Same issue here. I'm running the latest m64p Linux build (the Ubuntu build doesn't work correctly here).

My specifications/settings are:

i3 4150 @ 3.50 GHz, 12 GB RAM, Radeon R7 260X 2 GB VRAM, Ubuntu 17.10, kernel 4.14.15, Mesa 17.3.2

Jj0YzL5nvJ commented 6 years ago

I made a small "benchmark" using GALLIUM_HUD, testing some branches and pull requests. All my tests include the changes to disable buffer storage. @fzurita, @loganmc10, and the few people who share my problem: I hope this is at least useful for you.

Benchmark 01 Savestate: ge007_1.zip

Benchmark 02 Savestate: ge007_0.zip

Benchmark 03

Benchmark 04 Savestate: bomberman64.zip

P.S.: To adapt https://github.com/gonetz/GLideN64/issues/1561#issuecomment-360246650 to threaded_GLideN64:

Replace this: https://github.com/fzurita/GLideN64/blob/threaded_GLideN64/src/Graphics/OpenGLContext/opengl_ContextImpl.cpp#L65-L71

For this:

        if (config.video.threadedVideo)
            m_graphicsDrawer.reset(new UnbufferedDrawerThreadSafe(m_glInfo, m_cachedFunctions->getCachedVertexAttribArray()));
        else
            m_graphicsDrawer.reset(new UnbufferedDrawer(m_glInfo, m_cachedFunctions->getCachedVertexAttribArray()));

loganmc10 commented 6 years ago

I have a few ideas for this; I'll create some test commits tomorrow for you to try out. Just to confirm: you say that LLE mode is often faster than HLE, is that correct?

Jj0YzL5nvJ commented 6 years ago

Just to confirm: you say that LLE mode is often faster than HLE, is that correct?

Only with buffer storage enabled. If buffer storage is forcefully disabled, HLE is faster.

Edit: this is how HLE mode performs with 'master' on my hardware.

All the previous benchmarks, with the exception of 03, are laggy places that persist even with buffer storage disabled. But with buffer storage enabled, such places are affected by the lag even more. How ironic; it doesn't make any sense.

loganmc10 commented 6 years ago

Maybe we'll have to chat on some other platform; I thought you previously said disabling buffer storage didn't make a difference?

fzurita commented 6 years ago

To make any difference, he had to disable both buffer storage and VBOs.

loganmc10 commented 6 years ago

@Jj0YzL5nvJ This is going to seem like a silly shot in the dark, but can you try to apply this commit:

https://github.com/loganmc10/GLideN64/commit/881f58df4a5fdf539da5b8449d81c999e86ec0a4

Leave everything else like it is in the current master, and I would set EnableCopyColorToRDRAM = 0 for now in the config, just so we can work on one issue at a time.

From what I read, older ATI cards required vertex data to be aligned to 32/64 bytes for best performance. Ours is currently at 44 bytes, so the padding brings it up to 64. I'm really not sure it'll make a big difference, but it's worth a shot.
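To illustrate the idea (the field names below are hypothetical; only the 44-byte and 64-byte figures come from the comment above):

    // A 44-byte vertex padded out to a 64-byte stride.
    struct StreamedVertex {
        float position[4]; // 16 bytes
        float color[4];    // 16 bytes
        float texCoord[2]; //  8 bytes
        float fogDepth;    //  4 bytes -> 44 bytes of real data
        float padding[5];  // 20 bytes -> 64-byte total stride
    };
    static_assert(sizeof(StreamedVertex) == 64, "vertex stride should be 64 bytes");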

loganmc10 commented 6 years ago

Here is another test: https://github.com/loganmc10/GLideN64/commit/af5d5b8a1878d2e87a6a78dee25a0fd86b745266

Same rules as before: please leave everything else the same as master, and set EnableCopyColorToRDRAM = 0. This tests a different way of handling the buffer storage.

loganmc10 commented 6 years ago

@fzurita I'm curious whether https://github.com/loganmc10/GLideN64/commit/881f58df4a5fdf539da5b8449d81c999e86ec0a4 makes any difference on Adreno devices with GLES 3.2 (they'll support VBO + buffer storage). Some things I've read say vertex data alignment matters on some mobile chipsets, but it's hard to find solid information.

fzurita commented 6 years ago

I can do some quick testing on a slower Adreno device.

AaronBPaden commented 6 years ago

Neither patch appears to affect my ~1.0 GCN card.

I mentioned earlier that I was interested in running this in a profiler. Caveat: I have no idea what I'm doing.

Using the apitrace profile, it looks like GL calls are taking no more than ~2 ms on the GPU.

On the CPU, however, the graph is skewed because some calls to glTexSubImage2D occasionally take around 100 ms or more (!!). However, this isn't happening often enough to be the problem on a frame-by-frame basis. I am seeing calls to glDrawElementsBaseVertex taking ~15 ms. When I zoom in, the graph looks like this:

[screenshot from 2018-02-25 23-47-18]

The pattern here looks like several calls to glDrawElementsBaseVertex taking around 10-15 ms, followed by a call to glFlushMappedBufferRange also taking about 10 ms.

Is there anything anyone would be interested in me looking at?

Jj0YzL5nvJ commented 6 years ago

@loganmc10, my tests were run without disabling buffer storage and VBOs; take them all with a grain of salt. GL_ARB_buffer_storage is certainly broken in my drivers.

Test with EnableCopyColorToRDRAM = 0

Personally, I did not notice any significant changes, much less from disabling buffer storage and VBOs. But comparing results with the previous test, the differences are very significant, especially in CPU activity, GPU activity and buffer wait time.

@BPaden, try to run using MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=410 MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage"

Can you post the results of the following commands?

    glxinfo | grep OpenGL
    cat /var/log/Xorg.0.log | grep -i enabled
    cat /var/log/Xorg.0.log | grep -i load
    cat /var/log/Xorg.0.log | grep -i swap

loganmc10 commented 6 years ago

@BPaden that trace is very helpful. My next test will be replacing glDrawElementsBaseVertex with glDrawArrays. I suspected this might be an issue since @Jj0YzL5nvJ mentioned that LLE works better; I believe LLE always uses glDrawArrays (I could be remembering wrong, though).

Disabling VBOs is good for testing, but it can't be a long-term solution. Core OpenGL requires the use of VBOs (that's why you need the environment variable to get it to work). In a future version of Mesa, they could remove support for non-VBO rendering altogether if they wanted, so we can't count on that.

@BPaden the long glTexSubImage2D is unfortunate but not unexpected. That is when the emulator is uploading texture data to the GPU. In a normal game you would do that at startup, not during rendering, but the emulator doesn't know about the texture data until right before it's needed, so we have to upload it like that.
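In other words, the upload pattern is roughly this (a generic sketch, not the plugin's exact code):

    // The N64 display list only reveals a texture right before it is drawn,
    // so texel data has to be pushed to the GPU mid-frame.
    void uploadTexture(GLuint tex, GLsizei width, GLsizei height, const void* texels)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        // This call is what shows up as the occasional ~100 ms spike in the trace.
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                        GL_RGBA, GL_UNSIGNED_BYTE, texels);
        // ...immediately followed by the draw call that samples it.
    }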

loganmc10 commented 6 years ago

Ok @BPaden @Jj0YzL5nvJ can you try this commit:

https://github.com/loganmc10/GLideN64/commit/9bcfa67d9550c7f1cd4ba72f657facd66a4d27e4

I tested this on my Nvidia laptop and saw no difference in performance, but it may make a difference for you. I'm also curious whether this makes any difference on Adreno devices with buffer storage, @fzurita.

AaronBPaden commented 6 years ago

libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 56.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 60.9
libGL: FPS = 59.9
libGL: FPS = 59.9
libGL: FPS = 59.9

😁

fzurita commented 6 years ago

Sure, I'll try that. It will actually probably help performance with slower Android devices. I do know that VBOs and EBOs are slower with them. I remember in the past they were about 10% slower.

loganmc10 commented 6 years ago

Yeah well it definitely looks like we found the bottleneck in the Mesa driver. I'm going to hop on their IRC channel and ask about this.

It's a little counterintuitive: the whole point of indexed drawing (glDrawElements) is that you can reduce the bandwidth used in uploading vertex data. But maybe when it's used in conjunction with VBO streaming the benefits are negated. I'll be curious to hear whether there is any difference on a mobile device.
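For reference, the two draw paths being compared look roughly like this (a generic sketch; the tradeoff is the vertex-bandwidth saving from indices versus also having to stream the index data):

    // Indexed path: shared vertices are referenced through the bound element
    // buffer, so less vertex data is uploaded per batch.
    void drawIndexed(GLsizei indexCount, const void* indexOffset, GLint baseVertex)
    {
        glDrawElementsBaseVertex(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT,
                                 indexOffset, baseVertex);
    }

    // Array path: shared vertices are duplicated in the stream, so more data
    // is uploaded, but there is no index buffer to manage or stream.
    void drawUnindexed(GLint firstVertex, GLsizei vertexCount)
    {
        glDrawArrays(GL_TRIANGLES, firstVertex, vertexCount);
    }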