Closed Jj0YzL5nvJ closed 6 years ago
Can you post the full output of glxinfo? Maybe put it on pastebin or upload a txt file or something so it's not a huge post
I've been doing more testing and apparently VBO just makes the problem even more evident, so the problem resides somewhere else.
Just as a note: GPU usage is almost nil during the lag periods and CPU usage does not exceed 33%. Any of the mesa-utils demos uses more GPU than that (tested with radeontop). But if LIBGL_ALWAYS_SOFTWARE=1 is used, CPU usage is absurdly high (89% average).
Do you have anything non-default set in the config? If so, can you post your mupen64plus.cfg?
I modify VideoPlugin and RspPlugin constantly, and I delete everything related to Video-GLideN64 every time I change the version. In theory I'm always using defaults; the only custom configuration I remember making is in Input-SDL-Control1.
I have been finding multiple bugs in mupen64plus itself, so I'm going to have to do tests with old versions too.
Edit: I even tested with modesetting driver and r600g with EXA acceleration (DRI2), all the same, nothing changed.
I found the origin! 313741d8276f0e018ba0c137892d67794468cc28
In my tests, that commit causes heavy lag in: Harvest Moon 64, Perfect Dark, Paper Mario.
And partial lag (only in certain scenes) in: Space Station Silicon Valley, Super Smash Bros., Bomberman 64, Donkey Kong 64, Kirby 64: The Crystal Shards.
Later commits increase the lag in other games, but a625225323c902b614ed9601143df3bc51550fc4 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.
It seems like your machine has an issue with GL_ARB_buffer_storage
Using the latest master, can you try forcing this variable to false:
https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L58-L59
Just get rid of that whole statement and replace it with bufferStorage = false
Nope, that's not it. With that change the lag is slightly worse (between 0.1 and 1.2 FPS less in SM64).
I think I found something interesting. See this first: https://imgur.com/a/rEMEu
In some parts of the Carrington Institute, the lag disappears completely (No. 1, No. 6 and No. 7): usually places with poor lighting, or very close to a wall, lit up or not. But there are some exceptions (No. 2, No. 4 and No. 5) where the FPS drops relatively little compared to most of the other lights (No. 3).
Images No. 7 and No. 8 are exactly the same spot; the only difference is that No. 8 was taken after activating Hi-Res. With Hi-Res enabled, the FPS stays close to the No. 8 value even in places where it was previously perfect.
Save state: https://0x0.st/RpV.zip Current cfg: http://sprunge.us/Pdia gliden64.log: http://sprunge.us/MNCH
P.S: The gliden64.log is only generated when GLideN64 is compiled with Clang...
I've been "playing a lot", updating and recompiling many dependencies on my PC... After running test after test in different games and messing with the configuration file, I found some mitigations for the symptoms; unfortunately, none of them is a universal solution.
The first one is to use cxd4-sse2 in "HLE mode" instead of the HLE plugin.
The second one is to use DisableFBInfo = False and/or EnableCopyColorToRDRAM = 0. #1559
The last one is to use LIBGL_ALWAYS_SOFTWARE=1, but in many games this backfires.
Examples: https://imgur.com/a/HAqJZ
Edit: Added more images differentiating LLE and HLE in cxd4-sse2. It's as if GLideN64 forces an LLE mode onto "mupen64plus-rsp-hle.so".
I've been testing angrylion-plus... and I noticed that in my previous post I completely confused the HLE and LLE configurations of cxd4-sse2 (thanks to this). So the image descriptions are inverted... cxd4-sse2 (HLE) is in reality cxd4-sse2 (LLE) and vice versa.
My lag problem in HLE persists... only now I know that cxd4 is also affected. Even angrylion-plus runs faster for me than GLideN64 in HLE mode (DisplayListToGraphicsPlugin = True).
I'm going to stick to bc00985f33c8b1e2e63cebabe53d01f9cf3708a6, the last commit that works well for me u.u
So the one that made things slow for you is this?
313741d8276f0e018ba0c137892d67794468cc28
Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like the AMD Mesa drivers don't like buffer storage either.
Edit: While you are at it, can you test this branch? https://github.com/fzurita/GLideN64/tree/threaded_GLideN64
I'm curious how threaded GLideN64 performs with AMD hardware.
Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like the AMD Mesa drivers don't like buffer storage either.
I had him try disabling buffer storage (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326972673); it didn't seem to make a difference.
Let's double check by turning off "Copy color buffer to RDRAM".
The second one is to use DisableFBInfo = False and/or EnableCopyColorToRDRAM = 0.
Disabling those two things is the only thing that helped me a bit. But I don't remember trying that together with the modification suggested by loganmc10 (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326972673), so I'll try again.
Edit: While you are at it, can you test this branch? https://github.com/fzurita/GLideN64/tree/threaded_GLideN64
Okay, I don't have much to do anyway. I'll try it when I'm at home.
No. I don't see much difference between disabling bufferStorage, using EnableCopyColorToRDRAM = 0 in the cfg file, or using both; at most there is an average difference of 1 FPS, 2 VI/s and 3% in the counters (in Banjo-Kazooie). At least with HLE.
@fzurita, the same story with the threaded_GLideN64 branch; I can test in LLE tomorrow.
So this is very odd. The commit that made things slow for you is https://github.com/gonetz/GLideN64/commit/313741d8276f0e018ba0c137892d67794468cc28, but that code is only invoked when EnableCopyColorToRDRAM is not zero.
I'm not sure what is going on.
Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?
I've been testing in my free time (which is very little) and I made some discoveries. Unfortunately, nothing directly related to my problem... apparently.
Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?
I really doubt it. I don't use any GLideN64*.ini files. Or more precisely, I don't know where to put them to make them work with mupen64plus. And I can see the effects after editing my configuration file.
Regarding 313741d8276f0e018ba0c137892d67794468cc28, see this first: https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326910677. In addition to the ROMs already mentioned, I also tested these others without detecting lag problems: Banjo-Kazooie, GoldenEye 007, The Legend of Zelda (OoT and MM), Mario Kart 64, Resident Evil 2, Super Mario 64, both Castlevanias and Conker's Bad Fur Day. Something worth mentioning is that at the time I only tested the intros, without interaction. Recently I tried again and, in the case of Perfect Dark, the lag only appears on the first boot, and only in the logo animation (RARE, Nintendo, N64 and Perfect Dark); on the second round the same logos move smoothly, and so does the gameplay. Interestingly, if I hold START during the game's boot, the lag never occurs after leaving the Controller Pak menu.
I know very little about programming (Turbo C, Turbo Pascal, Visual FoxPro, Delphi, etc.) and I'm more used to writing maintenance scripts (Batch, VBScript, AHK, etc.) for Windows. This seems very AMD-specific, so I don't know if it's worth referring to this; sorry if not.
I split the VRAM into 8×8 blocks. All hazards and dependencies are tracked at this level. I chose 8×8 simply because it fits neatly into 64 threads on blits (wavefront size on AMD), and the smallest texture window is 8 pixels large, nice little coincidence 🙂
https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/
In my last test I saw lag and freezes in bc00985f33c8b1e2e63cebabe53d01f9cf3708a6 when GLideN64 tried to fill more than 542 MB of VRAM. After that, VRAM usage drops a few MB and starts to fill up again. In 313741d8276f0e018ba0c137892d67794468cc28 this never occurs, but it is very difficult to fill more than 120 MB of VRAM; it's as if the code spends more time cleaning VRAM than using it. I had to destroy many things in Perfect Dark to fill the VRAM and outpace the VRAM-erasing code. But even then it never used more than 542 MB, or 60% of VRAM (1024 MB max).
Edit:
Forgot to mention that I found specific places in Banjo-Kazooie where the lag is more intense in LLE than in HLE (with https://github.com/gonetz/GLideN64/commit/deaf61299f12168b775d7e2d448b9eca149c0e7e and 3cf7377 of threaded_GLideN64). So this is not HLE-specific, but it is much more noticeable there.
@fzurita, I get this using your 'further_reduce_shader_logic' branch: https://0x0.st/sX2R.log In my case https://github.com/gonetz/GLideN64/issues/1665#issuecomment-351760214 doesn’t make any difference.
I have found the true origins (I believe): 313741d8276f0e018ba0c137892d67794468cc28 and 3aa365d24a6bd5f49e1205ea6338cddd7092e6f4
As I mentioned before, 313741d8276f0e018ba0c137892d67794468cc28 causes lag (and glitches) for me at boot time and in the first scenes of Perfect Dark and other games (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326910677), but not during gameplay.
On the other hand, 3aa365d24a6bd5f49e1205ea6338cddd7092e6f4 fixes the glitches caused by 313741d8276f0e018ba0c137892d67794468cc28, but the lag becomes permanent in Perfect Dark in HLE with Hi-Res enabled. With Hi-Res disabled, the FPS only becomes unstable. So the problem was very difficult to find... because enabling Hi-Res offers no benefit in emulation.
And I quote myself:
Later commits increase the lag in other games, but a625225323c902b614ed9601143df3bc51550fc4 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.
Do you have color buffer to RDRAM enabled? If you don't have it enabled, the first commit shouldn't make a difference.
I tested with the default value, EnableCopyColorToRDRAM = 2.
Yeah, with EnableCopyColorToRDRAM = 0, neither 313741d8276f0e018ba0c137892d67794468cc28 nor 3aa365d24a6bd5f49e1205ea6338cddd7092e6f4 manifests lag.
Now I have to find the commit that makes the lag manifest even with EnableCopyColorToRDRAM = 0, I suppose. u.u
Well, before those commits, EnableCopyColorToRDRAM was always disabled for GLES 2.0 devices.
Edit: whoops, you have an AMD device. Too many issues and I got things confused. It seems like buffer storage causes slowdowns with AMD, at least with that specific driver.
Ouroboros! Back to the beginning T.T
Starting from a625225323c902b614ed9601143df3bc51550fc4, the lag persists even with EnableCopyColorToRDRAM = 0. And it also affects LLE, but with less severity than HLE.
Hmm... I'm on vacation, I will try to create a dual-boot installation with Windows and another distribution for testing. ...Just thinking about it, I lose all desire to do it.
Same thing. That commit added more buffer storage usage. We enable buffer storage automatically when the driver reports that it supports it. We could add a special case to disable buffer storage utilization for that specific driver.
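A special case like that is usually a renderer-string check. Below is a minimal, hypothetical sketch of such a blacklist; the matched substrings are assumptions (in GLideN64 the renderer string would come from glGetString(GL_RENDERER) during context init), not the project's actual detection code:

```cpp
#include <cassert>
#include <string>

// Hypothetical driver blacklist for GL_ARB_buffer_storage. Mesa's r600g and
// radeonsi drivers typically report a renderer string containing "Gallium"
// plus an AMD/ATI chip name; the exact strings here are illustrative only.
static bool allowBufferStorage(const std::string& renderer) {
    const bool gallium = renderer.find("Gallium") != std::string::npos;
    const bool amd = renderer.find("AMD") != std::string::npos ||
                     renderer.find("ATI") != std::string::npos;
    // Work around slow buffer storage on these Mesa drivers; everywhere else,
    // keep the extension whenever the driver advertises it.
    return !(gallium && amd);
}
```

A check like this would slot in next to the existing bufferStorage detection in opengl_GLInfo.cpp.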
Let's summarize the few workarounds that get GLideN64 working on the r600/radeonsi Mesa drivers. From best to worst...
1. MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage"
2. MESA_GL_VERSION_OVERRIDE=3.3COMPAT
3. LIBGL_ALWAYS_SOFTWARE=1, so you can play in denial (as if nothing happens)...
4. EnableCopyColorToRDRAM = 0 to gain a few FPS (hell no ¬¬)
That's all.
This is going to take some time ...I hope.
Can you try one more thing? In here: https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L61-L62
Replace:
bufferStorage = (!isGLESX && (numericVersion >= 44)) || Utils::isExtensionSupported(*this, "GL_ARB_buffer_storage") ||
Utils::isExtensionSupported(*this, "GL_EXT_buffer_storage");
with
bufferStorage = false;
Also, this: https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_ContextImpl.cpp#L62-L67
Replace those lines with this:
m_graphicsDrawer.reset(new UnbufferedDrawer(m_glInfo, m_cachedFunctions->getCachedVertexAttribArray()));
@fzurita
Eureka! That does the trick!
I need to use MESA_GL_VERSION_OVERRIDE=3.3COMPAT, but that doesn't matter.
Now we have a real workaround.
Thanks so much, really ^^
A note from a slightly newer GCN card: this helps for me, but it is only a partial fix.
For the Ocarina of Time intro, when Link first appears riding Epona, performance without the patch drops from ~50 FPS to about 15, pretty consistently within 1 FPS of 15.
With the patch, performance drops from ~60 FPS to somewhere between 20 and 40. It's less consistent here, and there are places (generally when there are no characters on screen) where it runs at full speed, but I'd say the average with characters on screen is about 30 FPS. That's probably closer to a real N64 than 60 FPS, but the problem is the constant stuttering. :P
Also with the patch, I have to use 3.3COMPAT or I get a black screen. I was going to make an apitrace, but it seems that tool doesn't work with compatibility profiles, unless I'm missing something.
There are probably multiple issues with GLideN64 and Mesa on older AMD hardware, including this one.
It seems that the Mesa driver has poor support for some features. Even after disabling buffer storage and VBOs, there is still something it doesn't like.
GLideN64 is definitely hitting a slow path in Mesa that doesn't exist in the Windows drivers, where the plugin works fine. At the same time, I think GLideN64 must be doing something unusual (not unexpected for an emulator), because I'm not seeing these symptoms in any other GL application.
The unusual thing GLideN64 does with VBOs is called Buffer Object Streaming (https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming)
It works well on my Intel Mesa laptop, but I don't have any AMD devices to test with.
Mesa only officially supports the core profile, and the core profile requires the use of VBOs, which is why this is needed.
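For anyone following along, the "orphaning" flavor of buffer object streaming appends each draw's vertex data to a buffer and, when the buffer fills, hands the old storage back to the driver and starts writing into fresh storage, so the CPU never waits on in-flight draws. The toy model below only mimics that allocation pattern with plain integers, since a snippet can't create a GL context; the class is illustrative, not GLideN64's code:

```cpp
#include <cassert>

// Toy model of streaming with orphaning. In real GL code, the "orphan" step
// would be glBufferData(GL_ARRAY_BUFFER, capacity, nullptr, GL_STREAM_DRAW)
// and "append" a glMapBufferRange/glBufferSubData write at the given offset.
struct StreamBuffer {
    int capacity;
    int offset = 0;   // next free byte in the current storage
    int orphans = 0;  // how many times we discarded full storage

    explicit StreamBuffer(int cap) : capacity(cap) {}

    // Reserve `bytes` for the next draw; returns the offset to draw from.
    int append(int bytes) {
        if (offset + bytes > capacity) {
            // Full: orphan. The driver keeps the old storage alive for
            // in-flight draws and gives us a fresh region starting at 0.
            offset = 0;
            ++orphans;
        }
        const int drawOffset = offset;
        offset += bytes;
        return drawOffset;
    }
};
```

The appeal of the pattern is that the CPU never synchronizes with the GPU; the suspicion in this thread is that Mesa's r600g path handles it much worse than other drivers do.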
I've been quickly testing some games. So far I've spotted three lag points: one in Bomberman 64 and two in GoldenEye 007.
I'm going to bisect and apply "the patch" to discern the other "bad commit". When I finish my work, of course.
Same issue here. I'm running the latest m64p Linux build (the Ubuntu build doesn't work correctly here).
My specifications/settings are
i3-4150 @ 3.50 GHz, 12 GB RAM, Radeon R7 260X (2 GB VRAM), Ubuntu 17.10, kernel 4.14.15, Mesa 17.3.2
I made a small "benchmark" using GALLIUM_HUD, testing some branches and pull requests. All my tests include the changes to disable buffer storage. @fzurita, @loganmc10, and the few people who share my problem: I hope this is at least useful for you.
Benchmark 01 Savestate: ge007_1.zip
Benchmark 02 Savestate: ge007_0.zip
Benchmark 04 Savestate: bomberman64.zip
P.S: To adapt https://github.com/gonetz/GLideN64/issues/1561#issuecomment-360246650 to threaded_GLideN64, replace this: https://github.com/fzurita/GLideN64/blob/threaded_GLideN64/src/Graphics/OpenGLContext/opengl_ContextImpl.cpp#L65-L71
with this:
if (config.video.threadedVideo)
    m_graphicsDrawer.reset(new UnbufferedDrawerThreadSafe(m_glInfo, m_cachedFunctions->getCachedVertexAttribArray()));
else
    m_graphicsDrawer.reset(new UnbufferedDrawer(m_glInfo, m_cachedFunctions->getCachedVertexAttribArray()));
I have a few ideas for this; I'll create some test commits tomorrow for you to try out. Just to confirm: you say that LLE mode is often faster than HLE, is that correct?
Just to confirm, you say that LLE mode is often faster than HLE is that correct?
Only with buffer storage enabled. If buffer storage is forcibly disabled, HLE is faster.
Edit: This is how HLE mode performs with 'master' on my hardware
All the previous benchmarks, with the exception of 03, are laggy places that persist even with buffer storage disabled. But with buffer storage enabled, such places are affected by lag even more. How ironic; it doesn't make any sense.
Maybe we'll have to chat on some other platform, I thought you previously said disabling buffer storage didn't make a difference?
To make any difference, he had to disable both buffer storage and VBOs.
@Jj0YzL5nvJ This is going to seem like a silly shot in the dark, but can you try to apply this commit:
https://github.com/loganmc10/GLideN64/commit/881f58df4a5fdf539da5b8449d81c999e86ec0a4
Leave everything else like it is in the current master, and I would set EnableCopyColorToRDRAM = 0 for now in the config, just so we can work on one issue at a time.
From what I read, older ATI cards required the vertex data to be aligned to 32/64 bytes for best performance. Ours is currently 44 bytes, so the padding brings it up to 64. I'm really not sure whether it'll make a big difference, but it's worth a shot.
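To illustrate the padding idea, here is a sketch; the field layout below is hypothetical, chosen only to total 44 bytes, and is not GLideN64's actual vertex struct:

```cpp
#include <cassert>
#include <cstdint>

// A 44-byte vertex: position, color, and texture coordinates plus one flag
// word. All fields are 4-byte aligned, so the struct packs with no slack.
struct Vertex44 {
    float x, y, z, w;    // 16 bytes
    float r, g, b, a;    // 16 bytes
    float s, t;          //  8 bytes
    std::uint32_t flags; //  4 bytes -> 44 total
};

// Padded variant: 20 spare bytes bring the stride to 64, a power-of-two
// stride that some older GPUs reportedly fetch more efficiently.
struct Vertex64 {
    float x, y, z, w;
    float r, g, b, a;
    float s, t;
    std::uint32_t flags;
    std::uint8_t pad[20];
};

static_assert(sizeof(Vertex44) == 44, "unpadded stride should be 44 bytes");
static_assert(sizeof(Vertex64) == 64, "padded stride should be 64 bytes");
```

The cost is roughly 45% more vertex bandwidth, so whether the aligned fetch wins depends entirely on the hardware.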
Here is another test: https://github.com/loganmc10/GLideN64/commit/af5d5b8a1878d2e87a6a78dee25a0fd86b745266
Same rules as before: please leave everything else the same as master, and set EnableCopyColorToRDRAM = 0. This is testing a different way to handle the buffer storage.
@fzurita I'm curious if https://github.com/loganmc10/GLideN64/commit/881f58df4a5fdf539da5b8449d81c999e86ec0a4 makes any difference on Adreno devices with GLES 3.2 (they support VBO + buffer storage). Some stuff I've read says the vertex data alignment matters on some mobile chipsets, but it's hard to find solid information.
I can do some quick testing on a slower Adreno device.
Neither patch appears to affect my ~1.0 GCN card.
I mentioned earlier that I was interested in running this in a profiler. Caveat: I have no idea what I'm doing.
Profiling with apitrace, it looks like GL calls are using no more than ~2 ms on the GPU.
On the CPU, however, the graph is skewed because calls to glTexSubImage2D occasionally take around 100 ms or more (!!). However, this isn't happening often enough to be the problem on a frame-by-frame basis. I am seeing calls to glDrawElementsBaseVertex taking ~15 ms. When I zoom in, the graph looks like this.
The pattern here looks like several calls to glDrawElementsBaseVertex taking around 10-15 ms each, followed by a call to glFlushMappedBufferRange also taking about 10 ms.
Is there anything anyone would be interested in me looking at?
@loganmc10, take all my tests made without disabling buffer storage and VBOs with a grain of salt. GL_ARB_buffer_storage is certainly broken in my drivers.
Test with EnableCopyColorToRDRAM = 0
Personally, I did not notice any significant changes, much less from disabling buffer storage and VBOs. But comparing results with the previous test, the differences are very significant, especially in CPU activity, GPU activity and buffer wait time.
@BPaden, try to run using MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=410 MESA_EXTENSION_OVERRIDE="-GL_ARB_buffer_storage"
Can you post the results of the following commands?
glxinfo | grep OpenGL
cat /var/log/Xorg.0.log | grep -i enabled
cat /var/log/Xorg.0.log | grep -i load
cat /var/log/Xorg.0.log | grep -i swap
@BPaden, that trace is very helpful. My next test will be replacing glDrawElementsBaseVertex with glDrawArrays. I suspected this might be an issue since @Jj0YzL5nvJ mentioned that LLE works better; I believe LLE always uses glDrawArrays (though I could be remembering wrong).
Disabling VBOs is good for testing, but it can't be a long-term solution. Core OpenGL requires the use of VBOs (that's why you need the environment variable to get it to work). In a future version of Mesa they could remove support for non-VBO rendering altogether if they wanted, so we can't count on it.
@BPaden, the long glTexSubImage2D is unfortunate but not unexpected. That is the emulator uploading texture data to the GPU. A normal game would do that at load time, not during rendering, but the emulator doesn't know about the texture data until right before it's needed, so we have to upload it like that.
Ok @BPaden @Jj0YzL5nvJ can you try this commit:
https://github.com/loganmc10/GLideN64/commit/9bcfa67d9550c7f1cd4ba72f657facd66a4d27e4
I tested this on my Nvidia laptop and saw no difference in performance, but it may make a difference for you. I'm also curious if this makes any difference on Adreno devices with buffer storage @fzurita
libGL: FPS = 60.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 56.9 libGL: FPS = 60.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 60.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 60.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 60.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 60.9 libGL: FPS = 59.9 libGL: FPS = 59.9 libGL: FPS = 59.9
😁
Sure, I'll try that. It will probably actually help performance on slower Android devices. I do know that VBOs and EBOs are slower on them; I remember they were about 10% slower in the past.
Yeah well it definitely looks like we found the bottleneck in the Mesa driver. I'm going to hop on their IRC channel and ask about this.
It's a little counterintuitive: the whole point of elements (glDrawElements) is that you reduce the amount of bandwidth used uploading vertex data. But maybe when used in conjunction with VBO streaming the benefits are negated. I'll be curious to hear if there is any difference on a mobile device.
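A back-of-envelope calculation of that bandwidth tradeoff, assuming the 44-byte vertex stride mentioned earlier in the thread and 16-bit (GLushort) indices; the two helper functions are just illustrative arithmetic, not GLideN64 code:

```cpp
#include <cassert>

constexpr int kStride = 44;    // bytes per vertex (the stride discussed above)
constexpr int kIndexSize = 2;  // GLushort indices

// glDrawArrays: every triangle corner is uploaded, duplicates and all.
constexpr int arraysBytes(int triangles) {
    return triangles * 3 * kStride;
}

// glDrawElements: unique vertices are uploaded once, plus a small index list.
constexpr int elementsBytes(int uniqueVertices, int triangles) {
    return uniqueVertices * kStride + triangles * 3 * kIndexSize;
}
```

For a quad (two triangles sharing four unique vertices) the indexed path uploads 188 bytes versus 264, so indexing should win on bandwidth; if glDrawArrays is still faster on r600g, the slow part is presumably the indexed-draw path itself rather than the transfer size.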
Since its implementation, a625225323c902b614ed9601143df3bc51550fc4 generates some kind of delay in the radeon Mesa driver (r600g).
Xubuntu 16.04.3 LTS
glxinfo | grep OpenGL
CPU consumption is not higher compared to previous versions. And compiling with -DCRC_OPT=On changes almost nothing in terms of FPS. Examples:
Running SM64 until the moment the white star appears:
dddb3ae1f71afa85ef782d5f5edb9661a1b1b5bd cmake -DMUPENPLUSAPI=On ../../src/ LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420
a625225323c902b614ed9601143df3bc51550fc4 cmake -DMUPENPLUSAPI=On ../../src/ LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420
bbc7131655a78ae887cee481f0a67674385fc2d2 cmake -DCRC_OPT=On -DMUPENPLUSAPI=On ../../src/ LIBGL_SHOW_FPS=1
If anyone knows how to put spoilers, let me know how.