Jj0YzL5nvJ opened 7 years ago
Async copy color buffer to RDRAM will always be broken. I'm very surprised that it works correctly in async mode on my branch when threaded video is off.
How much of a performance difference are you seeing in the threaded build with `EnableCopyColorToRDRAM = 1` compared to the master build when threading is on? Maybe I should automatically disable threading when `EnableCopyColorToRDRAM = 1` is set.
That depends on the game and the number of effects on screen (fog, gas, smoke, light sparkles, shines, etc.). In the particular area of my tests, `EnableCopyColorToRDRAM = 1` always gives lower values than master or `ThreadedVideo = False`, around -4% in the counters, at least in HLE. I don't really pay much attention to that configuration because it wasn't beneficial to me...
Normal code, master vs `ThreadedVideo = True`:

- master HLE with `EnableCopyColorToRDRAM = 2`: 24%, 14 VI/s, 7 FPS
- threaded HLE with `EnableCopyColorToRDRAM = 2`: 26-32%, 15-19 VI/s, 8-9 FPS
- master HLE with `EnableCopyColorToRDRAM = 0`: 27%, 16 VI/s, 8 FPS
- threaded HLE with `EnableCopyColorToRDRAM = 0`: 26-30%, 15-18 VI/s, 8-9 FPS
- master LLE with `EnableCopyColorToRDRAM = 2`: 43%, 26 VI/s, 13 FPS
- threaded LLE with `EnableCopyColorToRDRAM = 2`: 54-63%, 32-38 VI/s, 17-19 FPS
- master LLE with `EnableCopyColorToRDRAM = 0`: 52%, 31 VI/s, 15 FPS
- threaded LLE with `EnableCopyColorToRDRAM = 0`: 72-86%, 43-53 VI/s, 21-26 FPS
`bufferStorage = false`, master vs `ThreadedVideo = True`:

- master HLE: 29%, 17 VI/s, 7 FPS
- threaded HLE: 25-31%, 13-18 VI/s, 7-9 FPS
- master LLE: 48-50%, 29-30 VI/s, 14-15 FPS
- threaded LLE: 71-86%, 42-51 VI/s, 21-25 FPS
The numbers on master are very stable; on threaded they are very chaotic.
I didn't see a significant performance difference between `EnableCopyColorToRDRAM = 2` and `EnableCopyColorToRDRAM = 0` with `bufferStorage = false`, but threaded is buggier with it...
Those are very low VI/s that you are getting. For perspective, my Android phone can get full speed in most games.
You have a big bottleneck somewhere. I find it hard to believe that the buffer storage commit was the cause of your performance drop even when that feature is disabled.
Also, HLE should give you much better performance than LLE.
Are you running the latest version of your video driver?
> Are you running the latest version of your video driver?

glxinfo output: https://0x0.st/CHU.txt
In my last test I saw lag and freezes in bc00985 whenever GLideN64 tried to fill more than 542 MB of VRAM. After that, VRAM usage drops by a few MB and starts to fill up again. In 313741d this never occurs, but it is very difficult to fill more than 120 MB of VRAM; it's as if the code spends more time cleaning VRAM than using it. I had to destroy many things in Perfect Dark to fill the VRAM faster than the eviction code could free it. But even then it never used more than 542 MB, or 60% of VRAM (max 1024 MB).
https://github.com/gonetz/GLideN64/issues/1561#issuecomment-334994673
I'm going to try upgrading the GPU BIOS. Soon...
Updating the GPU BIOS probably won't help. What you describe with the VRAM I think could only happen under strange scenarios.
It sounds like your texture cache size for some reason is very small. In Textures.cpp find this code:
```cpp
void TextureCache::_checkCacheSize()
{
	const size_t maxCacheSize = 8000;
	if (m_textures.size() >= maxCacheSize) {
		CachedTexture& clsTex = m_textures.back();
		m_cachedBytes -= clsTex.textureBytes;
		gfxContext.deleteTexture(clsTex.name);
		m_lruTextureLocations.erase(clsTex.crc);
		m_textures.pop_back();
	}

	if (m_cachedBytes <= m_maxBytes)
		return;

	Textures::iterator iter = m_textures.end();
	do {
		--iter;
		CachedTexture& tex = *iter;
		m_cachedBytes -= tex.textureBytes;
		gfxContext.deleteTexture(tex.name);
		m_lruTextureLocations.erase(tex.crc);
	} while (m_cachedBytes > m_maxBytes && iter != m_textures.cbegin());
	m_textures.erase(iter, m_textures.end());
}
```
Replace it with this:
```cpp
void TextureCache::_checkCacheSize()
{
	const size_t maxCacheSize = 15000;
	if (m_textures.size() >= maxCacheSize) {
		CachedTexture& clsTex = m_textures.back();
		m_cachedBytes -= clsTex.textureBytes;
		gfxContext.deleteTexture(clsTex.name);
		m_lruTextureLocations.erase(clsTex.crc);
		m_textures.pop_back();
	}
}
```
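For context, the stock `_checkCacheSize` evicts least-recently-used textures by both an entry-count cap and a byte budget (`m_maxBytes`), while the modified version keeps only the count cap. Here is a minimal standalone sketch of the byte-budget part of that eviction, using a hypothetical simplified `MiniCache` rather than the actual GLideN64 types:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical simplified stand-in for the texture cache: each entry is
// just its size in bytes. The front of the list is the most recently
// used texture, the back is the oldest (least recently used).
struct MiniCache {
    std::list<std::size_t> textures;   // MRU at front, LRU at back
    std::size_t cachedBytes = 0;
    std::size_t maxBytes;

    explicit MiniCache(std::size_t limit) : maxBytes(limit) {}

    void add(std::size_t bytes) {
        textures.push_front(bytes);
        cachedBytes += bytes;
        // Byte-budget eviction, mirroring the do/while loop in the
        // original function: walk from the oldest entry, freeing
        // textures until the cache fits in the budget again.
        while (cachedBytes > maxBytes && !textures.empty()) {
            cachedBytes -= textures.back();
            textures.pop_back();
        }
    }
};
```

Dropping the byte-budget loop (as in the patch) means textures are deleted far less aggressively, which is why the change was expected to reduce the constant VRAM cleanup described earlier.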
Nope, I don't see any changes u.u I will have to track down the specific commits that cause lag in each game individually. Thanks anyway.
Edit:
Just in case: in the past I used to build with `-DUSE_UNIFORMBLOCK=On` because I noticed better performance. Now it's deprecated, apparently.
What do you mean? You can't find the code in Textures.cpp? Can you try making the change above and see if it helps?
I made the changes, but I didn't see any improvement.
What is your GPU, BTW?
He posted that earlier: https://0x0.st/CHU.txt
I don't really understand this problem.
@Jj0YzL5nvJ Can you try forcing OpenGL ES mode? In this file:
https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp
Change

```cpp
isGLESX = strstr(strVersion, "OpenGL ES") != nullptr;
```

to

```cpp
isGLESX = true;
```

Then in this file https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/mupen64plus/mupen64plus_DisplayWindow.cpp change

```cpp
CoreVideo_GL_SetAttribute(M64P_GL_CONTEXT_PROFILE_MASK, M64P_GL_CONTEXT_PROFILE_CORE);
```

to

```cpp
CoreVideo_GL_SetAttribute(M64P_GL_CONTEXT_PROFILE_MASK, M64P_GL_CONTEXT_PROFILE_ES);
```
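For reference, `isGLESX` is normally derived from the `GL_VERSION` string, so hardcoding it to `true` simply bypasses that detection. A minimal sketch of what the string check does (my own simplified helper, not GLideN64 code): an ES context's version string contains "OpenGL ES", while a desktop GL context's starts with a bare version number.

```cpp
#include <cstring>

// Simplified stand-in for the detection in opengl_GLInfo.cpp:
// an ES context reports something like "OpenGL ES 3.2 Mesa 17.2.0",
// while desktop GL reports e.g. "4.5 (Core Profile) Mesa 17.2.0".
static bool looksLikeGLES(const char* strVersion) {
    return std::strstr(strVersion, "OpenGL ES") != nullptr;
}
```

Forcing `isGLESX = true` makes the plugin take the ES code paths even though the driver string says otherwise, which is the point of this experiment.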
Currently there are a lot of regressions in Mesa with ATI/AMD hardware, so I downgraded some of my repositories (mesa, xorg, kernel) and uninstalled all my custom builds and... all the same, nothing happened... except that VA-API now works for some strange reason. Now I can compare it with VDPAU.
Thanks to the downgrade, I can now give you a less boring result related to the forced OpenGL ES mode.
Before the downgrade:
gliden64.log
```
[gles2GlideN64]: Error setting videomode 640x480
```
In the mupen64plus launch:
```
Input: Mupen64Plus SDL Input Plugin version 2.5.0 initialized.
(II) Setting video mode 640x480...
Core: Setting video mode: 640x480
Core Error: SDL_SetVideoMode failed: Could not create GL context: GLXBadFBConfig
Segmentation fault (core dumped)
```
And it doesn't work...
After the downgrade:
Some warnings with GCC: https://0x0.st/Clt.txt
In the mupen64plus launch:
```
UI-Console Status: Cheat codes disabled.
UI-Console Error: dlopen('./mupen64plus-video-GLideN64-MOD-977fddf.so') failed: ./mupen64plus-video-GLideN64-MOD-977fddf.so: undefined symbol: png_set_longjmp_fn
UI-Console: using Video plugin: 'GLideN64' v2.0.0
```
And it works: laggy like the normal code and more glitchy, but it works.
I can reproduce the "Core Error" by recompiling zlib and libpng at their current versions.
After the downgrade a new corona bug appeared in Perfect Dark: now the sun is always visible in LLE mode.
Current glxinfo: https://0x0.st/Clv.txt
> After the downgrade a new corona bug appeared in Perfect Dark: now the sun is always visible in LLE mode.
That's normal behavior, AFAIK. Depth buffer stuff is pretty broken in LLE.
That doesn't happen with Mesa 17.3.0-devel, but I can't assure you that that version worked correctly at all. So this is just a mention for reference purposes (not a bug report).
Color buffer copy to RDRAM in async mode is partially broken, at least in Banjo-Kazooie (the intro effect on the jigsaw puzzle). And the problem is old; it is present in bc00985f33c8b1e2e63cebabe53d01f9cf3708a6 too.
I tested this meticulously in deaf61299f12168b775d7e2d448b9eca149c0e7e and in 3cf7377 of the `threaded_GLideN64` branch of @fzurita (I can't find it now), with and without the changes suggested by @loganmc10 (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326972673), as part of my research on the problem in #1561. I think you will find my results very interesting.

The broken jigsaw puzzle effect:

The correct jigsaw puzzle effect:

The broken jigsaw puzzle effect is always present in the master branch with `EnableCopyColorToRDRAM = 2` and works correctly with `EnableCopyColorToRDRAM = 1`, in HLE and LLE alike. On the other hand, things get interesting in @fzurita's `threaded_GLideN64` branch.

In `threaded_GLideN64` with `ThreadedVideo = False` and `EnableCopyColorToRDRAM = 2`, everything works correctly. But with the `bufferStorage = false` change in the code, it behaves exactly like the master branch. With `ThreadedVideo = True` things get even weirder: in HLE mode with the normal code it behaves like the master branch, but in LLE mode everything works again. However, with `ThreadedVideo = True` and `bufferStorage = false` in the code, it breaks as if EnableCopyColorToRDRAM were set to 0 (the classic black puzzle). As in the master branch, `EnableCopyColorToRDRAM = 1` always works "correctly". But I think `ThreadedVideo = True` is broken with `EnableCopyColorToRDRAM = 1`: the performance is worse than with `ThreadedVideo = False` and it uses less CPU... I don't know.

In general, the `threaded_GLideN64` branch works best on my system, even with my current lag problems... This morning I did a quick test; apparently it's all the same with d8ac5a761cfb7448662d90c8e47714a8b1c485ac and https://github.com/fzurita/GLideN64/commit/ba1c93d79bee37becef2df900fa0492050eb68fd. I'm not sure.