gonetz / GLideN64

A new generation, open-source graphics plugin for N64 emulators.

Color buffer copy to RDRAM in async mode is partially broken? #1630

Open Jj0YzL5nvJ opened 7 years ago

Jj0YzL5nvJ commented 7 years ago

Color buffer copy to RDRAM in async mode is partially broken, at least in Banjo-Kazooie (the intro effect on the jigsaw puzzle). The problem is old: it is also present in bc00985f33c8b1e2e63cebabe53d01f9cf3708a6.

I tested this meticulously in deaf61299f12168b775d7e2d448b9eca149c0e7e and in commit 3cf7377 of @fzurita's threaded_GLideN64 branch (I can't find it now), with and without the changes suggested by @loganmc10 (https://github.com/gonetz/GLideN64/issues/1561#issuecomment-326972673), as part of my investigation of the problem in #1561. I think you will find my results very interesting.

The broken jigsaw puzzle effect: banjo-kazooie-008

The correct jigsaw puzzle effect: banjo-kazooie-009

The broken jigsaw puzzle effect is always present in the master branch with EnableCopyColorToRDRAM = 2, and it works correctly with EnableCopyColorToRDRAM = 1, in HLE and LLE alike. On the other hand, things get interesting in @fzurita's threaded_GLideN64 branch.

In threaded_GLideN64 with ThreadedVideo = False and EnableCopyColorToRDRAM = 2, everything works correctly. But with the bufferStorage = false change in the code, it behaves exactly like the master branch. With ThreadedVideo = True things get even weirder. In HLE mode with the normal code, it behaves like the master branch, but in LLE mode everything works again. With ThreadedVideo = True and bufferStorage = false in the code, it breaks as if EnableCopyColorToRDRAM were set to 0 (the classic black puzzle). As in the master branch, EnableCopyColorToRDRAM = 1 always works "correctly". But I think ThreadedVideo = True is broken with EnableCopyColorToRDRAM = 1: performance is worse than with ThreadedVideo = False and it uses less CPU... I don't know.

In general, the threaded_GLideN64 branch works best on my system, even with my current lag problems... This morning I did a quick test; apparently it is all the same with d8ac5a761cfb7448662d90c8e47714a8b1c485ac and https://github.com/fzurita/GLideN64/commit/ba1c93d79bee37becef2df900fa0492050eb68fd. I'm not sure.

fzurita commented 7 years ago

Async copy color buffer to RDRAM will always be broken. I'm very surprised that it works correctly in async mode on my branch when threaded video is off.
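One way to picture why an asynchronous copy can hand the game stale or missing pixels: a minimal sketch, assuming the async path reads back through a small ring of buffers, so the CPU receives a frame queued on an earlier call rather than the one just rendered. `ReadbackRing` and the integer "frames" are hypothetical and only model the timing; this is not GLideN64 code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical timing model of color buffer readback (not GLideN64's code).
// A "frame" is reduced to a single integer id for illustration.
class ReadbackRing {
public:
    // Synchronous copy: the caller waits and receives the frame just rendered.
    static std::uint32_t readSync(std::uint32_t renderedFrame) {
        return renderedFrame;
    }

    // Asynchronous copy: queue the current frame, return the frame queued on
    // the previous call. The very first call has nothing to return yet.
    std::optional<std::uint32_t> readAsync(std::uint32_t renderedFrame) {
        m_slots[m_write] = renderedFrame;  // start the transfer of this frame
        const std::size_t readIdx = (m_write + 1) % m_slots.size();
        const std::optional<std::uint32_t> ready = m_slots[readIdx];
        m_write = readIdx;                 // next call overwrites the slot just read
        return ready;
    }

private:
    std::array<std::optional<std::uint32_t>, 2> m_slots{};
    std::size_t m_write = 0;
};
```

In this model, an effect that needs the current frame in RDRAM (such as the jigsaw intro) would always see the previous one in async mode, while the synchronous path returns the right frame at the cost of a stall.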

fzurita commented 7 years ago

How much of a performance difference are you seeing in the threaded build with EnableCopyColorToRDRAM = 1 compared to the master build when threading is on? Maybe I should automatically disable threading if EnableCopyColorToRDRAM = 1 is true.

Jj0YzL5nvJ commented 7 years ago

That depends on the game and the number of effects on screen (fog, gas, smoke, light sparkles, shines, etc.). In the particular area I tested, EnableCopyColorToRDRAM = 1 always gives lower values than master or ThreadedVideo = False, about 4% lower in the counters, at least in HLE. I didn't really pay much attention to that configuration because it was not beneficial to me...

banjo-kazooie-003

Normal code, master vs ThreadedVideo = True:

The average on master HLE with EnableCopyColorToRDRAM = 2: 24%, 14 VI/S, 7 FPS
The average on threaded HLE with EnableCopyColorToRDRAM = 2: 26-32%, 15-19 VI/S, 8-9 FPS

The average on master HLE with EnableCopyColorToRDRAM = 0: 27%, 16 VI/S, 8 FPS
The average on threaded HLE with EnableCopyColorToRDRAM = 0: 26-30%, 15-18 VI/S, 8-9 FPS

The average on master LLE with EnableCopyColorToRDRAM = 2: 43%, 26 VI/S, 13 FPS
The average on threaded LLE with EnableCopyColorToRDRAM = 2: 54-63%, 32-38 VI/S, 17-19 FPS

The average on master LLE with EnableCopyColorToRDRAM = 0: 52%, 31 VI/S, 15 FPS
The average on threaded LLE with EnableCopyColorToRDRAM = 0: 72-86%, 43-53 VI/S, 21-26 FPS

bufferStorage = false, master vs ThreadedVideo = True:

The average on master HLE: 29%, 17 VI/S, 7 FPS
The average on threaded HLE: 25-31%, 13-18 VI/S, 7-9 FPS

The average on master LLE: 48-50%, 29-30 VI/S, 14-15 FPS
The average on threaded LLE: 71-86%, 42-51 VI/S, 21-25 FPS

The values on master are very stable; on threaded they are very chaotic. I didn't see a significant performance difference between EnableCopyColorToRDRAM = 2 and EnableCopyColorToRDRAM = 0 with bufferStorage = false. But threaded is buggier with this...
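To put the figures above in relative terms, here is a small sketch that computes the percentage change in VI/s between master and the threaded branch. The helper names `percentChange` and `printRange` are hypothetical; the inputs in the usage note are the LLE values with EnableCopyColorToRDRAM = 0 quoted above.

```cpp
#include <cstdio>

// Percentage change from a baseline value to a new value.
// Hypothetical helper for illustration only.
double percentChange(double baseline, double value) {
    return (value - baseline) / baseline * 100.0;
}

// Prints the relative change for the low and high ends of a threaded reading.
void printRange(const char* label, double master, double low, double high) {
    std::printf("%s: %+.0f%% to %+.0f%%\n", label,
                percentChange(master, low), percentChange(master, high));
}
```

Calling `printRange("LLE, EnableCopyColorToRDRAM = 0", 31, 43, 53)` shows that the threaded branch runs roughly 39% to 71% faster than master in that configuration, while the HLE readings largely overlap with master.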

fzurita commented 7 years ago

Those are very low VI/s that you are getting. For perspective, my Android phone can get full speed in most games.

You have a big bottleneck somewhere. I find it hard to believe that the buffer storage commit was the cause of your performance drop even when that feature is disabled.

Also, HLE should give you much better performance than LLE.

Are you running the latest version of your video driver?

Jj0YzL5nvJ commented 7 years ago

Are you running the latest version of your video driver?

glxinfo https://0x0.st/CHU.txt

In my last test I saw lag and freezes in bc00985 when GLideN64 tried to fill more than 542M of VRAM. After that, VRAM usage drops by a few MB and starts to fill again. In 313741d this never occurs, but it is very difficult to fill more than 120M of VRAM; it is as if the code spends more time cleaning VRAM than using it. I had to destroy many things in Perfect Dark to manage to fill the VRAM and outpace the VRAM-erasing code. But again, it never uses more than 542M, or 60% of VRAM (max 1024M).

https://github.com/gonetz/GLideN64/issues/1561#issuecomment-334994673

I'm going to try upgrading the GPU BIOS. Soon...

fzurita commented 7 years ago

Updating the GPU BIOS probably won't help. What you describe with the VRAM I think could only happen under strange scenarios.

It sounds like your texture cache size for some reason is very small. In Textures.cpp find this code:

void TextureCache::_checkCacheSize()
{
    const size_t maxCacheSize = 8000;
    if (m_textures.size() >= maxCacheSize) {
        CachedTexture& clsTex = m_textures.back();
        m_cachedBytes -= clsTex.textureBytes;
        gfxContext.deleteTexture(clsTex.name);
        m_lruTextureLocations.erase(clsTex.crc);
        m_textures.pop_back();
    }

    if (m_cachedBytes <= m_maxBytes)
        return;

    Textures::iterator iter = m_textures.end();
    do {
        --iter;
        CachedTexture& tex = *iter;
        m_cachedBytes -= tex.textureBytes;
        gfxContext.deleteTexture(tex.name);
        m_lruTextureLocations.erase(tex.crc);
    } while (m_cachedBytes > m_maxBytes && iter != m_textures.cbegin());
    m_textures.erase(iter, m_textures.end());
}

Replace it with this:

void TextureCache::_checkCacheSize()
{
    const size_t maxCacheSize = 15000;
    if (m_textures.size() >= maxCacheSize) {
        CachedTexture& clsTex = m_textures.back();
        m_cachedBytes -= clsTex.textureBytes;
        gfxContext.deleteTexture(clsTex.name);
        m_lruTextureLocations.erase(clsTex.crc);
        m_textures.pop_back();
    }
}
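The original function above enforces two limits, an entry count and a byte budget, and evicts from the back of the list; the suggested replacement keeps only the count limit. A minimal, self-contained sketch of that LRU pattern follows. `Entry` and `LruCache` are hypothetical stand-ins, not GLideN64's `CachedTexture` or `TextureCache`, and no GPU resources are involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Simplified stand-in for a cached texture (hypothetical, for illustration).
struct Entry {
    std::uint64_t crc;
    std::size_t bytes;
};

// Minimal LRU cache: most recently used entries sit at the front of the list,
// eviction removes entries from the back.
class LruCache {
public:
    LruCache(std::size_t maxEntries, std::size_t maxBytes)
        : m_maxEntries(maxEntries), m_maxBytes(maxBytes) {}

    // Inserts a new entry, or marks an existing one as most recently used.
    void touch(std::uint64_t crc, std::size_t bytes) {
        auto found = m_index.find(crc);
        if (found != m_index.end()) {
            // splice moves the node without invalidating stored iterators.
            m_entries.splice(m_entries.begin(), m_entries, found->second);
            return;
        }
        m_entries.push_front(Entry{crc, bytes});
        m_index[crc] = m_entries.begin();
        m_cachedBytes += bytes;
        evict();
    }

    bool contains(std::uint64_t crc) const { return m_index.count(crc) != 0; }
    std::size_t size() const { return m_entries.size(); }
    std::size_t cachedBytes() const { return m_cachedBytes; }

private:
    // Drops least recently used entries until both limits are satisfied.
    void evict() {
        while (!m_entries.empty() &&
               (m_entries.size() > m_maxEntries || m_cachedBytes > m_maxBytes)) {
            const Entry& victim = m_entries.back();
            m_cachedBytes -= victim.bytes;
            m_index.erase(victim.crc);
            m_entries.pop_back();
        }
    }

    std::size_t m_maxEntries;
    std::size_t m_maxBytes;
    std::size_t m_cachedBytes = 0;
    std::list<Entry> m_entries;
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> m_index;
};
```

Passing a very large `maxBytes` mimics the suggested replacement, where only the entry count bounds the cache; a small `maxBytes` reproduces the situation fzurita suspects, where textures are evicted almost as fast as they are loaded.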

Jj0YzL5nvJ commented 7 years ago

Nope, I don't see any change u.u I will have to track down the specific commits that cause lag in each game individually. Thanks anyway.

Edit: Just in case, in the past I used -DUSE_UNIFORMBLOCK=On because I noticed better performance. It is deprecated now, apparently.

fzurita commented 7 years ago

What do you mean? You can't find the code in Textures.cpp? Can you try making the change above and see if it helps?

Jj0YzL5nvJ commented 7 years ago

I made the changes; I didn't see any improvement.

AmbientMalice commented 7 years ago

What is your GPU, BTW?

fzurita commented 7 years ago

He posted that earlier: https://0x0.st/CHU.txt

I don't really understand this problem.

fzurita commented 7 years ago

@Jj0YzL5nvJ Can you try forcing OpenGL ES mode? In this file:

https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp

Change

isGLESX = strstr(strVersion, "OpenGL ES") != nullptr;

to

isGLESX = true;

Then in this file https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/mupen64plus/mupen64plus_DisplayWindow.cpp

Change

CoreVideo_GL_SetAttribute(M64P_GL_CONTEXT_PROFILE_MASK, M64P_GL_CONTEXT_PROFILE_CORE);

to

CoreVideo_GL_SetAttribute(M64P_GL_CONTEXT_PROFILE_MASK, M64P_GL_CONTEXT_PROFILE_ES);
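The first change above replaces a check on the GL version string with a constant. A minimal sketch of that check in isolation, with the hard-coded edit modeled as an optional flag: `detectGLES` and `forceGLES` are hypothetical names for illustration, not an existing GLideN64 function or option.

```cpp
#include <cstring>

// Returns true when a GL_VERSION string identifies an OpenGL ES context.
// forceGLES is a hypothetical override standing in for the hard-coded
// "isGLESX = true" edit; GLideN64 itself has no such parameter.
bool detectGLES(const char* versionString, bool forceGLES = false) {
    if (forceGLES)
        return true;
    if (versionString == nullptr)
        return false;
    // OpenGL ES contexts report a version string containing "OpenGL ES".
    return std::strstr(versionString, "OpenGL ES") != nullptr;
}
```

Forcing the flag only changes which code paths the plugin takes; the context itself still has to be created with the ES profile, which is what the second change (M64P_GL_CONTEXT_PROFILE_ES) is for.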

Jj0YzL5nvJ commented 7 years ago

Currently there are a lot of regressions in Mesa with ATI / AMD hardware, so I downgraded some of my repositories (mesa, xorg, kernel) and uninstalled all my custom builds and... all the same, nothing changed... except that VA-API now works for some strange reason. Now I can compare it with VDPAU.

Thanks to the downgrade, I can now give you a less boring result, related to the forced OpenGL ES mode.

Before the downgrade:

gliden64.log:

[gles2GlideN64]: Error setting videomode 640x480

In the mupen64plus launch:

Input: Mupen64Plus SDL Input Plugin version 2.5.0 initialized.
(II) Setting video mode 640x480...
Core: Setting video mode: 640x480
Core Error: SDL_SetVideoMode failed: Could not create GL context: GLXBadFBConfig
Violación de segmento (`core' generado)

And it does not work...

After the downgrade:

Some warnings with GCC: https://0x0.st/Clt.txt

In the mupen64plus launch:

UI-Console Status: Cheat codes disabled.
UI-Console Error: dlopen('./mupen64plus-video-GLideN64-MOD-977fddf.so') failed: ./mupen64plus-video-GLideN64-MOD-977fddf.so: undefined symbol: png_set_longjmp_fn
UI-Console: using Video plugin: 'GLideN64' v2.0.0

And it works: laggy like the normal code and more glitchy, but it works.

I can reproduce the "Core Error" by recompiling zlib and libpng at their current versions.

After the downgrade a new corona bug appeared in Perfect Dark, now the sun is always visible in LLE mode. perfect_dark-000

Current glxinfo: https://0x0.st/Clv.txt

AmbientMalice commented 7 years ago

After the downgrade a new corona bug appeared in Perfect Dark, now the sun is always visible in LLE mode.

That's normal behavior, AFAIK. Depth buffer stuff is pretty broken in LLE.

Jj0YzL5nvJ commented 7 years ago

That does not happen with Mesa 17.3.0-devel, but I can't assure you that that version worked correctly at all. So this is only a mention for reference purposes (not a bug report).