Open i30817 opened 5 years ago
I tried the angrylion plugin on standalone but it was even slower, so at least there is nothing surprising there.
Can you check glide64, glide64mk2 or rice? If these plugins are slow too, then the problem is not on GLideN64 side. Also, please try to run GLideN64 with frame buffer emulation disabled. Some frame buffer emulation options can be too heavy for your system.
I'd have to build them, they're not distributed on the site?
Ok, I downloaded an older version of the emulator (mupen64plus-bundle-linux64-2.5-ubuntu) and copied over the glide64mk2 and rice plugins. Rice was indeed much faster on the Banjo-Kazooie intro (but had many 'animation jerks backwards in time' errors). With glide64mk2, the emulator thought it wasn't a plugin when starting a game, but it could still be selected (funnily enough, sound started for a while if you tried and then lost focus on the window).
comparison:
mupen64plus-video-rice.so reports:
60.000-60.200 VI/s (with the setting on on the top bar, it does have on screen notifications the same as glide).
mupen64plus-video-GlideN64.so reports:
23% 14 VI/S 7 FPS. Enabling or disabling frame buffer emulation (and I tested many other settings too) makes no difference.
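For reference, the 23% figure seems consistent with 14 VI/s measured against a 60 VI/s target, which is what Rice reports at full speed. A minimal sketch of that arithmetic follows; the helper names and the 60 VI/s constant are assumptions for illustration, not anything taken from the plugin source.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 60 VI/s counts as full speed, matching the Rice reading above. */
#define NOMINAL_VIS 60.0

/* Hypothetical helper: emulation speed as a percentage of full speed. */
static double speed_percent(double measured_vis, double nominal_vis)
{
    assert(nominal_vis > 0.0);
    return 100.0 * measured_vis / nominal_vis;
}

/* Hypothetical helper: print one comparison line for a plugin. */
static void print_speed(const char *plugin, double measured_vis)
{
    printf("%s: %.1f VI/s = %.0f%% of full speed\n",
           plugin, measured_vis, speed_percent(measured_vis, NOMINAL_VIS));
}
```

Calling print_speed("GLideN64", 14.0) and print_speed("rice", 60.0) from a main() would put the two readings side by side.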
I think this might be a memory leak. I used heaptrack https://github.com/KDE/heaptrack and it segfaulted when loading a ROM (normally it just becomes ultra slow and does not crash). The result of the --analyze is here:
https://gist.github.com/i30817/4a68e185ed2437c2cc3331ef95c9392b
hope it helps.
edit: then again it may just have crashed because heaptrack requires changing the system allocator, though the 600 mb lost are sorta suspicious.
Not much. "total memory leaked: 562.19MB" is a huge leak, but I can't get the source of the leaks from the report. I'm not familiar with that tool. I usually use valgrind, which can track where memory was leaked and how much. And yes, 600 MB lost is too suspicious.
On the other hand, it is unclear why GLideN64 is so slow in comparison with RiceVideo. Sorry for the stupid question, but are you testing a release build of GLideN64? A debug build is slow even on my desktop.
i used the linux 64 bits plugin and emulator from this site https://m64p.github.io/
It's actually kind of confusing there are that many sites distributing it.
I attached a new image showing the allocation is a huge jump all (or nearly) at once on the previous post, but it didn't work so here it is:
I'm not sure I see much specific to GLideN64 there; it's mostly noise from GTK and Qt being broken, I suppose as a result of using mupen64plus-gui. It would be easier to debug with the mupen64plus console UI or even with RetroArch to avoid all that noise....
I can't explain it. GLideN64 can't eat so much, otherwise it would never start on most Android devices. Maybe it is Qt indeed.
Could you build GLideN64 from sources to be sure that it is not a bad build problem?
I'm going to try with retroarch because it was just as slow and i can start retroarch cores from the cmd line (unlike the download from that site) if you don't mind.
I would try the mupen64plus console ui, the libretro core is pretty out of date...
Well, i had already tried to build mupen but
Makefile:152: *** Mupen64Plus API header files not found! Use makefile parameter APIDIR to force a location.. Stop.
RetroArch under heaptrack curiously crashes in the same way, but the memory leak is an order of magnitude smaller, 60 MB instead of 560.
Problem is that it's nearly all in 'unresolved function' below __libc_start_main. I think i really need to compile it but i need headers. And I might be chasing just mupen not liking the allocator.
I don't think that building mupen from sources is so necessary, since Rice video works OK for you. Also, you may install the mupen core and other parts from packages. They can be outdated too, but hardly too old to run. I suggest building (release!) GLideN64 from sources and setting it explicitly in the mupen64plus parameters with --gfx.
Its a pretty annoying build honestly...
Basically you need to clone these repos.
https://github.com/mupen64plus/mupen64plus-core
https://github.com/mupen64plus/mupen64plus-ui-console
https://github.com/mupen64plus/mupen64plus-input-sdl
https://github.com/mupen64plus/mupen64plus-rsp-hle
https://github.com/mupen64plus/mupen64plus-audio-sdl
Then you can build them one by one (Make sure to build mupen64plus-core first), for example.
cd mupen64plus-core/projects/unix
make all
make install
Then for the following plugins you pass APIDIR to make all to point to the mupen64plus headers. By default they are installed to /usr/local/mupen64plus, but maybe you can skip make install and point to src/api/ in the mupen64plus-core repo?
See the output of make to see the full list of arguments when building each plugin.
As for GLideN64 it uses a standard cmake build.
Can I run make uninstall on all of those? I'm not a fan of getting random library headers into my system for apt to freak out about later, and I was burned before by 'just install from source' procedures.
I guess i should try to find a system install of mupen64plus-console before trying that.
They should support make uninstall and, as I explained, you might be able to skip make install.
I give up, i don't know how to read this.
You can see the allocations by installing heaptrack-gui and calling heaptrack_gui on this gz. I built the repos above, built the debug version of the gfx plugin, ran make install on all of them except the gfx plugin, and invoked it with:
heaptrack mupen64plus --gfx ./mupen64plus-video-GLideN64.so --audio /usr/local/lib/mupen64plus/mupen64plus-audio-sdl.so --input /usr/local/lib/mupen64plus/mupen64plus-input-sdl.so --rsp /usr/local/lib/mupen64plus/mupen64plus-rsp-hle.so Banjo-Kazooie\ \(USA\)\ \(Rev\ A\).n64
I am running on Gnome 3 Wayland ubuntu with a very old ati card r710 (mobile version).
heaptrack.mupen64plus.27803.gz
It gives images like this, but won't crash without the heaptrack (just be slow as heck):
Funny enough the '500mb' leak returns even without QT. Supposedly.
CoreStartup has a function call init_mem_base() which seems a prime candidate for the screw up.
I'm actually confused about you mentioning that 512 MB is 'too much', because init_mem_base() seems to request that by default:
MB_MAX_SIZE = MB_PIF_MEM + PIF_ROM_SIZE + PIF_RAM_SIZE
MB_MAX_SIZE_FULL = 0x20000000
....
void* init_mem_base(void)
{
    void* mem_base;

    /* First try the full mem base alloc */
    mem_base = malloc(MB_MAX_SIZE_FULL);
    if (mem_base == NULL) {
        /* if it failed, try the compressed mem base alloc */
        mem_base = malloc(MB_MAX_SIZE);
        if (mem_base != NULL) {
            /* Compressed mem base mode has LSB = 1 */
            assert(MEM_BASE_MODE(mem_base) == 0);
            SET_MEM_BASE_MODE(mem_base);
            DebugMessage(M64MSG_INFO, "Using compressed mem base");
        }
    }
    else {
        /* Full mem base mode has LSB = 0 */
        assert(MEM_BASE_MODE(mem_base) == 0);
        DebugMessage(M64MSG_INFO, "Using full mem base");
    }

    return mem_base;
}
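The "LSB = 1" / "LSB = 0" comments above suggest the allocation mode is stored in the lowest bit of the returned pointer, which works because malloc results are aligned and so that bit starts out clear. Here is a minimal, self-contained sketch of that tagging idea. The SKETCH_* macros and both functions are hypothetical stand-ins written for illustration; the real MEM_BASE_MODE and SET_MEM_BASE_MODE definitions in mupen64plus-core are not shown in this thread and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the macros used by init_mem_base(). */
#define SKETCH_MODE_BIT ((uintptr_t)1)
#define SKETCH_MEM_BASE_MODE(p) ((uintptr_t)(p) & SKETCH_MODE_BIT)
#define SKETCH_SET_MEM_BASE_MODE(p) \
    ((p) = (void *)((uintptr_t)(p) | SKETCH_MODE_BIT))
#define SKETCH_MEM_BASE_PTR(p) ((void *)((uintptr_t)(p) & ~SKETCH_MODE_BIT))

/* Try the large allocation first, then fall back to the small one and
 * record the fallback in the pointer's lowest bit. */
static void *sketch_alloc(size_t full_size, size_t compressed_size)
{
    void *base = malloc(full_size);
    if (base == NULL) {
        base = malloc(compressed_size);
        if (base != NULL) {
            /* malloc alignment guarantees the low bit is free to use. */
            assert(SKETCH_MEM_BASE_MODE(base) == 0);
            SKETCH_SET_MEM_BASE_MODE(base);
        }
    } else {
        assert(SKETCH_MEM_BASE_MODE(base) == 0);
    }
    return base;
}

/* The tag must be stripped before the pointer is used or freed. */
static void sketch_free(void *tagged)
{
    free(SKETCH_MEM_BASE_PTR(tagged));
}
```

With this layout, the 0x20000000 request is only a first attempt: if it fails, the code falls back to the smaller MB_MAX_SIZE block and marks the pointer accordingly.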
Anyway, heaptrack crashes when the app requests that on top of their tracking, not very surprising. And since linux has paging, it's also not surprising that it 'succeeds' and starts to use memory horribly.
free --mega
              total        used        free      shared  buff/cache   available
Mem:           4035        2042         549         111        1443        1592
Swap:          2051          48        2003
This might not explain the slowness but just the heaptrack crash. I'm pretty sure i noticed the slowness without anything else open.
edit: even when firefox is closed:
free --mega
              total        used        free      shared  buff/cache   available
Mem:           4035        1147        1518          29        1369        2569
Swap:          2051          48        2003
it crashes with the same '540 leak' so i guess heaptrack is much more demanding of memory than i expected or is crashing from another bug.
Ok, the crash was a red herring. I managed to 'mostly fix it' by setting EnableLegacyBlending = True.
So it was a shader slowdown. Meanwhile, by profiling with perf and recompiling, I found that on my system the noise generator works faster with one thread in spite of the CPU being dual core, uh.
edit: though, ofc it doesn't help in retroarch, why should it.
Well, the shaders created by GLideN64 are quite heavy. It is the price of accuracy. N64 hardware differs from PC hardware in so many aspects that lots of functionality has to be emulated in pixel shaders. You may also disable mip-mapping emulation and get a performance boost in some games.
I experimented years ago with doing noise on a GPU level, skipping that problem with speed entirely. However, the problem was getting random enough results, but that could be worked on.
I'm actually a bit shocked about how a cpu on low power mode is so much less of a problem than a gpu. This situation - the software renderer version of a emulator being much faster than the gpu renderer with both the cpu and gpu at their lowest and the emu on lowest settings also repeats with retroarch's beetle hw and beetle sw. Beetle HW in minimal settings with speeds like 7-12 fps and software runs nearly full speed 55 fps in gameplay.
It's a bit worrying to be honest. Then there are outliers like dolphin of all things that can run resident evil Remake at 20fps in the same machine right after. I just don't know, maybe devs should profile on these states to see if there is something pathological going on like excessive GPU/CPU back and forth reading that multiplies the slowness factors that they can't notice because of great cards.
Many people think that if an emulator emulates hardware much more powerful than the N64, then it needs much more powerful PC hardware than N64 emulators do. That is not necessarily so. A pure software, pixel-accurate N64 emulator should indeed be faster than a similar GameCube emulator. The situation changes when we use the PC graphics card to render graphics. N64 emulators can still be much faster, but only as long as they emulate N64 hardware mainly with the Fixed Function Pipeline; Glide64 and glN64 are examples. There are plenty of features of N64 hardware that cannot be properly emulated with the Fixed Function Pipeline. Pixel shaders are necessary to emulate these features. GLideN64 uses many different pixel shaders, and these shaders often use calculations that older PC cards can't do efficiently. So, while N64 games use far fewer polygons than GC games, proper rendering of these polygons may require many more calculations on the GPU side. GLideN64 devs have already profiled the shader code, and many optimizations have been done. I'm sure that there is still room for optimization, but GLideN64 is definitely not for low-end hardware.
AFAIK one of the heaviest bits is the N64 depth image emulation, thanks to loadimagestore and its heavy use of sync. Could be wrong tho. Didn't you @gonetz try using plain FBOs and plain 16bit unsigned short depth texture attachments in the past for N64 depth rendering? Surely OGL supports such textures instead of using image2d.
Image textures are used only when N64 depth compare option enabled. Otherwise plugin uses plain FBO with plain depth texture attachment.
More than might be expected that is.
For instance, I can run ParaLLEl N64, which is pure software, faster than mupen (this is the RetroArch version, but it also happens with standalone mupen). Even desmume, running (not the same game obviously, but under the same conditions) feels faster. I had hoped that the recent fix for the open source radeon driver had fixed this, but if it did, it was only part of the problem for me.
Some measures:
Linux amd64
My graphics card and cpu are on battery mode (otherwise, overheat). Card is on dpm mode, low/battery.
cpu is limited to 1.2ghz:
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 1.20 GHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 1.60 GHz, 1.20 GHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 1.20 GHz and 1.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.20 GHz.
  cpufreq stats: 2.20 GHz:0,03%, 1.60 GHz:0,00%, 1.20 GHz:99,97% (54)
analyzing CPU 1:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 1
  CPUs which need to have their frequency coordinated by software: 1
  maximum transition latency: 10.0 us.
  hardware limits: 1.20 GHz - 2.20 GHz
  available frequency steps: 2.20 GHz, 1.60 GHz, 1.20 GHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance, schedutil
  current policy: frequency should be within 1.20 GHz and 1.20 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.20 GHz.
  cpufreq stats: 2.20 GHz:0,03%, 1.60 GHz:0,00%, 1.20 GHz:99,97% (35)
This is a late reading from the banjo kazooie intro (the emulator starts to slow down as soon as the 'circle' starts growing, briefly becomes 'fast enough' for the sound not to be delayed when the image is all white during the transition from the first part of the intro where the Nintendo N is moving around and the rest and returns to slow after) with nothing else open but a shell, gnome-session perf top and mupen standalone:
16.46%  [kernel]                        [k] acpi_processor_ffh_cstate_enter
 2.71%  libsamplerate.so.0.1.8          [.] 0x00000000000035d5
 2.36%  mupen64plus-video-GLideN64.so   [.] _Z9RasterizeP7vertexiii
 1.92%  libsamplerate.so.0.1.8          [.] 0x000000000000366c
 1.51%  libsamplerate.so.0.1.8          [.] 0x00000000000036a7
 1.41%  libsamplerate.so.0.1.8          [.] 0x000000000000360f
 1.03%  libc-2.27.so                    [.] __memcpy_ssse3
 1.02%  perf                            [.] 0x000000000029c093
 0.92%  [kernel]                        [k] read_hpet
 0.92%  libsamplerate.so.0.1.8          [.] 0x00000000000036ab
 0.89%  libsamplerate.so.0.1.8          [.] 0x0000000000003613
 0.88%  libmupen64plus.so.2             [.] dyna_jump
 0.82%  libsamplerate.so.0.1.8          [.] 0x0000000000003672
 0.77%  libsamplerate.so.0.1.8          [.] 0x0000000000003603
 0.74%  libsamplerate.so.0.1.8          [.] 0x0000000000003693
 0.74%  libmupen64plus.so.2             [.] dynarec_jump_to_recomp_address
 0.64%  libsamplerate.so.0.1.8          [.] 0x00000000000035fb
 0.63%  libsamplerate.so.0.1.8          [.] 0x000000000000369b
 0.62%  mupen64plus-video-GLideN64.so   [.] _ZN18ColorBufferToRDRAM5_copyEj
 0.59%  perf                            [.] 0x00000000001ea447
PerfTop: 4338 irqs/sec kernel:38.5% exact: 0.0% [4000Hz cycles:pp], (all, 2 CPUs)
This is desmume running Order of Ecclesia, just before gameplay, when the narrator is speaking about the order during the stained glass window text roll:
 9.36%  [kernel]       [k] acpi_processor_ffh_cstate_enter
 1.26%  desmume        [.] 0x000000000011029f
 1.21%  libc-2.27.so   [.] __memcpy_ssse3
 1.02%  desmume        [.] 0x0000000000207f66
 0.61%  desmume        [.] 0x00000000001ba3f0
 0.59%  desmume        [.] 0x0000000000207502
 0.54%  desmume        [.] 0x000000000017f922
 0.47%  desmume        [.] 0x00000000001109e0
 0.44%  desmume        [.] 0x0000000000207f73
 0.43%  desmume        [.] 0x00000000001bb354
 0.42%  [kernel]       [k] read_hpet
 0.39%  desmume        [.] 0x000000000017eb92
 0.36%  desmume        [.] 0x0000000000207494
 0.35%  desmume        [.] 0x00000000001c2495
 0.35%  desmume        [.] 0x000000000014fb32
 0.35%  desmume        [.] 0x000000000017f929
 0.34%  desmume        [.] 0x00000000001bb346
 0.33%  [kernel]       [k] memset
 0.32%  desmume        [.] 0x0000000000207c8b
 0.31%  desmume        [.] 0x00000000001102a9
PerfTop: 6899 irqs/sec kernel:25.2% exact: 0.0% [4000Hz cycles:pp], (all, 2 CPUs)
I know the emulators aren't directly comparable and the N64 clock speed is even slightly higher than the DS's, but I still find it strange how much faster ParaLLEl N64 is than mupen, so I feel something is 'wrong'. I also find it a bit strange that a library for sample rate conversion would take more CPU time than the graphics plugin, though I suppose the GPU being on a low power profile might be delaying the emulator enough for something weird to happen to sound.
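To put a number on the libsamplerate observation: adding up the rows in the mupen listing above gives roughly 13% for libsamplerate.so.0.1.8 against roughly 3% for mupen64plus-video-GLideN64.so. The sketch below shows one way to do that per-library sum from rows of perf top text. The function names and the fixed-size table are assumptions for illustration, and the parser only handles the "percent, library, rest" row layout seen in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DSOS 16
#define NAME_LEN 64

struct dso_total {
    char name[NAME_LEN];
    double percent;
};

/* Parse a row such as "2.71% libsamplerate.so.0.1.8 [.] 0x35d5".
 * Returns 1 on success, 0 if the row does not match. */
static int parse_row(const char *row, char *dso, double *percent)
{
    return sscanf(row, "%lf%% %63s", percent, dso) == 2;
}

/* Add one sample to the table, merging rows from the same library.
 * Returns the new entry count, or -1 if the table is full. */
static int add_sample(struct dso_total *t, int count,
                      const char *dso, double percent)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(t[i].name, dso) == 0) {
            t[i].percent += percent;
            return count;
        }
    }
    if (count >= MAX_DSOS)
        return -1;
    snprintf(t[count].name, NAME_LEN, "%s", dso);
    t[count].percent = percent;
    return count + 1;
}

/* Look up the accumulated percentage for one library (0 if absent). */
static double total_for(const struct dso_total *t, int count, const char *dso)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(t[i].name, dso) == 0)
            return t[i].percent;
    }
    return 0.0;
}
```

Since the library has no resolved symbols, perf splits its time across many raw addresses, so no single row looks large; the per-library total is what makes it stand out.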
I tried the angrylion plugin on standalone but it was even slower, so at least there is nothing surprising there.
Considering a GameCube can run almost all N64 ROMs at their original speed (using an official emulator from Nintendo, though), what system do you have? Also worth noting that rendering on the GameCube/Wii is as different from a PC as from the Nintendo 64, and it has nothing that looks like shaders.
I'm actually a bit shocked about how a cpu on low power mode is so much less of a problem than a gpu. This situation - the software renderer version of a emulator being much faster than the gpu renderer with both the cpu and gpu at their lowest and the emu on lowest settings also repeats with retroarch's beetle hw and beetle sw. Beetle HW in minimal settings with speeds like 7-12 fps and software runs nearly full speed 55 fps in gameplay.
It's a bit worrying to be honest. Then there are outliers like dolphin of all things that can run resident evil Remake at 20fps in the same machine right after. I just don't know, maybe devs should profile on these states to see if there is something pathological going on like excessive GPU/CPU back and forth reading that multiplies the slowness factors that they can't notice because of great cards.
What's even funnier is how I use the official Virtual Console (which also JITs ROMs) along with Android's version of Dolphin in order to get decent speeds (15 to 20 fps) on my tablet (through underclocking the emulated CPU to 30% of its original speed though, because adding a second JIT level hurts JIT caching and in turn speed).
what system do you have?
It's a 13-year-old first gen core duo mobile, so basically a piece of shit, using the mesa drivers for the AMD card, which I have to further underclock so it doesn't overheat. So the following values should say '1.3ghz' for the CPU and who knows what for the GPU (lowest state possible for the thing).
retroarch says:
[INFO] CPU Model Name: Intel(R) Core(TM)2 Duo CPU T6600 @ 2.20GHz
[INFO] Capabilities: MMX MMXEXT SSE SSE2 SSE3 SSSE3 SSE4
[INFO] [GL]: Vendor: X.Org, Renderer: AMD RV710 (DRM 2.50.0 / 5.3.0-62-generic, LLVM 9.0.0).
[INFO] [GL]: Version: 3.0 Mesa 19.2.8.