UltraStar-Deluxe / USDX

The free and open source karaoke singing game UltraStar Deluxe, inspired by Sony SingStar™
https://usdx.eu
GNU General Public License v2.0

How to profile USDX (preferably Linux)? #801

Open barbeque-squared opened 7 months ago

barbeque-squared commented 7 months ago

I'm noticing that on my Linux system, USDX uses quite a lot of CPU while doing essentially nothing. These are rough averages as reported by htop, taken without video and while just idling on each screen, on an i3-3250:

The editor in particular makes other applications (at least Firefox and Discord) really hard to use. Even if I close Discord completely and keep just a single Firefox tab open with a hundred-line plaintext file, scrolling gets very slow.

I suspect USDX is doing something and I want to figure out what.

What I've tried:

  1. make PFLAGS="-pg -Fl/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.1" then use gprof
  2. perf

Number 1 appears to only take measurements for one thread (the main thread?). I don't know what to make of number 2; maybe I'm missing compile flags and need to run perf in a certain way, but all I've got so far is gibberish.

What I'm looking for is some way to figure out what is actually causing the CPU usage. Creative solutions like not using threads at all are also fine, as long as pinpointing where it's using CPU is still possible.

Rejected tools: I did see Intel VTune Amplifier mentioned in a Stack Overflow answer. I did not test it because on Arch Linux it is part of some 15GB suite. Moreover, it only works on Intel CPUs, which would make it useless for people with other CPUs.

basisbit commented 7 months ago

I blame your system setup for this. Here on an old laptop with Haswell refresh CPU and Windows 11, it is at about 5% CPU usage.

basisbit commented 7 months ago

I mean, seriously, USDX runs perfectly fine on a 12-year-old laptop that cost 350€ when it was new. You should look at your graphics drivers and your OS's rendering pipeline / Xorg/Wayland config.

basisbit commented 7 months ago

In general, such non-issues might fit better in GitHub's discussion feature than as "issues", which spam everyone who follows the project.

s09bQ5 commented 6 months ago

GitHub has a discussion feature that is not tied to issues or pull requests? Where?

I can confirm that rendering is slow on 2nd generation Intel Core GPUs (Pentium 997 in my case). If you display the help text, the CPU usage of the main thread rises to 100% and the frame rate drops to 54 fps, even when playing in an 800x600 window. So I guess the GPU doesn't like the way we draw text.

I'll try to build a bleeding edge LLVM+Mesa on that machine to see if that is faster.

> maybe I'm missing compile flags and need to run perf in a certain way, but all I've got so far is gibberish.

@barbeque-squared, what does perf top display? With the stock Debian Bullseye Mesa it looks like the load is evenly spread out over many functions.

barbeque-squared commented 6 months ago

@s09bQ5 Compiled with make PFLAGS="-g -gl"

perf top -p 2974,3088,3114,3118

Editor idling: [screenshot 2024-01-03-164252_814x893_scrot]

Editor displaying help overlay idling: [screenshot 2024-01-03-164441_817x902_scrot]

I can actually press Enter on any ultrastardx line in perf to do things like annotate it (which shows what looks like assembly instructions mixed with what looks like USDX source code) or zoom in on it (no idea what this does).

I'd also expect to see at least some substantial differences between the two screenshots, considering just showing the help text almost doubles the actual CPU usage? Or is that a wrong assumption on my part?

s09bQ5 commented 6 months ago

> perf top -p 2974,3088,3114,3118

You are listing different threads of the same process. Use -t to limit the display to specific threads. With -p it will display all threads of the process(es).
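To pick thread IDs for -t, it helps to see which threads the process actually has. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function names list_tids and show_threads are made up for this example and are not part of perf or USDX.

```shell
#!/bin/sh
# Sketch only: list_tids and show_threads are hypothetical helper names.
# Both read /proc/PID/task, which has one subdirectory per thread (Linux only).

# Print the thread IDs of a process as a comma-separated list,
# which is the form perf top -t accepts.
list_tids() {
    pid=$1
    tids=""
    for dir in /proc/"$pid"/task/*; do
        [ -d "$dir" ] || continue
        tids="${tids:+$tids,}${dir##*/}"
    done
    echo "$tids"
}

# Print "TID NAME" for every thread, to decide which ones are worth watching.
show_threads() {
    pid=$1
    for dir in /proc/"$pid"/task/*; do
        [ -r "$dir/comm" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/comm")"
    done
}
```

With the IDs from the command quoted above, show_threads 2974 would list all four threads with their names, and something like perf top -t 3088 would then restrict the display to a single one of them.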

> I'd also expect to see at least some substantial differences between the two screenshots, considering just showing the help text almost doubles the actual CPU usage?

By limiting the display to one process, the percentages for ultrastardx will always add up to 100%. So if the GPU rendering needs twice as many CPU cycles, all you will see is that some GPU-unrelated symbols show about half the percentage they did before.
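A small worked example of why relative percentages hide the increase. The cycle counts are invented for illustration (10 units for some GPU-unrelated symbol, 90 or 180 units for everything else), and share is a hypothetical helper, not a perf feature.

```shell
#!/bin/sh
# Sketch only: the numbers below are made up, not measurements from USDX.

# share PART REST: percentage of all samples that fall into PART.
share() {
    awk -v part="$1" -v rest="$2" \
        'BEGIN { printf "%.1f\n", 100 * part / (part + rest) }'
}

# The unrelated symbol costs 10 units in both cases; only the rest doubles.
echo "help hidden: $(share 10 90)%"    # prints "help hidden: 10.0%"
echo "help shown:  $(share 10 180)%"   # prints "help shown:  5.3%"
```

The symbol does exactly the same amount of work in both runs, yet its share drops from 10.0% to 5.3%, which is roughly the halving described above.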

For absolute values I suggest running ultrastardx on a single CPU core (taskset -c 0 ultrastardx) while letting perf sample the cpu-clock event on that core (perf top -C 0 -e cpu-clock). That way unused CPU cycles end up in the cpuidle_enter_state kernel symbol. But this of course only works when ultrastardx needs less than 100% of that core.

You need to restart perf top to reset its statistics.
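The two commands above can be wrapped into one helper. This is a rough sketch, not a tested recipe: profile_pinned and the DRY_RUN switch are invented for this example, and depending on the kernel.perf_event_paranoid setting, sampling a whole core with -C may need root.

```shell
#!/bin/sh
# Sketch only: profile_pinned and DRY_RUN are hypothetical, not part of perf.

# profile_pinned CORE COMMAND [ARGS...]
# Runs COMMAND pinned to CORE and samples that core with perf top.
profile_pinned() {
    core=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only show what would be run.
        echo "taskset -c $core $*"
        echo "perf top -C $core -e cpu-clock"
        return 0
    fi
    taskset -c "$core" "$@" &            # start the program pinned to one core
    target=$!
    perf top -C "$core" -e cpu-clock     # sample only that core; quit with q
    wait "$target"                       # then wait for the program to exit
}
```

Calling profile_pinned 0 ultrastardx would start the game on core 0 and open perf top on the same core. Quitting and calling it again gives fresh statistics, since perf top has to be restarted to reset them.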

barbeque-squared commented 6 months ago

The taskset -c 0 ultrastardx really did the trick for me. In the end, -e cpu-clock didn't make much of a difference compared to the default, but perf top -C 0 is great.

I did have an "a ha!" moment when libspa-audioconvert.so kept showing up near the top. I happened to open pw-top and noticed that it was doing sample rate conversions everywhere and that one microphone had 64 (!) channels. Some ALSA config (and a USDX hardcoding) later, total CPU usage is down by roughly 15-20% compared to those screenshots from three days ago.

It's still using quite a lot of CPU, of course, but most of it now seems to come from recording and rendering (both of which are to be expected), and at least I can actually scroll in other applications again while the editor is open. I suspect there are some big gains to be made by using some kind of rendering buffer for mostly static stuff like menu buttons, but that's for another time.

Before I close this ticket I would like to condense some of the findings on the wiki (the pw-top/resampling stuff) and in some markdown file (the profiling stuff). I'll try to get around to it tomorrow but it might be a few weeks. I'll assign it to myself in the meantime.