LMMS / lmms

Cross-platform music production software
https://lmms.io
GNU General Public License v2.0

Inline `TimePos` and `TimeSig` functions to improve performance #7549

sakertooth closed 3 weeks ago

sakertooth commented 1 month ago

Inline the functions within `TimePos` and `TimeSig` to improve performance. I discovered this performance issue while profiling a project with more than 36K notes, running at 999 BPM.

While trying to figure out why this project uses so much CPU, I tweaked the code inside `InstrumentTrack::play` a bit (this function was of interest since it is the one generating the `NotePlayHandle`s) and removed the two while loops. CPU usage was of course 0% at that point. What I found interesting was that adding back a loop that iterates over all the notes and does a small computation with `Note::pos` took around 20-30% of CPU on its own, whereas doing the same with other functions such as `Note::getVolume` kept the CPU at 0%. With some profiling, I eventually realized that the functions in `TimePos` carry a lot of function call overhead, and inlining `TimePos::operator int()` brought the CPU meter back to 0%.
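To illustrate the kind of change involved, here is a minimal sketch of a position type whose conversion operator is defined in the class body, so it is implicitly inline and visible to every translation unit that includes the header. `TickPos`, `SketchNote`, `startsAfterZero`, and `sketchSelfCheck` are hypothetical stand-ins made up for this example; they are not the actual LMMS declarations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TimePos (illustrative only, not the LMMS class).
class TickPos
{
public:
    constexpr explicit TickPos(std::int32_t ticks = 0) noexcept : m_ticks(ticks) {}

    // Defined inside the class body, so the function is implicitly inline.
    // Any caller that includes this header can have the call replaced by a
    // plain load of m_ticks instead of a real function call.
    constexpr operator int() const noexcept { return m_ticks; }

private:
    std::int32_t m_ticks;
};

// Hypothetical stand-in for Note, returning its position by const reference.
class SketchNote
{
public:
    explicit SketchNote(TickPos pos) noexcept : m_pos(pos) {}
    const TickPos& pos() const noexcept { return m_pos; }

private:
    TickPos m_pos;
};

// The same kind of "meaningless computation" as in the test loop:
// pos() is converted to int through operator int() and compared with 0.
inline bool startsAfterZero(const SketchNote& note) noexcept
{
    return note.pos() > 0;
}

// Small usage check for the sketch.
inline void sketchSelfCheck()
{
    assert(startsAfterZero(SketchNote{TickPos{48}}));
    assert(!startsAfterZero(SketchNote{TickPos{0}}));
}
```

If the operator were only declared in the header and defined in a separate `.cpp` file, callers in other translation units would generally have to emit a real call unless link-time optimization is enabled, which is the overhead described above.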

These are the changes I made when investigating the performance inside the InstrumentTrack::play function:

...
// if( cur_start > 0 )
// {
//     // skip notes which are posated before start-bar
//     while( nit != notes.end() && ( *nit )->pos() < cur_start )
//     {
//         ++nit;
//     }
// }
for (const auto& note: notes)
{
    played_a_note = note->pos() > 0; // the meaningless computation
    // const auto currentNote = *nit;
    ...
}
...
return played_a_note;

I've attached a screenshot of the CPU meter after removing most of the note audio generation code in InstrumentTrack::play, and instead only doing a small amount of meaningless computation (master branch, Release build): Pasted image 20241017221119

After inlining the functions inside TimePos: Pasted image 20241017221427
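For anyone who wants to try a similar loop outside LMMS, the following is a rough sketch of timing one pass over a note list with `std::chrono`. Everything in it (`BenchPos`, `BenchNote`, `makeNotes`, `countNotesAfterStart`, `timeOnePass`) is a hypothetical name made up for this example, and the timing it reports is not comparable to the CPU meter readings in the screenshots.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical position type with an inline conversion to int.
struct BenchPos
{
    int ticks = 0;
    operator int() const noexcept { return ticks; }
};

// Hypothetical note type that exposes its position by const reference.
class BenchNote
{
public:
    explicit BenchNote(int ticks) : m_pos{ticks} {}
    const BenchPos& pos() const noexcept { return m_pos; }

private:
    BenchPos m_pos;
};

using NoteVector = std::vector<std::unique_ptr<BenchNote>>;

// Build `count` notes at tick positions 0, 1, 2, ...
inline NoteVector makeNotes(std::size_t count)
{
    NoteVector notes;
    notes.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        notes.push_back(std::make_unique<BenchNote>(static_cast<int>(i)));
    }
    assert(notes.size() == count);
    return notes;
}

// Iterate over all notes and do a small comparison on pos(),
// counting the notes positioned after tick 0.
inline std::size_t countNotesAfterStart(const NoteVector& notes)
{
    std::size_t count = 0;
    for (const auto& note : notes)
    {
        if (note->pos() > 0) { ++count; }
    }
    return count;
}

// Time a single pass; the count is written to `result` so the
// compiler cannot discard the loop as dead code.
inline std::chrono::nanoseconds timeOnePass(const NoteVector& notes, std::size_t& result)
{
    const auto begin = std::chrono::steady_clock::now();
    result = countNotesAfterStart(notes);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
}
```

Calling `timeOnePass(makeNotes(36000), result)` gives the duration of one pass over about as many notes as the project mentioned above. To see the effect of out-of-line calls, the conversion operator would need to be moved into a separate translation unit and built without link-time optimization.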

In addition, this seems to correlate strongly with the number of cache misses during execution. After inlining these functions, the number of cache misses dropped significantly, most likely because there are fewer instructions to fetch and execute. I discovered this using `perf`, which is how I realized that something odd was happening with the `TimePos::operator int()` function.

`perf report` results from `perf record -e cache-misses -p $(pidof lmms) -- sleep 5s` before inlining (in normal use, you will most likely see cache misses coming from many other places): Pasted image 20241017222429

Same perf report after inlining: Pasted image 20241017222721