Inline the functions within TimePos and TimeSig to improve performance. I discovered this performance issue while profiling a project that had over 36K notes in it, running at 999 BPM.
While trying to figure out why this project used so much CPU, I tweaked the code inside InstrumentTrack::play a bit (this function was of interest since it is the one generating the NotePlayHandles) and removed the two while loops. CPU usage was of course 0% at that point. What I found interesting was that adding back a loop that iterates over all the notes and does some small computation with Note::pos took around 20-30% of CPU on its own, whereas doing the same with other functions like Note::getVolume kept the CPU at 0%. With some profiling, I eventually realized that the functions in TimePos have a lot of function call overhead, and inlining the TimePos::operator int() function brought the CPU meter back to 0%.
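To illustrate the kind of call overhead described above, here is a minimal sketch. It is not the actual LMMS code: TickPos, SketchNote, and sumPositions are hypothetical stand-ins, and the sketch assumes the position type is a thin wrapper around an integer tick count. The point is only that a conversion operator defined in the class body (in the header) is implicitly inline, so the compiler can see its definition in every translation unit that calls it. Without link-time optimization, a function defined only in a separate .cpp file generally cannot be inlined into callers in other translation units.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tick-position wrapper; not the real TimePos.
class TickPos
{
public:
	explicit TickPos(std::int32_t ticks = 0) : m_ticks(ticks) {}

	// Defined in the class body, so it is implicitly inline and its
	// definition is visible to the compiler at every call site.
	operator int() const { return m_ticks; }

private:
	std::int32_t m_ticks;
};

// Hypothetical stand-in for a note that exposes its position.
class SketchNote
{
public:
	explicit SketchNote(TickPos pos) : m_pos(pos) {}

	TickPos pos() const { return m_pos; }

private:
	TickPos m_pos;
};

// A small per-note computation: one conversion to int per note.
// With tens of thousands of notes, a non-inlined conversion means
// tens of thousands of real function calls on every pass.
long long sumPositions(const std::vector<SketchNote>& notes)
{
	long long total = 0;
	for (const auto& note : notes)
	{
		total += static_cast<int>(note.pos());
	}
	return total;
}
```

Calling sumPositions on a vector of ~36K notes mimics the shape of the test loop. Whether the compiler actually inlines the conversion still depends on the optimization level and build settings.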
These are the changes I made when investigating the performance inside the InstrumentTrack::play function:
I've attached a screenshot of the CPU meter after removing most of the note audio generation code in InstrumentTrack::play, and instead only doing a small amount of meaningless computation (master branch, Release build):
After inlining the functions inside TimePos:
In addition, this seems to correlate strongly with the number of cache misses during execution. After inlining these functions, the number of cache misses dropped significantly, most likely because there are fewer instructions to fetch and execute. I discovered this using perf, which is how I realized that something odd was happening with the TimePos::operator int() function.
perf report results from perf record -e cache-misses -p $(pidof lmms) -- sleep 5s before inlining (you will most likely see cache misses coming from a lot of other places in normal instances):
Same perf report after inlining: