ArthurSonzogni / FTXUI

:computer: C++ Functional Terminal User Interface. :heart:
MIT License
6.92k stars 417 forks source link

Performance improvement of IsCombining #705

Closed clement-roblot closed 1 year ago

clement-roblot commented 1 year ago

Hi everyone,

In my tests using perf I see that between 30% and 40% of the CPU used by my app (lightly modified downloader example) is spent in the function IsCombining. This feels like a good spot to focus on to improve the speed of the lib.

What this function does is a binary search for a character inside of a massive table (g_word_break_intervals of 1288 elements). Once found we simply test if the interval for our character is WBP::Extend or not.

My idea was at first to add a caching layer to this long search but I only made things worse regarding performance. So the new idea is to refactor this giant g_word_break_intervals table to match what we need here: is my character WBP::Extend or not.

To do that we can re-generate this table to list intervales and a boolean (true for WBP::Extend and false for anything else). Plus we can merge together all the contiguous intervals having the same value. That way the search will be done on a drastically smaller table which should improve performance.

@ArthurSonzogni I assume you built some sort of a script to generate this table originally (I couldn´t find anything like that in the repo). I could modify this script to generate the table that I am describing.

PS: this giant table is also used in Utf8ToWordBreakProperty where we actually need the table as it is right now. So this change that I am proposing would be improving the CPU performances but it will not reduce the memory footprint of the lib.

ArthurSonzogni commented 1 year ago

Hi @clement-roblot, Indeed this function is on the hot path.

Before doing investigations, could you double check this has not been fixed by: https://github.com/ArthurSonzogni/FTXUI/pull/691 already?

Fixing this mistake was a huge improvement for compilers not able to optimize this away.

clement-roblot commented 1 year ago

Looked at #691 and I can imagine this made a big difference in the performance. What was done there was to pass the array we search the character into as a reference thus saving a copy of this giant array being done at every character being displayed on the screen.

The changes that I suggest are to reduce the size of this array to make the search faster. So to answer your question: no what I suggest is different from #691.

StefanRvO commented 1 year ago

I'm also thinking there must be something that can be done to optimize this. The changes in #691 Was mostly to take away the largest performance bottleneck i was seeing.

A possibility could be to just make the table a giant lookup table instead of intervals? That would require a one 1MB as far as i can see? But maybe that would thrash cache hits, and make the performance worse.

I think we should start with merging contiguous intervals. I don't see a reason why fx. image are separate intervals? Wouldn't it just be a positive to merge them into one?

StefanRvO commented 1 year ago

With this quick python script https://pastebin.com/3NJunjS7 i find that combining contiguous intervals reduces the table size to 993. Probably won't give a very big difference in performance, but would probably still be worthwhile to do as a first step.

StefanRvO commented 1 year ago

@clement-roblot Can you take a look at #713 ? Is that somewhat similar to what you had in mind?

Also tried the idea with just having a giant lookup table (https://github.com/StefanRvO/FTXUI/commit/fe34672538bdd78a2c3577e40df79fa0cc520b27)

Yields roughly similar performance improvements as #713, but also increase memory usage by 1MB, so i prefer the solution in the pr

StefanRvO commented 1 year ago

Should we close this, or do you have ideas for how IsCombining can be speed up further?

clement-roblot commented 1 year ago

Sounds good to me to close this now. I'll open a new ticket if I get further ideas 😉

ArthurSonzogni commented 1 year ago

Yes, we can close this. Thanks for the works improving performance!