Closed clement-roblot closed 1 year ago
Hi @clement-roblot, Indeed this function is on the hot path.
Before doing investigations, could you double check this has not been fixed by: https://github.com/ArthurSonzogni/FTXUI/pull/691 already?
Fixing this mistake was a huge improvement for compilers not able to optimize this away.
Looked at #691 and I can imagine this made a big difference in the performance. What was done there was to pass the array we search the character into as a reference thus saving a copy of this giant array being done at every character being displayed on the screen.
The changes that I suggest are to reduce the size of this array to make the search faster. So to answer your question: no what I suggest is different from #691.
I'm also thinking there must be something that can be done to optimize this. The changes in #691 Was mostly to take away the largest performance bottleneck i was seeing.
A possibility could be to just make the table a giant lookup table instead of intervals? That would require a one 1MB as far as i can see? But maybe that would thrash cache hits, and make the performance worse.
I think we should start with merging contiguous intervals. I don't see a reason why fx. are separate intervals? Wouldn't it just be a positive to merge them into one?
With this quick python script https://pastebin.com/3NJunjS7 i find that combining contiguous intervals reduces the table size to 993. Probably won't give a very big difference in performance, but would probably still be worthwhile to do as a first step.
@clement-roblot Can you take a look at #713 ? Is that somewhat similar to what you had in mind?
Also tried the idea with just having a giant lookup table (https://github.com/StefanRvO/FTXUI/commit/fe34672538bdd78a2c3577e40df79fa0cc520b27)
Yields roughly similar performance improvements as #713, but also increase memory usage by 1MB, so i prefer the solution in the pr
Should we close this, or do you have ideas for how IsCombining can be speed up further?
Sounds good to me to close this now. I'll open a new ticket if I get further ideas 😉
Yes, we can close this. Thanks for the works improving performance!
Hi everyone,
In my tests using
perf
I see that between 30% and 40% of the CPU used by my app (lightly modified downloader example) is spent in the functionIsCombining
. This feels like a good spot to focus on to improve the speed of the lib.What this function does is a binary search for a character inside of a massive table (
g_word_break_intervals
of 1288 elements). Once found we simply test if the interval for our character isWBP::Extend
or not.My idea was at first to add a caching layer to this long search but I only made things worse regarding performance. So the new idea is to refactor this giant
g_word_break_intervals
table to match what we need here: is my characterWBP::Extend
or not.To do that we can re-generate this table to list intervales and a boolean (true for
WBP::Extend
and false for anything else). Plus we can merge together all the contiguous intervals having the same value. That way the search will be done on a drastically smaller table which should improve performance.@ArthurSonzogni I assume you built some sort of a script to generate this table originally (I couldn´t find anything like that in the repo). I could modify this script to generate the table that I am describing.
PS: this giant table is also used in
Utf8ToWordBreakProperty
where we actually need the table as it is right now. So this change that I am proposing would be improving the CPU performances but it will not reduce the memory footprint of the lib.