Closed bbshelper closed 3 months ago
Well, this doesn't take into account language specific overrides in linebreakdef.c
...
Luckily, these overrides with LBP_OP all have N or A East Asian width, so the return value of op_is_east_asian() is correct with or without them. I've added a note.
I like the optimization, but it is better to keep the code future-safe, and also be sure to follow the Python code style.
Getting the East Asian width of every char is overkill and more than doubles the processing time. For LB30, the width is only needed when the LBP is OP or CP, and the lookup can be further improved by precomputing the ranges.
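To illustrate the idea, here is a minimal sketch of a width lookup that runs only for OP and CP. It is not the code in this pull request: the names lb_class, eaw_range, is_east_asian() and lb30_is_east_asian() are hypothetical, the two ranges are an illustrative subset rather than the full F/W/H table, and the counter exists only to show how many lookups actually happen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not libunibreak's real types. */
enum lb_class { LBC_OP, LBC_CP, LBC_AL };

struct eaw_range { uint32_t start, end; };

/* Illustrative subset of precomputed East Asian ranges, sorted by start.
 * A real table would cover every F, W and H code point. */
const struct eaw_range eaw_ranges[] = {
    { 0x3000, 0x303E },  /* CJK symbols and punctuation */
    { 0xFF01, 0xFF60 },  /* fullwidth forms */
};

unsigned long eaw_lookups = 0;  /* demo only: counts table searches */

/* Binary search over the sorted, non-overlapping ranges. */
bool is_east_asian(uint32_t ch)
{
    size_t lo = 0;
    size_t hi = sizeof eaw_ranges / sizeof eaw_ranges[0];

    ++eaw_lookups;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < eaw_ranges[mid].start)
            hi = mid;
        else if (ch > eaw_ranges[mid].end)
            lo = mid + 1;
        else
            return true;
    }
    return false;
}

/* The width matters to LB30 only for OP and CP, so every other class
 * returns early without touching the table. */
bool lb30_is_east_asian(enum lb_class cls, uint32_t ch)
{
    if (cls != LBC_OP && cls != LBC_CP)
        return false;
    return is_east_asian(ch);
}
```

With this shape, a run of ordinary letters never reaches the table, while a fullwidth parenthesis such as U+FF08 with class OP costs one search.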
According to the test (make check) speed, the slow-down is not that much. Is the performance hit more severe in your use case?
I noticed the performance issue while profiling koreader (an ebook reader), which calls lb_process_next_char() on every char. My report is based on the time spent in this function.
make check on my machine is ~7.0 vs ~7.3ms, but that timing also includes other things like fgets(). If I wrap set_linebreaks_utf32 in a 1000-iteration loop, the result becomes ~570ms vs ~720ms.
[...] eaw_prop and thus a worst-case scenario for binary search.
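The loop measurement described above could be set up roughly as follows. This is a sketch under assumptions, not the harness actually used: time_passes_ms(), pass_fn and count_pass() are hypothetical names, and the workload here is a trivial stand-in. In a real run the callback would wrap the call to set_linebreaks_utf32 on text loaded before the timer starts, so that fgets() and other I/O stay out of the measurement.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one full pass over the test text. */
typedef void (*pass_fn)(void *ctx);

/* Runs `pass` `iterations` times and returns the elapsed CPU time
 * in milliseconds, as reported by clock(). */
double time_passes_ms(pass_fn pass, void *ctx, int iterations)
{
    clock_t start = clock();

    for (int i = 0; i < iterations; ++i)
        pass(ctx);

    return (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Stand-in workload for the demo: just counts how often it is called. */
void count_pass(void *ctx)
{
    ++*(unsigned long *)ctx;
}
```

Repeating the pass many times makes the per-call difference large enough to see above timer resolution, which is why the 1000-iteration numbers separate more clearly than the single make check run.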