zherczeg closed this 2 months ago.
I have some ideas to further improve this patch.
I have updated the code. @PhilipHazel, what do you think?
Very nice idea! I am happy with it. Is it ready for merging now?
It is ready. The code is fail-safe, although it would be great if I understood the purpose of the extra increase.
Currently, when UTF caseless matching is requested, each character in a class range is checked one by one to find its other cases. I have always thought this was inefficient, even though it is only done by the parser.
To speed this up, I have added a data structure that contains ranges of characters which have no other cases. Each range has a minimum size. The size of the data structure in uint32_t units for different minimum range sizes (the number of ranges is half of that size):
In the end I chose 8, because only 74 values (37 ranges) are needed, and these ranges cover 1,110,983 characters, which is 99.7% of all Unicode code points.
Performance improvement by the patch:
Original (-O3):
New (-O3):
The new code is 107 times faster than the old one, which I think is a nice speedup.