Closed zherczeg closed 2 months ago
Timing. Old:
re> /[\x{100}-\x{400}],[\x{100}-\x{300}],[\x{200}-\x{600}]/i,utf
Compile time 70.0491 microseconds
New:
re> /[\x{100}-\x{400}],[\x{100}-\x{300}],[\x{200}-\x{600}]/i,utf
Compile time 42.9079 microseconds
Actually, I wanted this caching from the beginning, but the original patch was complex enough already, so I decided to do it later.
The JIT changes are just a simplification of the JIT compiler code. Their effect on performance is negligible, but the code looks better, so they are worth it.
I want to add logarithmic (binary) search to the JIT, but I am not sure it is worth doing for the interpreter. I can do it, but is it worth it? Good question.
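For illustration, a logarithmic search over a sorted list of ranges could look roughly like the sketch below. The flat start/end array layout and the `range_contains` name are assumptions made for this example only; they are not the data layout or API used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (an assumption for this sketch): ranges[2*i] is the
   first and ranges[2*i + 1] the last code point (inclusive) of the i-th
   range. Ranges are sorted in ascending order and do not overlap. */
static int range_contains(const uint32_t *ranges, size_t count, uint32_t c)
{
  size_t lo = 0, hi = count;   /* search window is [lo, hi) */

  while (lo < hi)
    {
    size_t mid = lo + (hi - lo) / 2;

    if (c < ranges[2 * mid])
      hi = mid;                /* c lies below this range */
    else if (c > ranges[2 * mid + 1])
      lo = mid + 1;            /* c lies above this range */
    else
      return 1;                /* start <= c <= end */
    }

  return 0;                    /* no range contains c */
}
```

For a class with only a handful of ranges a linear scan is probably just as fast, which is one reason the benefit for the interpreter is unclear.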
Note: on EBCDIC systems, when utf is disabled, we don't optimize ranges. I suspect EBCDIC (without utf) is only used in 8-bit mode anyway, so it should not be a problem.
Yes, EBCDIC is purely an 8-bit encoding. It shouldn't ever involve XCLASS.
It seems we don't test EBCDIC. I am not even sure it is possible. Anyway, I have added a #error where the support needs to be added. It should be an easy task, but without testing it is not worth much.
Do we also need updates to HACKING? Ze'ev on the mailing list has access to an EBCDIC system and maintains the port for it, and might be interested, but AFAIK this will need to be merged and a snapshot made before that can happen.
My plan is also to start a discussion about EBCDIC support after this work has landed. I have added some words to HACKING, but I am not a native speaker, so feel free to improve them.
This should be enough for one patch. Could you check it?
The new method for processing ranges allows some optimizations in the code, e.g. the xclass processing can return early. Furthermore, the data-set created for each range is cached, and during the second pass of byte code generation the cache is used.
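The caching idea can be sketched as follows. This is only a simplified model under stated assumptions: the `class_data` and `class_cache` types, the fixed `MAX_CLASSES` capacity, and the function names are all hypothetical and do not come from the patch. The first pass computes and stores the data for each class in order; the second pass reads the entries back in the same order instead of recomputing them.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CLASSES 8   /* hypothetical fixed cache capacity */

/* Hypothetical per-class result; the real data set is richer than this. */
typedef struct { uint32_t start, end; } class_data;

typedef struct {
  class_data entries[MAX_CLASSES];
  size_t stored;     /* number of entries written by the first pass */
  size_t next;       /* read cursor used by the second pass */
  unsigned builds;   /* how many times the expensive step actually ran */
} class_cache;

/* Stand-in for the expensive range processing. */
static class_data build_class_data(class_cache *cache, uint32_t start,
  uint32_t end)
{
  class_data d = { start, end };
  cache->builds++;
  return d;
}

/* Classes are visited in the same order in both passes, so the second pass
   can consume cached entries sequentially. */
static class_data get_class_data(class_cache *cache, int first_pass,
  uint32_t start, uint32_t end)
{
  if (first_pass && cache->stored < MAX_CLASSES)
    {
    class_data d = build_class_data(cache, start, end);
    cache->entries[cache->stored++] = d;
    return d;
    }

  if (!first_pass && cache->next < cache->stored)
    return cache->entries[cache->next++];

  /* Cache was full or is exhausted: fall back to recomputing. */
  return build_class_data(cache, start, end);
}
```

With this shape, a pattern containing three classes runs the expensive step three times in total rather than six, which matches the direction of the compile-time improvement reported above.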