maia closed this 6 years ago.
Thanks! I really appreciate all your hard work on these pull requests. Unfortunately, I am very busy at this time and won't have a chance to dig into your changes in detail for the foreseeable future. So, as all the tests pass, I am going to merge this and bump the version so that you and others can make use of the changes.
I look forward to studying and learning from your changes when I have time though and maybe I can help merge some of the similar regexes.
Thanks again!
This pull request focuses on optimising the regular expressions and yields a noticeable speed increase. The previous pull request reduced memory usage at the cost of speed; with this one we're back to the speed the gem had until a week ago, and probably a bit faster.
All regular expressions that are not language-specific have been converted into constants of a single class, for the pragmatic reason that this makes it easier to see which ones are (more or less) duplicates of others. They have been optimised for speed: each is kept as concise as possible while still passing all tests, none captures anything unless a capture is required, and where several expressions are applied together they are combined into a union and run as one.
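As a rough illustration of the three ideas above (constants in one class, non-capturing groups, and a union of several patterns), here is a minimal sketch. The class name `Regex` and the `EMAIL`, `URL`, `HASHTAG` and `SPECIAL_TOKEN` constants are hypothetical stand-ins, not the gem's actual definitions.

```ruby
# Hypothetical sketch: the class name and constants are illustrative,
# not the gem's actual definitions.
class Regex
  # (?:...) groups alternatives without creating a capture,
  # since only a match/no-match answer is needed here.
  EMAIL   = /\A[^@\s]+@[^@\s]+\.[a-z]+\z/i
  URL     = /\A(?:https?:\/\/|www\.)\S+\z/i
  HASHTAG = /\A#\w+\z/

  # Several patterns combined into a single regexp, so each token is
  # tested against one object instead of three separate ones.
  SPECIAL_TOKEN = Regexp.union(EMAIL, URL, HASHTAG)
end

tokens  = ["hello", "user@example.com", "https://example.com", "#ruby", "world"]
special = tokens.grep(Regex::SPECIAL_TOKEN)
# => ["user@example.com", "https://example.com", "#ruby"]
```

Keeping every shared pattern in one class also makes near-duplicates sit side by side, which is the pragmatic reason given above.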
Some similar regular expressions remain that cannot be merged, because merging them makes one spec or another fail. It seems to me that in these cases the specs need to be adjusted.
During this process large parts of the code have been updated and refactored. No constants or methods have "unknown" in their names any more, and hopefully the code is much easier to understand than before (which should help future contributors). For example, I was able to remove an entire (short) class and replace it with a few lines of code.
@diasks2, please take a good look at the comments I've made in the code and, when you have some spare time, consider merging similar regexes (e.g. sometimes a character is kept and sometimes removed, and some regexes handle only a very specific case and could be merged with others if they handled an entire range of characters at once), adjusting the specs accordingly. Also, if you can find a way to reduce the number of chained `#flat_map` calls in `PostProcessor` (by merging the three regular expressions), that should reduce the number of allocated objects.
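To show why merging the three regular expressions would cut allocations, here is a minimal sketch comparing three chained `#flat_map` passes with a single pass over a merged pattern. The patterns `ELLIPSIS`, `SEMICOLON` and `BANG` and the helper methods are hypothetical stand-ins, not the actual code in `PostProcessor`.

```ruby
# Hypothetical sketch: the three patterns are illustrative stand-ins,
# not the actual regexes used in PostProcessor.
ELLIPSIS  = /\.{3}/
SEMICOLON = /;/
BANG      = /!/

# Three chained passes: each #flat_map builds a new intermediate array,
# and each #split allocates another array per token.
def split_chained(tokens)
  tokens
    .flat_map { |t| t.split(/(#{ELLIPSIS})/) }
    .flat_map { |t| t.split(/(#{SEMICOLON})/) }
    .flat_map { |t| t.split(/(#{BANG})/) }
    .reject(&:empty?) # drop empty pieces left between adjacent separators
end

# One pass: the three patterns joined into a single alternation with one
# capture group, so the separators are still kept in the result.
MERGED = /(#{Regexp.union(ELLIPSIS, SEMICOLON, BANG)})/

def split_merged(tokens)
  tokens.flat_map { |t| t.split(MERGED) }.reject(&:empty?)
end

tokens  = ["wait...what", "yes;no!", "plain"]
chained = split_chained(tokens)
merged  = split_merged(tokens)
# both => ["wait", "...", "what", "yes", ";", "no", "!", "plain"]
```

The two versions agree here only because the separators never overlap. When patterns can match overlapping text, the order of the chained passes matters and a merged alternation may produce different tokens, which is one way a merge could make existing specs fail.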