Open PyWoody opened 5 months ago
You're only changing when the regexes are compiled. Either you do it the first time you use it, or you do it at import time. Once compiled, there is going to be a negligible difference between the two approaches.
I wouldn't want the builtin highlighters to use the pre-compiling approach, because startup time for CLIs is a concern. But if you want to PR the change to highlight_regex to allow custom highlighters to pre-compile, I would accept that...
Hi Will,
Thanks for taking the time to review the issue and make a comment. The whole time I was doing the writeup I kept trying to figure out what I was missing and the startup for CLIs is definitely it. That makes complete sense.
I'll make the PR for highlight_regex when I have a chance. I'll add some basic comparison tests to see if it's worth it as well.
Hi @willmcgugan, sorry, I meant to update this thread but time got away from me a bit. I created the PR here: #3347. If there's anything else you need for the PR, please let me know and I'd be glad to give it a shot.
Rich should take advantage of the potential speed increases from compiled regular expressions via the re.compile function in the stdlib re module.
I have created a fork here: https://github.com/PyWoody/rich/tree/re_compiled that has the changes in place for demoing.
Using the EmailHighlighter example from the docs, a new Highlighter instance could be created like so
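A minimal sketch of such a highlighter, assuming the email pattern and "example." style prefix from the docs example, and assuming that RegexHighlighter.highlights may hold compiled patterns alongside strings. The fallback class in the except branch is a hypothetical stand-in so the snippet also runs where Rich is not installed; it is not part of Rich.

```python
import re

try:
    from rich.highlighter import RegexHighlighter
except ImportError:
    # Hypothetical stand-in, only so this sketch runs without Rich installed.
    class RegexHighlighter:
        base_style = ""
        highlights = []


class EmailHighlighter(RegexHighlighter):
    """Apply a style to anything that looks like an email address."""

    base_style = "example."
    # Compiled once, when the class body is executed (i.e. at import time),
    # instead of being passed to the re module as a string on every call.
    highlights = [re.compile(r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)")]


# The compiled pattern can be inspected or reused directly.
match = EmailHighlighter.highlights[0].search("Send funds to money@example.org")
found_email = match.group("email") if match else None
```

The only difference from a string-based highlighter is the re.compile call around the pattern; the named group still determines which style is applied.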
Note, the above example will already work in the default version because re.finditer automatically converts a re.Pattern or string to a re.Pattern, as shown here: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L219, but it does not save it for re-use. The _compile function in re will do some caching automatically, as shown here: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280, but it will be called every single time rich.text.Text.highlight_regex is called, versus just saving the compiled version yourself.
The more regular expressions a Highlighter uses, the more re.Patterns will be cached, allowing further speed increases. For instance, the rich.highlighter.ISO8601Highlighter, found updated here: https://github.com/PyWoody/rich/blob/re_compiled/rich/highlighter.py#L144, has a considerable speed increase compared to the default version.
The major caveat will be for custom Highlighters that use strings exclusively. There will be a marginal speed decrease in these situations, as each call will need to be isinstance-checked and re.compiled on demand. This is evident in the highlight_regex method of the rich.text.Text class, found updated here: https://github.com/PyWoody/rich/blob/re_compiled/rich/text.py#L615. In my testing, the decrease was small enough that it was difficult to distinguish from the noise.
The net-net is basically this: using re.compile for default Highlighters is a free win, people that want to use re.compile in their custom highlighters get the speed boost, and existing Highlighters out in the wild, or people that want to use strings exclusively, only receive a marginal speed decrease.
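To make the trade-off concrete, here is a rough sketch of the isinstance check described above, together with a small timing harness. It is not the fork's code: iter_matches is a hypothetical stand-in for the matching loop inside highlight_regex, and the sample text and iteration count are arbitrary choices for illustration.

```python
import re
import timeit
from typing import Iterator, Pattern, Union


def iter_matches(re_highlight: Union[Pattern[str], str], plain: str) -> Iterator[re.Match]:
    """Hypothetical stand-in for the matching loop in highlight_regex."""
    if isinstance(re_highlight, str):
        # String input: compile on demand. After the first call this is
        # served from the re module's internal cache, but the lookup still
        # happens on every call.
        re_highlight = re.compile(re_highlight)
    # Pre-compiled input skips the compile/cache lookup entirely.
    return re_highlight.finditer(plain)


EMAIL = r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)"
EMAIL_RE = re.compile(EMAIL)
TEXT = "Send funds to money@example.org or help@example.com"

# Both input types yield the same matches.
from_string = [m.group("email") for m in iter_matches(EMAIL, TEXT)]
from_pattern = [m.group("email") for m in iter_matches(EMAIL_RE, TEXT)]

if __name__ == "__main__":
    # Timings vary by machine and Python version, so none are quoted here.
    for label, arg in (("str", EMAIL), ("compiled", EMAIL_RE)):
        secs = timeit.timeit(lambda: list(iter_matches(arg, TEXT)), number=100_000)
        print(f"{label:>8}: {secs:.3f}s")
```

Running the harness on your own patterns is the easiest way to judge whether pre-compiling is worth it for a given custom highlighter.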