Textualize / rich

Rich is a Python library for rich text and beautiful formatting in the terminal.
https://rich.readthedocs.io/en/latest/
MIT License

[REQUEST] Rich Should Accept Highlights as re.compiled re.Patterns and Use them Internally #3345

Open PyWoody opened 5 months ago

PyWoody commented 5 months ago

Rich should take advantage of the speed gains available from precompiling regular expressions with re.compile from the stdlib re module.

I have created a fork here: https://github.com/PyWoody/rich/tree/re_compiled that has the changes in place for demoing.

Using the EmailHighlighter example from the docs, a new Highlighter instance could be created like so:

import re

from rich.console import Console
from rich.highlighter import RegexHighlighter
from rich.theme import Theme

class EmailHighlighter(RegexHighlighter):
    """Apply style to anything that looks like an email."""

    base_style = "example."
    highlights = [re.compile(r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)")]

theme = Theme({"example.email": "bold magenta"})
console = Console(highlighter=EmailHighlighter(), theme=theme)
console.print("Send funds to money@example.org")

Note that the above example already works in the default version, because re.finditer accepts either a string or a re.Pattern and compiles it as needed, as shown here: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L219, but the resulting pattern is not kept by the caller for reuse. The _compile function in re does some caching automatically, as shown here: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280, but it is still called every single time rich.text.Text.highlight_regex is called, instead of the compiled pattern simply being saved and reused.
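To make the comparison above concrete, here is a minimal timing sketch using only the stdlib. It is not code from Rich or from the fork; the helper names (find_with_string, find_with_compiled, compare) are hypothetical, and the numbers it prints will vary by machine and Python version.

```python
import re
import timeit

EMAIL = r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)"
EMAIL_RE = re.compile(EMAIL)  # compiled once and kept for reuse
TEXT = "Send funds to money@example.org"


def find_with_string(text):
    # Module-level re.finditer: the pattern string goes through re's
    # internal compile/cache lookup on every call.
    return [m.group("email") for m in re.finditer(EMAIL, text)]


def find_with_compiled(text):
    # Pattern.finditer: no lookup, the saved pattern is used directly.
    return [m.group("email") for m in EMAIL_RE.finditer(text)]


def compare(number=100_000):
    # Hypothetical helper: returns the two timings in seconds.
    t_string = timeit.timeit(lambda: find_with_string(TEXT), number=number)
    t_compiled = timeit.timeit(lambda: find_with_compiled(TEXT), number=number)
    return t_string, t_compiled


if __name__ == "__main__":
    t_string, t_compiled = compare()
    print(f"string pattern:   {t_string:.3f}s")
    print(f"compiled pattern: {t_compiled:.3f}s")
```

Both helpers return the same matches; any difference in timing comes from the per-call cache lookup, not from the matching itself.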

The more regular expressions a Highlighter uses, the more saved re.Patterns get reused, and the larger the potential speedup. For instance, the rich.highlighter.ISO8601Highlighter, updated here: https://github.com/PyWoody/rich/blob/re_compiled/rich/highlighter.py#L144, is considerably faster than the default version.

The major caveat is custom Highlighters that use strings exclusively. These will see a marginal slowdown, because each call needs an isinstance check followed by an on-demand re.compile. This is visible in the highlight_regex method of the rich.text.Text class, updated here: https://github.com/PyWoody/rich/blob/re_compiled/rich/text.py#L615. In my testing, the slowdown was small enough that it was difficult to distinguish from noise.
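The isinstance check described above could look roughly like the following. This is only a sketch of the idea, not the code in the fork or in Rich; as_pattern and match_spans are hypothetical names introduced for illustration.

```python
import re
from typing import List, Tuple, Union


def as_pattern(re_highlight: Union[str, re.Pattern]) -> re.Pattern:
    # Hypothetical helper: precompiled patterns pass straight through,
    # strings are compiled on demand (re's own cache softens the cost).
    if isinstance(re_highlight, re.Pattern):
        return re_highlight
    return re.compile(re_highlight)


def match_spans(re_highlight: Union[str, re.Pattern], plain: str) -> List[Tuple[int, int]]:
    # Stand-in for the matching step of a highlight_regex-style method:
    # returns the (start, end) span of every match in the text.
    return [m.span() for m in as_pattern(re_highlight).finditer(plain)]
```

Callers that pass strings keep working unchanged and pay only for the extra isinstance check, while callers that pass a re.Pattern skip the compile step entirely.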

The net result: using re.compile for the default Highlighters is a free win, people who want to use re.compile in their custom highlighters get the speed boost, and existing Highlighters in the wild, or people who want to use strings exclusively, see only a marginal slowdown.

willmcgugan commented 5 months ago

You're only changing when the regexes are compiled. Either you do it the first time you use it, or you do it at import time. Once compiled, there is going to be a negligible difference between the two approaches.

I wouldn't want the builtin highlighters to use the pre-compiling approach, because startup-time for CLIs is a concern. But if you want to PR the change to highlight_regex to allow custom highlighters to pre-compile, I would accept that...
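The "first use versus import time" point can be illustrated with the stdlib alone. The sketch below assumes CPython's re cache behaviour, where compiling the same pattern string twice with the same flags returns the cached object; it is an illustration only, not part of Rich.

```python
import re

PATTERN = r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)"

# Start from an empty cache so the first call below really compiles.
re.purge()

first = re.compile(PATTERN)   # compiled here, on first use, then cached
second = re.compile(PATTERN)  # expected to be served from re's cache

# On CPython the cache hands back the very same object (assumption:
# this is an implementation detail, not a documented guarantee).
same_object = first is second

if __name__ == "__main__":
    print("same cached object:", same_object)
```

Once the pattern is in the cache, a string-based highlighter pays only for the cache lookup on later calls, which is why the two approaches end up close in steady state.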

PyWoody commented 5 months ago

Hi Will,

Thanks for taking the time to review the issue and make a comment. The whole time I was doing the writeup I kept trying to figure out what I was missing and the startup for CLIs is definitely it. That makes complete sense.

I'll make the PR for highlight_regex when I have a chance. I'll add some basic comparison tests to see if it's worth it as well.

PyWoody commented 4 months ago

Hi @willmcgugan, sorry, I meant to update this thread but time got away from me a bit. I created the PR here: #3347. If there's anything else you need for the PR, please let me know and I'd be glad to give it a shot.