getpelican / pelican

Static site generator that supports Markdown and reST syntax. Powered by Python.
https://getpelican.com
GNU Affero General Public License v3.0
12.57k stars 1.81k forks source link

Ability to filter log messages by regular expression #2893

Open logological opened 3 years ago

logological commented 3 years ago

Feature Request

Pelican has a LOG_FILTER setting that allows warnings and other logging messages to be suppressed based on the exact text of the message, or on a fixed template. Instead of (or in addition to) filtering by fixed templates, it would be nice if messages could be filtered by regular expression. Among other things, this would provide a better fix for Issue #2398 by allowing warning messages to be filtered on a per-image or per-page basis.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your participation and understanding.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your participation and understanding.

justinmayer commented 2 years ago

While I can see the potential value in filtering logs via regular expressions, I'm not quite sure how many folks would actually use such a feature. Anyone out there want this feature enough to be willing to work on implementing it?

sonologic commented 1 year ago

I actually just ran into a desire to have this as well. I've added https://pypi.org/project/pelican-advthumbnailer/ to my site and it is spouting a lot of warnings that can be safely ignored, but since they include full filesystem paths, the exact message depends on where the site content lives. I'm willing to put in the work to make this happen. I'd propose to add a setting LOG_FILTER_REGEXP of type List[Tuple[int, str]] that contains 'log level', 'regular expression' tuples. Would that be a valid approach? As for the implementation, I'd probably want to pre-compile the regular expressions when initialising the Filter object and then iterate over the regular expressions when the filter is invoked.

justinmayer commented 1 year ago

Hi Koen. Thanks for offering to assist with this feature. Rather than adding a new setting, perhaps the existing setting could be extended to include the filter type, defaulting to string unless otherwise specified? Maybe something like:


LOG_FILTER = [
    ("string", logging.WARN, "TAG_SAVE_AS is set to False"),
    ("regex", logging.WARN, 'r"^TAG_SAVE_AS'),
]

What do you think?
sonologic commented 1 year ago

Hi @justinmayer , My only objection would be about backwards compatibility. If we do this, people who already have LOG_FILTER in their pelicanconf.py need to adapt theirs unless we do a pass over the filter and convert any 2-tuple to a 3-tuple and maybe throw a FutureWarning. But maybe that is what you meant with 'defaulting to string unless otherwise specified'. I think it makes a lot of sense to do it like that! Cheers, Koen

justinmayer commented 1 year ago

That is indeed what I intended. 😊

sonologic commented 1 year ago

@justinmayer I've created an MR, but it's waiting for maintainer approval before the ci can run. If you have a moment, perhaps you can approve?

https://github.com/getpelican/pelican/actions/runs/4403482780

justinmayer commented 1 year ago

@sonologic: I posted a comment to that PR. Let's move this discussion there.

copperchin commented 1 year ago

Sorry if this is the wrong place to put this, but rather than change the length of the tuple in LOG_FILTERs values, what about using re.compile when regex is desired?

So converting the example above:

LOG_FILTER = [
    (logging.WARN, "TAG_SAVE_AS is set to False"),
    (logging.WARN, re.compile("^TAG_SAVE_AS"),
]

This would prevent old settings from breaking in new versions of Pelican.

avaris commented 1 year ago

@copperchin Do you mean we expect users to provide compiled regular expressions? And handle it internally with an isinstance check? It would work, technically, but I worry about the "user-friendliness" a bit. Seems it would be easier to forget re.compileing them and not get expected results.

Also, changing 2-tuple to 3-tuple can be handled in a non-breaking way. It is relatively easy to issue a DeprecationWarning if 2-tuple is seen and migrate to 3-tuple while reading the settings.

copperchin commented 1 year ago

@copperchin Do you mean we expect users to provide compiled regular expressions? And handle it internally with an isinstance check? It would work, technically, but I worry about the "user-friendliness" a bit. Seems it would be easier to forget re.compileing them and not get expected results.

It seems reasonable to assume that if :

  1. the user is actively editing a python "config" script
  2. the user is aware of regex (in that they know it solves the problem they have)

... then they know they need to mark the regex as such.

I don't think using re.compile(regex_str) is any bigger a hurdle than the already-required practice of specifying a logging level (instead of a string). In any case, this new feature ought to be documented, wherein presenting the use of re.compile should be fairly straightforward.

It also has the added benefit of being immediately identifiable as regex in any syntax highlighter πŸ‘


Also, changing 2-tuple to 3-tuple can be handled in a non-breaking way. It is relatively easy to issue a DeprecationWarning if 2-tuple is seen and migrate to 3-tuple while reading the settings.

This is only true in one direction.

Keeping the 2-tuple format means configs will only break older versions of pelican if attempting to use the new (unsupported) feature.

Furthermore, keeping the 2-tuple means updating to a newer version of pelican won't require updating existing configs.

...

On the other hand, changing the config format to 3-tuples requires rewriting the configs, which will immediately break on older versions pelican, regardless of whether they are trying to use the new feature or not. This would be particularly annoying if needing to roll back to previous version of pelican.

avaris commented 1 year ago

I didn't mean it as a hurdle. Knowing what should be done and actually remembering to do it are different things :).

On the other hand, changing the config format to 3-tuples requires rewriting the configs

No, with the way I described, it will still work but you'll get a pesky warning about it. And in my experience, people don't stick with a version long enough to port everything over before doing a roll back. They usually hit some deal breaker pretty early (immediately?). So, compatibility in the other direction is less of a concern mostly.

Anyway, I don't have a strong preference in either option :). In technical terms, we can make both work.