getpelican / pelican

Static site generator that supports Markdown and reST syntax. Powered by Python.
https://getpelican.com
GNU Affero General Public License v3.0
12.37k stars 1.8k forks source link

Plugin name disappeared from log messages #3353

Open egberts opened 2 days ago

egberts commented 2 days ago

Issue

the plugin name disappeared from its log message, albeit 3-years ago in this patch. And not only did name field disappear from the log format, the name field got hard coded with 'py.warning' for all Pelican logs.

Rationale

Having a path/filename column alone is quite the rather useless if many plugins share the same filename (of which THEY ARE ALLOW TO SO, such as __init__, __main__, init), so plug-in name would be very useful addition (that we used to have) to distinguish apart the same common filenames between Pelican and also between MANY plugins.

name field item of this logging module contains whatever it is declared in logger = logging.getLogger('name'). Pelican starts it out withpelican` as a logger name.

Logger name:

  1. used primarily to reflect and separate "daemons/tasks", "subsystems", "submodules", and ... "plugins".
  2. Logger "provides" nesting of Python callstack as a default.
  3. We can repurpose this name to reflect our plugins along with no. 2 above.

Our Pelican plugin makes use of the logger = logging.getLogger(__name__) so that plugin-specific names can appear with the log message as well.

Is this something that we can quickly fix by introducing the (name) into this LOG_FORMAT?

Perhaps, like:


    LOG_FORMAT = "%(name)s: %(message)s"

so that we can get something like:

           DEBUG    pelican.plugins._utils: Loading plugin         _utils.py:104
                    `m.htmlsanity`                                              
           DEBUG    pelican.plugins._utils: Loading plugin         _utils.py:104
                    `tableize`                                                  
           DEBUG    pelican: Registering plugin `m.htmlsanity`    __init__.py:78
           DEBUG    pelican: Registering plugin                   __init__.py:78
                    `pelican.plugins.tableize`                                  
           INFO     pelican.plugins.tableize.tableize: tableize  tableize.py:584
                    plugin registered for Pelican, using new 4.0                
                    plugin variant                                              
           DEBUG    pelican.plugins.tableize.tableize:           tableize.py:249
                    tableize_pelican_initialized_all signal                     
           WARNING  pelican.plugins.tableize.tableize:           tableize.py:271
                    TABLEIZE_PLUGIN does not exist in                           
                    pelican.settings; plugin's built-in defaults       

Alternative Output of Plugin Name

Alternatively, we could prepend the plugin name before the filename like in:

           DEBUG    Loading plugin `m.htmlsanity`                 _utils.py:104/plugins._utils
           DEBUG    Loading plugin `tableize`                     _utils.py:104/plugins._utils
           DEBUG    Registering plugin `m.htmlsanity`             __init__.py:78/pelican
           DEBUG    Registering plugin `pelican.plugins.tableize` __init__.py:78/pelican.plugins.tableize
           INFO     tableize plugin registered for Pelican, using  tableize.py:584/pelican.plugins.tableize.tableize
                    new 4.0 plugin variant                                              
           DEBUG    tableize_pelican_initialized_all signal        tableize.py:249
           WARNING  TABLEIZE_PLUGIN does not exist in              tableize.py:271/pelican.plugins.tableize.tableize
                    pelican.settings; plugin's built-in defaults       

I've not determine how to implement this alternative, but I do believe @MinchinWeb can help us there, if this is the more desirable way to do this.

Alternative No. 2 - Shorter name

We could shorten this 'name' even further to just the last two dotted names or anything starting with plugin.:

Examples are:

Comment made on its commit in question here.

HOOK'EM LOG-NAME

BUG: py.warnings is the currently assigned log name (until I had my own pelican.py executable took over properly as pelican name.)

NOTE: Notice that m.sanity did not call logger = logging.setLogger(__name__)? It got the pelican because it is a legacy (non-4.0) plugin.

NOTE: Notice that using __name__ in logger = logging.setLogger(__name__) will get you the full multi-dotted name like pelican.plugins.tableize.tableize? but using your own string results like my_fancy_tableize results in a simple non-dotted my_fancy_tableize name? This multi-dotted nature can be disabled by log.py doing logger = logging.setLogger('pelican').

Declaring logger class variable as a global scope but within a plugin should be safe, scoping-wise:

# my plugin
import logging
# logger = logging.getLogger(__name__)
logger = logging.getLogger('my_fancy_plugin_name')

Platform Used

MinchinWeb commented 2 days ago

To preface everything else, let me just say that getting logging right is hard. The default terminal is often only 80 characters wide, and a large site can generate thousands of lines of logging as it is; the volume can make it hard to find the details you need.

For reference, the switch to logging with rich as PR #2897 .

Moving to the new (i.e. current) logging system I found was better for displaying where it was coming from, as it added the calling file to the right margin. I don't think the old system displayed filenames unless it hit an error (and I think the new system still does that).

As for the proposals, for No 1, I've implemented a variation of this with my own plugins, where I will prefix a shortname of the plugin to the log message is displays. The actual code is:

from .constants import LOG_PREFIX  # LOG_PREFIX = "[Microblog]"

logger = logging.getLogger(__name__)

def some_function():
            # the rest of the function

            logger.warning(
                '%s micropost "%s" longer than expected (%s > %s).'
                % (
                    LOG_PREFIX,
                    relative_filename,
                    post_len,
                    settings["MICROBLOG_MAX_LENGTH"],
                )

(code from https://github.com/MinchinWeb/minchin.pelican.readers.microblog/blob/master/minchin/pelican/readers/microblog/generator.py )

There may be a case for providing a Pelican hook for providing the logging prefix; I'm not sure what I think about trying to automatically determine one. In any case, I think logging from Pelican doesn't been to specifically be prefixed.

As for No 2, I think this is too much to put on the right margin, because it would make the main logging message take up many more lines on narrow (and even "normal") terminals.

egberts commented 2 days ago

As for No 2, I think this is too much to put on the right margin, because it would make the main logging message take up many more lines on narrow (and even "normal") terminals.

Well, for starter, we wasted 12 columns just for the timestamp, so there is something to be said about precious real estate space. Do we even need timestamp??? Honestly, Pelican is inherently a sequential processor, nothing special about multiple sub-tasking nor sub-processes. A single line having a single date timestamp is more than sufficient, for all sense and purposes:

datetime = <insert favorite time routine>
logger.info(datetime)

Heck, put two: one for start, one for end of Pelican session.

Also, doing EXISTING plugin development (oh let's say, upgrading to 4.0 plugin scheme), the filename is inherently useless for many.

During plugin development, this class-nesting of 'name' seems a lot more practical (in terms of development) than anything else (but not wanting to diminish the filename/line-number scheme here).

So, are we to be deigned in keeping it harder to do new (or old) plugin (re-)development?

egberts commented 2 days ago

There may be a case for providing a Pelican hook for providing the logging prefix; I'm not sure what I think about trying to automatically determine one. In any case, I think logging from Pelican doesn't been to specifically be prefixed.

Logger name:

  • used primarily to reflect and separate "daemons/tasks", "subsystems", "submodules", and ... --> "plugins". <--

Notice the word "plugins"?

It is already in place via LOG_FORMAT by using something like a string value like %(message)s: %(name)s, or %(name)s: %(message)s. This fix is essentially a one-liner fix. We could just leave it at that if we supplied this as a "helpful hint" toward a not-so-editable developer page.

The only reason why I looked into other column placement is because of excessive wraparounds, which we could fix perhaps by removing the timestamp column. Again, do we really, REALLY need a timestamp column? (asking for a co-worker).

Nothing I suggested would require changes to the existing code base for all plugins, just leveraging the logging to its fullest.

egberts commented 2 days ago

Don't get me wrong, I absolutely love the RichHandler stuff, just the timestamp, if made a configurable CLI option may make this easier to read longer output.

Maybe picking up the terminal width and passing that as well, but I dunno as to its extensibility of RichFormat, what do you think, @MinchinWeb ? Something like in pelican/log.py here:

terminal_width = 80
try:
    terminal_width = os.get_terminal_size()
    if terminal_width < 80:
        terminal_width = 80
except:
    # Must be a virtual terminal like PyCharm console or VSCode console
    terminal_width = 80

This working snippet of code actually works in both PyCharm/VSCode as well as xterm/terminal ....

This snippet may obviate the need to pursue any further logging change except for:

egberts commented 1 day ago

Also the official Python logging states that our LOG_FORMAT is defaulted to:

'%(levelname)s %(name)s %(message)s'

Use the specified format string for the handler. Defaults to attributes levelname, name and message separated by colons.

Reference

egberts commented 1 day ago

Perhaps we could support LOG_FORMAT at the configuration settings file?