dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License
1.58k stars 720 forks source link

Old/simple/default logging config is opaque and hard to externally-configure #2659

Open javabrett opened 5 years ago

javabrett commented 5 years ago

This issue might become either a doc-improvement one, or possibly lead to some non-backwards-compatible change in defaults.

Summary, TL;DR

Dask distributed (and specifically client-side) logging should behave roughly according to Configuring Logging for a Library, avoiding application-level logging-configuration decisions. The default/absent logging configuration does not do this and configures a custom (flat, non-hierarchical) logging structure with specific Handlers. There is an option to use the newer logging dictConfig schema, but this brings other considerations.

Detail

I have an application which loads a number of libraries, including "foo", and the Dask distributed Client. The expectation I can easily configure application logging externally, including logging from foo and (Dask) distributed and any other library. Providing they are all good library logging citizens then this should be easy and not require special knowledge. In actuality I need to treat the Dask distributed client library specifically, and have special knowledge of its logging-configuration, so it can't be treated globally as just another library.

Imagine we are running in an application's config.py - or we might be configuring logging in a notebook via ipython_kernel_config.py. The following could also be transposed into the logging module YAML schema. Testing in daskdev/dask:latest (docker run -it --rm daskdev/dask python:

>>> import logging
>>> from logging import StreamHandler, Formatter

We want all loggers to have a standard format, so we can change the formatter on the root logger's handler:

>>> root_handler = logging.StreamHandler()
>>> root_handler.setFormatter(Formatter("[%(levelname)1.1s %(asctime)s.%(msecs).03d %(name)s %(module)s:%(lineno)d] %(message)s"))
>>> logging.root.handlers = [root_handler]

Test what the foo logger output will look like:

>>> foo_logger = logging.getLogger("foo")
>>> foo_logger.warning("foo warning")
[W 2019-05-05 22:53:43,628.628 foo <stdin>:1] foo warning

All good so far. Check Dask distributed logging:

>>> import distributed
>>> distributed_logger = logging.getLogger("distributed")
>>> distributed_logger.warning("distributed warning")
distributed - WARNING - distributed warning

Not the format we are after. At this point we check distributed log config/init code and also check:

>>> distributed_logger.handlers
[<StreamHandler <stderr> (NOTSET)>]
>>> distributed_logger.propagate
False
>>> distributed_logger.handlers[0] == logging.root.handlers[0]
False

So the distributed logger is a pseudo-root logger and its handlers and formatters cannot be configured globally if-desired via the root logger.

>>> distributed_logger.handlers = []
>>> distributed_logger.propagate = True
>>> distributed_logger.warning("distributed warning")
[W 2019-05-05 23:02:09,853.853 distributed <stdin>:1] distributed warning

Additional analysis

Going further, you notice that using the old/simple/default logging-config mechanism, you don't actually have a hierarchy of loggers at all - it's been flattened using propagate = False and local Handlers on any pre-configured or default loggers:

>>> distributed_client_logger = logging.getLogger("distributed.client")
>>> distributed_client_logger.warning("distributed.client warning")
distributed.client - WARNING - distributed.client warning

... still has the old format.

>>> distributed_client_logger.handlers = []
>>> distributed_client_logger.propagate = True
>>> distributed_client_logger.warning("distributed.client warning")
[W 2019-05-05 23:11:18,901.901 distributed.client <stdin>:1] distributed.client warning

... to fix.

Things become more complicated when running some non-standard logging. A real-world example is when running IPython notebooks in a Docker container. The logging here is complex - the kernel process root logger is actually outputting to the running notebook (so via Tornado into the browser). If I want certain libraries to do Docker-style logging to stderr, I need to do some work on the root logger and its handlers. This works for all libraries, but then requires additional work for Dask distributed client, since it assumes that its default StreamHandler should go to stderr.

What about logging schema-based logging?

It is possible to switch to the newer logging schema-based logging, by adding a version = 1 key to the Dask logging config. However:

Possible changes

One or more of:

Related issues/PRs

javabrett commented 5 years ago

I also note that there is distributed.admin.log-format available.

javabrett commented 5 years ago

disable_existing_loggers was fixed in #1380 https://github.com/dask/distributed/commit/f835f8f1a808ac71dcc629e1126c90bdd3aa2599 .

mrocklin commented 5 years ago

First, let me apologize that I haven't read this thoroughly yet, and at the same time thank you for taking the time to write about it. Mostly I'm skipping this because I think that it will take some time to read, digest, and consider thoughtfully.

Second, I'm happy to acknowledge that we can do logging better. Help would be welcome in thinking about this. I'm hoping that you're willing to become Dask's logging Czar @javabrett .

Third, let me add some constraints. Dask is used both as an infrastructure library (where logging conventions and cleanliness are paramount) and also as an interactive library by users who don't know anything about logging. The second set of users vastly out-number the first. This adds some complexity to our life. I think that primarily I care that the following produces no visual output, without any configuration from the user

$ pip install dask[complete]
>>> from dask.distributed import Client
>>> client = Client()
>>> exit()

Past that, I don't have strong thoughts on the matter and would be happy for other people to take ownership (assuming that they're also going to continue to own responsibility for the changes after they are merged) except for a loose preference to deprecate things slowly.

mrocklin commented 5 years ago

@jcrist might have thoughts here