Open javabrett opened 5 years ago
I also note that there is distributed.admin.log-format
available.
disable_existing_loggers
was fixed in #1380 https://github.com/dask/distributed/commit/f835f8f1a808ac71dcc629e1126c90bdd3aa2599 .
First, let me apologize that I haven't read this thoroughly yet, and at the same time thank you for taking the time to write about it. Mostly I'm skipping this because I think that it will take some time to read, digest, and consider thoughtfully.
Second, I'm happy to acknowledge that we can do logging better. Help would be welcome in thinking about this. I'm hoping that you're willing to become Dask's logging Czar @javabrett .
Third, let me add some constraints. Dask is used both as an infrastructure library (where logging conventions and cleanliness are paramount) and also as an interactive library by users who don't know anything about logging. The second set of users vastly out-number the first. This adds some complexity to our life. I think that primarily I care that the following produces no visual output, without any configuration from the user
$ pip install dask[complete]
>>> from dask.distributed import Client
>>> client = Client()
>>> exit()
Past that, I don't have strong thoughts on the matter and would be happy for other people to take ownership (assuming that they're also going to continue to own responsibility for the changes after they are merged) except for a loose preference to deprecate things slowly.
@jcrist might have thoughts here
This issue might become either a doc-improvement one, or possibly lead to some non-backwards-compatible change in defaults.
Summary, TL;DR
Dask
distributed
(and specifically client-side) logging should behave roughly according to Configuring Logging for a Library, avoiding application-level logging-configuration decisions. The default/absent logging configuration does not do this and configures a custom (flat, non-hierarchical) logging structure with specificHandler
s. There is an option to use the newerlogging
dictConfig
schema, but this brings other considerations.Detail
I have an application which loads a number of libraries, including "foo", and the Dask
distributed
Client
. The expectation I can easily configure application logging externally, including logging fromfoo
and (Dask)distributed
and any other library. Providing they are all good library logging citizens then this should be easy and not require special knowledge. In actuality I need to treat the Daskdistributed
client library specifically, and have special knowledge of its logging-configuration, so it can't be treated globally as just another library.Imagine we are running in an application's
config.py
- or we might be configuring logging in a notebook viaipython_kernel_config.py
. The following could also be transposed into thelogging
module YAML schema. Testing indaskdev/dask:latest
(docker run -it --rm daskdev/dask python
:We want all loggers to have a standard format, so we can change the formatter on the root logger's handler:
Test what the
foo
logger output will look like:All good so far. Check Dask
distributed
logging:Not the format we are after. At this point we check
distributed
log config/init code and also check:So the
distributed
logger is a pseudo-root logger and its handlers and formatters cannot be configured globally if-desired via the root logger.Additional analysis
Going further, you notice that using the old/simple/default logging-config mechanism, you don't actually have a hierarchy of loggers at all - it's been flattened using
propagate = False
and localHandler
s on any pre-configured or default loggers:... still has the old format.
... to fix.
Things become more complicated when running some non-standard logging. A real-world example is when running IPython notebooks in a Docker container. The logging here is complex - the kernel process root logger is actually outputting to the running notebook (so via Tornado into the browser). If I want certain libraries to do Docker-style logging to
stderr
, I need to do some work on the root logger and its handlers. This works for all libraries, but then requires additional work for Daskdistributed
client, since it assumes that its defaultStreamHandler
should go tostderr
.What about
logging
schema-based logging?It is possible to switch to the newer
logging
schema-based logging, by adding aversion = 1
key to the Dask logging config. However:disable_existing_loggers = False
will be required in order to prevent Dask client logging configuration from obliterating any/all other application logging configuration that preceded it.Possible changes
One or more of:
disable_existing_loggers = False
.Related issues/PRs
618 distributed hijacks the logger and 08c6c67a9258316b08fbf5aa3453a3af8cd1c72b is the change where the loggers were flattened and provided individual
Handler
s.1381 logging configuration turns off existing loggers