dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Do not initialize logging on import #8634

Open fjetter opened 4 months ago

fjetter commented 4 months ago

Initializing logging on import has many unwanted side-effects.

Most importantly, it is very difficult to configure or override anything unless you overwrite the config file directly. This PR delays log configuration until it is needed, which allows code like this to work:

import logging

import dask
import dask.bag as db

from dask.distributed import Client

with dask.config.set({
    "logging": {
        "custom": "info",
        "distributed": "warning",  # Default is INFO, which is a little verbose
    }
}):
    client = Client(silence_logs=False)

logger = logging.getLogger("custom")
logger.setLevel(logging.INFO)

def task(n: int):
    logger.info(f"Hello {n}")

bag = db.from_sequence([1, 2, 3])
bag.map(task).compute()

which just prints

2024-05-06 12:55:07,779 - custom - INFO - Hello 1
2024-05-06 12:55:07,779 - custom - INFO - Hello 3
2024-05-06 12:55:07,779 - custom - INFO - Hello 2
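The core idea, moving logging setup from import time to first use, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in this PR: `ensure_logging_configured`, `DEFAULT_LEVELS`, and `LazyClient` are hypothetical names, and the `overrides` dict stands in for the `logging` section of `dask.config`.

```python
import logging

# Hypothetical library defaults; nothing below touches logging at import time.
DEFAULT_LEVELS = {"distributed": "info"}

_logging_configured = False


def ensure_logging_configured(overrides=None):
    """Apply logger levels once, on first use, letting user overrides win over defaults."""
    global _logging_configured
    if _logging_configured:
        return  # later calls are no-ops, so configuration happens exactly once
    levels = {**DEFAULT_LEVELS, **(overrides or {})}
    for name, level in levels.items():
        # Logger.setLevel accepts level names such as "WARNING"
        logging.getLogger(name).setLevel(level.upper())
    _logging_configured = True


class LazyClient:
    """Hypothetical stand-in for a Client: constructing it triggers logging setup."""

    def __init__(self, logging_config=None):
        ensure_logging_configured(logging_config)
```

Because nothing runs on import, constructing `LazyClient({"distributed": "warning", "custom": "info"})` leaves the `distributed` logger at WARNING instead of the default INFO, and later calls cannot silently reconfigure it.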
fjetter commented 4 months ago

I likely have to go through a couple of edge cases and adjust some tests for this to work. I haven't yet checked how the silence_logs kwarg factors in, but overall I think this change is a strict improvement.

github-actions[bot] commented 4 months ago

Unit Test Results

_See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests._

29 files ±0, 29 suites ±0, duration 10h 57m 8s (+1h 14m 12s)
4 051 tests (-5): 3 951 passed (+9), 97 skipped (-9), 3 failed (-1)
55 799 runs (+7 577): 53 634 passed (+7 347), 2 161 skipped (+248), 4 failed (-1)

For more details on these failures, see this check.

Results for commit 22608665. ± Comparison against base commit e4a05450.

This pull request removes 13 and adds 8 tests. Note that renamed tests count towards both.

Removed tests:
```
distributed.protocol.tests.test_arrow
distributed.protocol.tests.test_collection
distributed.protocol.tests.test_highlevelgraph
distributed.protocol.tests.test_numpy
distributed.protocol.tests.test_pandas
distributed.shuffle.tests.test_graph
distributed.shuffle.tests.test_merge
distributed.shuffle.tests.test_merge_column_and_index
distributed.shuffle.tests.test_metrics
distributed.shuffle.tests.test_rechunk
…
```

Added tests:
```
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler_report_args[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_scheduler_report_args[report_args0]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[1]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers[True]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers_report_args[False]
distributed.diagnostics.tests.test_memray ‑ test_basic_integration_workers_report_args[report_args0]
```

This comment has been updated with the latest results.

fjetter commented 1 month ago

I checked the behavior again. With silence_logs=True we still set all the loggers to silent, but with silence_logs=False the configuration set via the dask.config.set context manager is respected. I think this behavior makes sense.
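The `silence_logs` behavior described here boils down to a simple branch. The sketch below is a hypothetical illustration, not distributed's implementation: `apply_silence_logs` is an invented name, using `logging.WARNING` as the level for `True` is an assumption, and so is treating an integer value as an explicit level.

```python
import logging


def apply_silence_logs(silence_logs, logger_names=("distributed",)):
    """Hypothetical sketch of the silence_logs behavior described above.

    A falsy value leaves logger levels untouched, so levels that came from
    configuration are respected. True applies an assumed quiet level
    (WARNING); an int is treated as an explicit level (also an assumption).
    """
    if not silence_logs:
        return  # respect whatever the config / user already set
    level = logging.WARNING if silence_logs is True else silence_logs
    for name in logger_names:
        logging.getLogger(name).setLevel(level)
```

The key design point is that `False` is a no-op rather than "reset to defaults", which is what lets a level configured earlier survive.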