PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

PREFECT_LOGGING_EXTRA_LOGGERS not working for "joblib" #12428

Closed trahloff closed 5 months ago

trahloff commented 5 months ago


Bug summary

Using the PREFECT_LOGGING_EXTRA_LOGGERS setting as outlined in the documentation does not work for the Python package "joblib".

Reproduction

  1. Set PREFECT_LOGGING_EXTRA_LOGGERS=joblib
  2. pip install joblib
  3. Run script:
from prefect import flow, task
from joblib import Parallel, delayed

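def sqrt(x):
    # Executed by joblib's workers; per this report, the output appears in the terminal but not in the UI
    print(x**0.5)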

@task
def print_joblib():
    Parallel(n_jobs=2)(delayed(sqrt)(i**2) for i in range(4))

@task
def print_directly():
    print("This is a print statement.")

@flow(log_prints=True)
def main() -> None:
    print_directly()
    print_joblib()

if __name__ == "__main__":
    main()

Error

=> Logs produced by function calls parallelized with joblib show up in the terminal, but not in the UI. [two screenshots omitted]

Versions

Version:             2.16.0
API version:         0.8.4
Python version:      3.10.13
Git commit:          17f42e9d
Built:               Thu, Feb 22, 2024 3:45 PM
OS/Arch:             darwin/arm64
Profile:             liveeo-dev-dataops-trahloff
Server type:         cloud

Additional context

N/A

zzstoatzz commented 5 months ago

hi @trahloff - thanks for the issue!

it's not clear to me from the reproduction that joblib is using a native python logger here. at a glance, it looks like joblib may define its own notion of a logger, which would not automatically be respected by the PREFECT_LOGGING_EXTRA_LOGGERS setting

you likely know more than me about joblib, but can you confirm that it is or is not using a native python logger?

trahloff commented 5 months ago

Hi @zzstoatzz, thank you for the quick response! Yes and no from my perspective.

They build their own Logger class; however, they "just" use the native Python logger object to construct their logger object: https://github.com/joblib/joblib/blob/6310841f66352bbf958cc190a973adcca611f4c7/joblib/logger.py#L80.

That should normally work, right?
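Whether such a wrapper "normally works" comes down to which named logger its records land on: a handler attached to the "joblib" logger only sees records emitted on "joblib" or one of its children (for example "joblib.parallel"). The standard-library sketch below illustrates this. ListHandler stands in for whatever handler Prefect attaches (an assumption, not Prefect's code), and JoblibStyleLogger is a hypothetical wrapper, not joblib's actual class.

```python
import logging


class ListHandler(logging.Handler):
    """Stand-in for the handler Prefect would attach (assumption)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class JoblibStyleLogger:
    """Hypothetical wrapper built on a named stdlib logger."""

    def __init__(self, name="joblib"):
        self._name = name

    def warn(self, msg):
        # Emitted on the "joblib" logger: a handler attached there sees it.
        logging.getLogger(self._name).warning(msg)

    def warn_on_root(self, msg):
        # Emitted on the ROOT logger: the "joblib" handler never sees it.
        logging.getLogger().warning(msg)


def capture_demo():
    handler = ListHandler()
    target = logging.getLogger("joblib")
    old_level = target.level
    target.addHandler(handler)
    target.setLevel(logging.DEBUG)
    try:
        JoblibStyleLogger().warn("wrapper on 'joblib'")
        # Child loggers propagate their records up to the "joblib" handler.
        logging.getLogger("joblib.parallel").warning("child of 'joblib'")
        JoblibStyleLogger().warn_on_root("root only")
        print("print() does not create a log record at all")
    finally:
        # Undo the global logging changes made for the demo.
        target.removeHandler(handler)
        target.setLevel(old_level)
    return handler.messages
```

Calling capture_demo() returns only the first two messages: the root-logger record and the print() output never reach a handler attached to "joblib".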

btw, I also tried to work around this by overriding the logger, or by setting it manually via from prefect import get_run_logger inside the sqrt function. However, in that case I get prefect.exceptions.MissingContextError: There is no active flow or task run context.
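One possible workaround for the MissingContextError is to keep logging out of the worker function entirely: the worker returns its message, and the task body, where a run context does exist, emits it (there, get_run_logger() would be the logger passed in). To keep the sketch runnable without Prefect or joblib installed, it swaps joblib's Parallel for the standard library's ThreadPoolExecutor and uses a plain logging.Logger; sqrt_message and run_and_log are hypothetical names.

```python
import logging
from concurrent.futures import ThreadPoolExecutor


def sqrt_message(x):
    # Runs in a worker: no Prefect run context here, so return the text
    # instead of printing or logging it.
    return f"sqrt({x}) = {x ** 0.5}"


def run_and_log(values, logger, max_workers=2):
    # Fan out the computation (stand-in for joblib's Parallel/delayed).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        messages = list(pool.map(sqrt_message, values))
    # Back in the caller (the task body), where a run logger is available.
    for message in messages:
        logger.info(message)
    return messages
```

Inside the task from the reproduction this would become roughly run_and_log([i**2 for i in range(4)], get_run_logger()), with the executor replaced by Parallel(n_jobs=2)(delayed(sqrt_message)(i**2) for i in range(4)), which also returns the results as a list in input order.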

zzstoatzz commented 5 months ago

here's how we gather extra loggers, if you'd like to work on a PR that makes joblib compatible without breaking the prefect status quo. that said, I don't think this is a bug with prefect as is, since the reason it doesn't work appears to be that joblib "does python logging" in a non-standard way (it seems like Parallel inherits from Logger? not so sure what's going on)
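A simplified sketch of the general "extra loggers" idea, assuming (not quoting Prefect's code) that each comma-separated name in the setting is looked up with logging.getLogger and given Prefect's handlers. attach_extra_loggers is a hypothetical helper. It shows why only records emitted on those named loggers, or their children, can reach Prefect.

```python
import logging


def attach_extra_loggers(setting_value, handlers, level=logging.INFO):
    """Hypothetical helper: attach `handlers` to each named logger."""
    attached = []
    # The setting is treated as a comma-separated list of logger names.
    for name in (part.strip() for part in setting_value.split(",")):
        if not name:
            continue
        logger = logging.getLogger(name)
        for handler in handlers:
            # Avoid attaching the same handler twice.
            if handler not in logger.handlers:
                logger.addHandler(handler)
        # Only set a level if the library has not chosen one itself.
        if logger.level == logging.NOTSET:
            logger.setLevel(level)
        attached.append(logger)
    return attached
```

A caller would pass something like os.environ.get("PREFECT_LOGGING_EXTRA_LOGGERS", "") as setting_value. Because the lookup is purely by name, a library that writes via print() or on the root logger is untouched by this mechanism.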

any qualms with me closing this issue? I'm happy to assist as far as a workaround goes

zzstoatzz commented 5 months ago

closing this issue - please let me know if you think there's a prefect-specific bug here!