dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Allow configuration of dagster_daemon logging #4170

Open NakulK48 opened 3 years ago

NakulK48 commented 3 years ago

Use Case

For Splunk ingestion purposes, we're trying to assemble a log file containing log messages from all of our scheduled processes, plus any errors that prevent scheduled processes from being executed (e.g. errors in the load or in the should_execute function).

This gave us the idea of just using the dagster_daemon log file, which seems to contain both of these things. Unfortunately it's not particularly configurable: it logs in local machine time (with no timezone), whereas our processes log in UTC with the +00:00 annotation, which is apparently required for efficient parsing in Splunk.

I didn't get an answer when I asked whether it was possible to customise it here https://github.com/dagster-io/dagster/discussions/4150

Ideas of Implementation

Logging appears to be set up here https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/daemon/daemon.py#L23 - it would be cool if this could parse some options from dagster.yaml or something.

We'd need to be able to customise the date format and the converter, as here - the format itself is more of a nice-to-have:

import logging
import time

FORMAT = "%(asctime)s [%(levelname)-8s] %(message)s"
DATE_FORMAT = r"%Y-%m-%d %H:%M:%S%z"

def setup_logging():
    """Configure the logger."""
    logging.basicConfig(format=FORMAT, datefmt=DATE_FORMAT)
    # Render timestamps in UTC instead of local time (applies to every Formatter).
    logging.Formatter.converter = time.gmtime
    logging.getLogger().setLevel(logging.INFO)

I'd be open to any sensible workarounds to get a single file containing both runs and (more importantly) execution failures.


Message from the maintainers:

Excited about this feature? Give it a :thumbsup:. We factor engagement into prioritization.

gibsondan commented 3 years ago

Hi @NakulK48 - sorry for missing your Q&A post; our monitoring there isn't as good as it should be. Allowing a custom format string here for the daemon logger should be no problem.

Re: "execution failures" - one thing to watch out for here is that the daemon launches runs but doesn't actually perform the execution, so if the run that gets launched successfully from the daemon ends up failing during pipeline execution, you would probably need some separate monitoring for that.

(Some run launchers might do the execution in a subprocess, which could result in some execution output getting logged in the daemon process, but customizing the daemon logger would not affect those logs.)

NakulK48 commented 2 years ago

It looks like this is supported now: https://docs.dagster.io/concepts/logging/python-logging

Is it possible to set the timezone (e.g. to UTC), in line with https://stackoverflow.com/questions/6321160/how-to-set-timestamps-on-gmt-utc-on-python-logging ?
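
For reference, here is a minimal sketch of what I mean, using only the standard library. The names `UTCFormatter` and `setup_utc_logging` are hypothetical, not anything Dagster provides. It overrides `formatTime` on a single formatter rather than changing `logging.Formatter.converter` globally, and it emits the +00:00 offset directly:

```python
import logging
from datetime import datetime, timezone


class UTCFormatter(logging.Formatter):
    """Hypothetical formatter that renders %(asctime)s in UTC with a +00:00 offset."""

    def formatTime(self, record, datefmt=None):
        # record.created is a POSIX timestamp; turn it into an aware UTC datetime.
        moment = datetime.fromtimestamp(record.created, tz=timezone.utc)
        if datefmt:
            return moment.strftime(datefmt)
        # isoformat() on an aware UTC datetime ends in "+00:00".
        return moment.isoformat(timespec="seconds")


def setup_utc_logging(level=logging.INFO):
    """Attach a stream handler that uses UTCFormatter to the root logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(UTCFormatter("%(asctime)s [%(levelname)-8s] %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

The open question is whether something like this can be plugged into the instance-level logging configuration, or whether only a format string can be set there.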

sryza commented 2 years ago

Our recent changes enable instance-level configuration for logs that are produced inside runs. We still don't offer a way to configure logs for code that's invoked by the daemon (schedule & sensor evaluation functions, should_execute).

sryza commented 2 years ago

@clairelin135 @OwenKephart - do you have an answer to @NakulK48 's timezone question?

clairelin135 commented 2 years ago

Yes, if you specify configuration following this example in the docs you should be able to configure time formatting like this.
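
As a rough illustration in plain Python (not Dagster-specific), the sketch below shows how a formatter with a date format and a UTC converter can be declared in the standard `logging.config.dictConfig` schema. It assumes the instance-level configuration accepts a comparable formatter definition, which is not verified here; in particular, support for the `()` factory key in dagster.yaml is an assumption, and `utc_formatter` is a hypothetical helper name.

```python
import logging
import logging.config
import time


def utc_formatter(format, datefmt=None):
    """Hypothetical dictConfig factory: a standard Formatter that uses UTC."""
    formatter = logging.Formatter(fmt=format, datefmt=datefmt)
    # Override the converter on this instance only, not on logging.Formatter.
    formatter.converter = time.gmtime
    return formatter


LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "utc": {
            # "()" tells dictConfig to call this factory with the remaining keys.
            "()": utc_formatter,
            "format": "%(asctime)s [%(levelname)-8s] %(message)s",
            # The offset is a literal, which is only correct because the
            # converter above is time.gmtime.
            "datefmt": "%Y-%m-%dT%H:%M:%S+00:00",
        }
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "utc",
            "level": "INFO",
        }
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger(__name__).info("daemon-style message with a UTC timestamp")
```

If only `format` and `datefmt` can be set, the timestamps would still be rendered in local time, so a literal +00:00 in `datefmt` would be wrong unless the host itself runs in UTC.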