Open Stefan-Rann opened 2 years ago
This sounds like a reasonable request. We're going to be looking at logging updates later in this quarter. I'm not sure if it would get onto the schedule or not, but we can take a look.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.
Reopening because we just got a duplicate opened in https://github.com/dbt-labs/dbt-core/issues/6319
Things I'm curious about: How would you want to configure this? Should there only be two options (local system time versus UTC)?
Can I add an option that sets the timezone in the dbt-core project configuration file?
Yes, a project level setting would be cool to set the timezone by name. Additionally the option "local" would be great to use the system timezone without the need to explicitly define the timezone.
Please! I see timestamps in the log in UTC and it's annoying =(
At least change
record.extra[self.name] = datetime.utcnow().isoformat()
to
record.extra[self.name] = datetime.now().isoformat()
in logger.py so that it respects the timezone of the system. This is really annoying.
Adding my vote in favor of this issue
Putting aside, for the moment, the question of creating a standalone configuration for this: I'd be supportive of switching from `datetime.utcnow()` to `datetime.now()`, so that the user's local system timezone is what's displayed in the logs. My sense is that most remotely-running orchestration systems would be in UTC, so it wouldn't be a change there.
Timestamp Doug (@dbeatty10) - does that proposed switch give you any trepidation?
For both `datetime.utcnow()` and `datetime.now()`, the Python `datetime`s are naive rather than aware [1][2], so the UTC offset is not explicit in either case.
With the current dbt log output standardized to UTC, it's still comparable to other dbt logs (as long as you know it's UTC). But most human beings think about time in their local time zone.
When logs from dbt (or other systems) use naive timestamps that are localized, then comparing them really becomes quite the forensic exercise 😰
A possible solution is to use aware `datetime`s and print out the UTC offset. The trade-off is that this costs 6 extra characters in the log output.
$ dbt run
07:42:18-06:00 Running with dbt=1.4.4
============================== 2023-04-18 07:42:18.050150-06:00 | 07e7091e-7086-4fb5-9cab-643f3579e090 ==============================
07:42:18.050150-06:00 [info ] [MainThread]: Running with dbt=1.4.4
from datetime import timezone, datetime as dt
Python | Output | Aware vs. naive | UTC offset |
---|---|---|---|
dt.utcnow().isoformat() | 2023-04-18T13:42:18.050150 | Naive | N/A |
dt.now().isoformat() | 2023-04-18T07:42:18.050150 | Naive | N/A |
dt.now().astimezone().isoformat() | 2023-04-18T07:42:18.050150-06:00 | Aware | System time zone |
dt.now(timezone.utc).isoformat() | 2023-04-18T13:42:18.050150+00:00 | Aware | UTC |
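The naive/aware distinction in the table can be checked directly. The following is only a minimal sketch: it uses a fixed instant and an assumed -06:00 system offset (both chosen to match the sample output above, not taken from dbt's code) so that the result does not depend on the machine it runs on.

```python
from datetime import datetime, timedelta, timezone

# Fixed instant matching the sample output; a real logger would call datetime.now().
utc_aware = datetime(2023, 4, 18, 13, 42, 18, 50150, tzinfo=timezone.utc)

# Assumption: the system time zone is at UTC-06:00.
system_tz = timezone(timedelta(hours=-6))
local_aware = utc_aware.astimezone(system_tz)

# Dropping tzinfo reproduces what utcnow() and now() return: naive values.
utc_naive = utc_aware.replace(tzinfo=None)
local_naive = local_aware.replace(tzinfo=None)


def is_aware(value: datetime) -> bool:
    """Return True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


print(utc_naive.isoformat())    # 2023-04-18T13:42:18.050150
print(local_naive.isoformat())  # 2023-04-18T07:42:18.050150
print(local_aware.isoformat())  # 2023-04-18T07:42:18.050150-06:00
print(utc_aware.isoformat())    # 2023-04-18T13:42:18.050150+00:00

# Aware values compare as the same instant; the naive ones look six hours apart.
print(local_aware == utc_aware)  # True
print(local_naive == utc_naive)  # False

# The offset suffix accounts for the six extra characters mentioned above.
print(len(local_aware.isoformat()) - len(local_naive.isoformat()))  # 6
```

Nothing in a naive string tells a reader which of the two clocks produced it, which is why comparing such logs across systems is error-prone.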
Just having it log in the system's timezone would be great. I'm lucky enough to be at UTC+10 so it is only mildly annoying, but still strange that it would be that way.
I've created this workaround, which also adds an elapsed time to the output. Someone might find it useful. 😉
I'd propose two new configurations to handle this use case:
`log_timestamp_tz_source`: utc (default) | system
`log_timestamp_print_offset`: false (default) | true
I didn't put a ton of thought into the names, so they could be shortened during design refinement as needed. For example, maybe `log_timestamp_tz_offset_print` would be better, etc. Or maybe these two for brevity:
`log_tz_source`
`log_tz_offset_print`
CLI flags:
--log-timestamp-tz-source
--no-log-timestamp-print-offset / --log-timestamp-print-offset
Environment variables:
LOG_TIMESTAMP_TZ_SOURCE
LOG_TIMESTAMP_PRINT_OFFSET
dbt_project.yml flags:
log_timestamp_tz_source
log_timestamp_print_offset
CLI flags:
dbt compile --log-timestamp-tz-source system
dbt compile --log-timestamp-tz-source utc
dbt compile --no-log-timestamp-print-offset
dbt compile --log-timestamp-print-offset
environment variables:
export LOG_TIMESTAMP_TZ_SOURCE=system
export LOG_TIMESTAMP_TZ_SOURCE=utc
export LOG_TIMESTAMP_PRINT_OFFSET=1
export LOG_TIMESTAMP_PRINT_OFFSET=true
export LOG_TIMESTAMP_PRINT_OFFSET=T
export LOG_TIMESTAMP_PRINT_OFFSET=0
export LOG_TIMESTAMP_PRINT_OFFSET=false
export LOG_TIMESTAMP_PRINT_OFFSET=F
dbt_project.yml flags:
flags:
  log_timestamp_tz_source: system   # or: utc
  log_timestamp_print_offset: true  # or: false
The defaults would preserve backwards-compatibility.
The first config would allow using the system timezone wherever dbt is running from (whether it is a user's machine or a container orchestrated somewhere remotely).
The second config would allow the timestamps to be interpreted in an aware fashion rather than forcing either a naive interpretation or assuming UTC.
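To make the interaction of the two proposed settings concrete, here is a rough sketch. `format_log_timestamp` and its parameters are hypothetical names invented for illustration; nothing like this is confirmed to exist in dbt. The `system_tz` argument exists only so the example is reproducible; leaving it as `None` lets `astimezone()` fall back to the machine's local time zone.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def format_log_timestamp(
    instant: datetime,
    tz_source: str = "utc",
    print_offset: bool = False,
    system_tz: Optional[tzinfo] = None,
) -> str:
    """Hypothetical helper: render an aware datetime per the two proposed settings."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    if tz_source == "utc":
        shown = instant.astimezone(timezone.utc)
    elif tz_source == "system":
        # astimezone(None) converts to the system's local time zone.
        shown = instant.astimezone(system_tz)
    else:
        raise ValueError(f"unknown tz_source: {tz_source!r}")
    if not print_offset:
        # Dropping tzinfo yields the naive rendering without an offset suffix.
        shown = shown.replace(tzinfo=None)
    return shown.isoformat()


instant = datetime(2023, 4, 18, 13, 42, 18, 50150, tzinfo=timezone.utc)
minus_six = timezone(timedelta(hours=-6))  # assumed system offset for the demo

print(format_log_timestamp(instant))
# 2023-04-18T13:42:18.050150  (defaults: UTC, no offset, as today)
print(format_log_timestamp(instant, "system", True, minus_six))
# 2023-04-18T07:42:18.050150-06:00
print(format_log_timestamp(instant, "utc", True))
# 2023-04-18T13:42:18.050150+00:00
```

With both defaults, the output matches the current naive UTC rendering, which is the backwards-compatibility property described above.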
One thing I considered and decided against: a `log_timestamp_format` config that would accept a format that can be passed to the `datetime.strftime` method. The reason is that logs display a different level of granularity when emitted to the CLI versus the file logs. The first does not include fractional seconds, whereas the second does.
Also, there is a different format with `--log-format json` vs. without it. The former always emits an ISO 8601-formatted string that ends in `Z`, whereas the latter does not.
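The granularity differences described above can be illustrated with three `strftime` patterns. These patterns are approximations inferred from the sample output in this thread, not dbt's actual format strings, and the variable names are made up for the example.

```python
from datetime import datetime, timezone

# Fixed aware UTC instant so the output is reproducible.
instant = datetime(2023, 4, 18, 13, 42, 18, 50150, tzinfo=timezone.utc)

# CLI-style text output: second precision, no fractional part.
cli_style = instant.strftime("%H:%M:%S")

# File-style text output: microsecond precision.
file_style = instant.strftime("%H:%M:%S.%f")

# JSON-style output: full ISO 8601 string with a trailing Z for UTC.
json_style = instant.strftime("%Y-%m-%dT%H:%M:%S.%fZ")

print(cli_style)   # 13:42:18
print(file_style)  # 13:42:18.050150
print(json_style)  # 2023-04-18T13:42:18.050150Z
```

Because each destination needs its own pattern, a single user-supplied format string could not reproduce all three outputs.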
If you are certain that your orchestration environment system time zone is set to UTC (and won't ever change!) or that your log ingestion can interpret localized timestamps properly (somehow), then you can set these flags within `dbt_project.yml`.
But to reduce the risk of a non-UTC orchestration environment messing up log ingestion and parsing by a tool like Datadog, etc., you could configure these via environment variables in your local environment rather than using flags within `dbt_project.yml`:
export DBT_LOG_TIMESTAMP_TZ_SOURCE=system
export LOG_TIMESTAMP_PRINT_OFFSET=0
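As a sketch of how such variables might be read, the helpers below accept the boolean spellings listed earlier in the proposal (1/true/T and 0/false/F). All function names are hypothetical. Since the thread writes the variables both with and without a `DBT_` prefix, the lookup tries both, which is purely an assumption of this example.

```python
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "t"}
FALSE_VALUES = {"0", "false", "f"}


def lookup(env: Mapping[str, str], name: str) -> Optional[str]:
    """Prefer the DBT_-prefixed name, then fall back to the bare one (assumption)."""
    return env.get("DBT_" + name, env.get(name))


def read_tz_source(env: Mapping[str, str], default: str = "utc") -> str:
    """Return 'utc' or 'system', defaulting to the backwards-compatible 'utc'."""
    raw = lookup(env, "LOG_TIMESTAMP_TZ_SOURCE")
    value = default if raw is None else raw.strip().lower()
    if value not in ("utc", "system"):
        raise ValueError(f"unsupported time zone source: {value!r}")
    return value


def read_print_offset(env: Mapping[str, str], default: bool = False) -> bool:
    """Parse the boolean spellings proposed in the thread, case-insensitively."""
    raw = lookup(env, "LOG_TIMESTAMP_PRINT_OFFSET")
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"unsupported boolean value: {raw!r}")


# A plain dict stands in for os.environ so the example is reproducible.
env = {"DBT_LOG_TIMESTAMP_TZ_SOURCE": "system", "LOG_TIMESTAMP_PRINT_OFFSET": "0"}
print(read_tz_source(env))     # system
print(read_print_offset(env))  # False
print(read_tz_source({}))      # utc
```

Passing `os.environ` instead of the dict would read the values exported in the shell.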
I'd also really like to see two options (local system time versus UTC) rather than always defaulting to UTC only.
Has somebody found a solution for this? Is this a problem with Python or with dbt itself?
@PORXavierM the timestamps are coming from the logging of `dbt`.
There is some amount of configurability of the timestamps via the `LOG_FORMAT` and `LOG_FORMAT_FILE` configs. Here's a brief summary:
Log format | Time zone | Aware vs. naive | Precision |
---|---|---|---|
text | UTC | naive timestamp | second |
debug | system local time zone | naive timestamp | microsecond |
json | UTC | aware timestamp | microsecond |
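When timestamps from the three formats have to be compared, one option is to normalize each to an aware UTC value using the rules in the table. The sketch below assumes full ISO-style input strings, which may not match dbt's exact output layout; `to_aware_utc` is a hypothetical helper and the -06:00 offset is an assumed system time zone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_aware_utc(raw: str, log_format: str, system_tz: tzinfo) -> datetime:
    """Hypothetical normalizer: interpret a timestamp per the table above."""
    if log_format == "json":
        # Aware: a trailing Z means UTC. Rewrite it as +00:00 because
        # datetime.fromisoformat() only accepts Z on Python 3.11 and later.
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        return parsed.astimezone(timezone.utc)
    parsed = datetime.fromisoformat(raw)
    if log_format == "text":
        # Naive, but known to be UTC.
        return parsed.replace(tzinfo=timezone.utc)
    if log_format == "debug":
        # Naive, in the system's local time zone.
        return parsed.replace(tzinfo=system_tz).astimezone(timezone.utc)
    raise ValueError(f"unknown log format: {log_format!r}")


minus_six = timezone(timedelta(hours=-6))  # assumed system offset for the demo

from_text = to_aware_utc("2023-04-18T13:42:18", "text", minus_six)
from_debug = to_aware_utc("2023-04-18T07:42:18.050150", "debug", minus_six)
from_json = to_aware_utc("2023-04-18T13:42:18.050150Z", "json", minus_six)

print(from_debug == from_json)  # True
print(from_text <= from_debug)  # True
print(from_json.isoformat())    # 2023-04-18T13:42:18.050150+00:00
```

The debug case is the fragile one: the result is only right if `system_tz` really matches the machine that wrote the log.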
There's some more nitty-gritty code details discussed here.
Is this your first time opening an issue?
Describe the Feature
Currently, the timestamp in event logging is always output in UTC. It would be helpful to be able to set the time zone via a configuration or parameter (either local or any other time zone).
Describe alternatives you've considered
I have not found any alternatives. One has to live with UTC.
Who will this benefit?
This would make it easier to evaluate the logs and help to avoid confusion. Additionally, evaluating dbt logs in combination with other logs that use local time or any other time zone would be easier.
Are you interested in contributing this feature?
No response
Anything else?
No response