dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.95k stars 1.63k forks source link

[CT-662] [Feature] Allow the configuration of the timezone for timestamps in event logging #5267

Open Stefan-Rann opened 2 years ago

Stefan-Rann commented 2 years ago

Is this your first time opening an issue?

Describe the Feature

Currently the timestamp in event logging is always output in utc. It would be helpful to be able to set the time zone via a configuration or parameter (either local or any other time zone).

Describe alternatives you've considered

I have not found any alternatives. One has to live with utc.

Who will this benefit?

This would make it easier to evaluate the logs and help to avoid confusion. Additionally the evaluation of dbt logs in combination with other logs using lokal or any other timezone would be easier.

Are you interested in contributing this feature?

No response

Anything else?

No response

gshank commented 2 years ago

This sounds like a reasonable request. We're going to be looking at logging updates later in this quarter. I'm not sure if it would get onto the schedule or not, but we can take a look.

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

jtcohen6 commented 1 year ago

Reopening because we just got a duplicate opened in https://github.com/dbt-labs/dbt-core/issues/6319

Things I'm curious about: How would you want to configure this? Should there only be two options (local system time versus UTC)?

ahuiiiiiii commented 1 year ago

Reopening because we just got a duplicate opened in #6319

Things I'm curious about: How would you want to configure this? Should there only be two options (local system time versus UTC)?

can i add an option that set zonetime in dbt-core project configuration file?

ahuiiiiiii commented 1 year ago

sorry spelling error above should be timezone

Stefan-Rann commented 1 year ago

can i add an option that set zonetime in dbt-core project configuration file?

Yes, a project level setting would be cool to set the timezone by name. Additionally the option "local" would be great to use the system timezone without the need to explicitly define the timezone.

Gunnnn commented 1 year ago

Please! I see timestamps in log in UTC and its annoying =(

Frunkh commented 1 year ago

Atleast change record.extra[self.name] = datetime.utcnow().isoformat() to record.extra[self.name] = datetime.now().isoformat()

in logger.py So it respects the timezone of the system.. this is really annoying.

sgaraycoa-de commented 1 year ago

Adding my vote in favor of this issue

jtcohen6 commented 1 year ago

Putting aside, for the moment, the question of creating a standalone configuration for this — I'd be supportive of switching from datetime.utcnow() to datetime.now(), so that the user's local system timezone is what's displayed in the logs. My sense is that most remotely-running orchestration systems would be in UTC, so it wouldn't be a change there.

Timestamp Doug (@dbeatty10) - does that proposed switch give you any trepidation?

dbeatty10 commented 1 year ago

Main trepidation

For both datetime.utcnow() and datetime.now(), the Python datetimes are naive rather than aware [1][2] so the UTC offset is not explicit in either case.

Consequences

With the current dbt log output standardized to UTC, it's still comparable to other dbt logs (as long as you know it's UTC). But most human beings think about time in their local time zone.

When logs from dbt (or other systems) use naive timestamps that are localized, then comparing them really becomes quite the forensic exercise 😰

Possible solution

A possible solution is to use aware datetimes and print out the UTC offset. The trade-off is that this costs 6 extra characters in the log output.

$ dbt run
07:42:18-06:00  Running with dbt=1.4.4
============================== 2023-04-18 07:42:18.050150-06:00 | 07e7091e-7086-4fb5-9cab-643f3579e090 ==============================
07:42:18.050150-06:00 [info ] [MainThread]: Running with dbt=1.4.4

Python examples

from datetime import timezone, datetime as dt
Python Output UTC Offset
dt.utcnow().isoformat() 2023-04-18T13:42:18.050150 Naive N/A
dt.now().isoformat() 2023-04-18T07:42:18.050150 Naive N/A
dt.now().astimezone().isoformat() 2023-04-18T07:42:18.050150-06:00 Aware System time zone
dt.now(timezone.utc).isoformat() 2023-04-18T13:42:18.050150+00:00 Aware UTC
nevdelap commented 9 months ago

Just having it log in the system's timezone would be great. I'm lucky enough to be at UTC+10 so it is only mildly annoying, but still strange that it would be that way.

I've created this workaround, which also adds an elapsed time to the output. Someone might find it useful. 😉

dbeatty10 commented 9 months ago

I'd propose two new configurations to handle this use case:

  1. log_timestamp_tz_source: utc (default) | system
  2. log_timestamp_print_offset: false (default) | true

*I didn't put a ton of thought into the names, so they could be shorted during design refinement as-needed. For example, maybe log_timestamp_tz_offset_print would be better, etc. Or maybe these two for brevity:

### Example logic `ts.py` ```python from datetime import timezone, datetime as dt OPTIONS_LOG_TIMESTAMP_TZ_SOURCE = ["utc", "system"] OPTIONS_LOG_TIMESTAMP_PRINT_OFFSET = [False, True] def get_now_aware_utc_tz(): return dt.now(timezone.utc) def get_now_aware_system_tz(): return dt.now().astimezone() def get_now_formatted(log_timestamp_tz_source="utc", log_timestamp_print_offset=False): # default to UTC aware_timestamp = get_now_aware_utc_tz() # default to omitting the UTC offset format = "%H:%M:%S" # use the system timezone if specfied if log_timestamp_tz_source == "system": aware_timestamp = get_now_aware_system_tz() # include the UTC offset if specfied if log_timestamp_print_offset: format = "%H:%M:%S %z" return aware_timestamp.strftime(format) for log_timestamp_tz_source in OPTIONS_LOG_TIMESTAMP_TZ_SOURCE: for log_timestamp_print_offset in OPTIONS_LOG_TIMESTAMP_PRINT_OFFSET: now_formatted = get_now_formatted(log_timestamp_tz_source, log_timestamp_print_offset) print(f"\nlog_timestamp_tz_source: {log_timestamp_tz_source}; log_timestamp_print_offset: {log_timestamp_print_offset}") print(now_formatted) ``` Run it: ```shell python ts.py ``` Example output: ``` $ python ts.py log_timestamp_tz_source: utc; log_timestamp_print_offset: False 19:58:19 log_timestamp_tz_source: utc; log_timestamp_print_offset: True 19:58:19 +0000 log_timestamp_tz_source: system; log_timestamp_print_offset: False 12:58:19 log_timestamp_tz_source: system; log_timestamp_print_offset: True 12:58:19 -0700 ```

Flags / global configs

CLI flags:

environment variables:

dbt_project.yml flags:

Usage

CLI flags:

dbt compile --log-timestamp-tz-source system
dbt compile --log-timestamp-tz-source utc
dbt compile --no-log-timestamp-print-offset
dbt compile --log-timestamp-print-offset

environment variables:

export LOG_TIMESTAMP_TZ_SOURCE=system
export LOG_TIMESTAMP_TZ_SOURCE=utc
export LOG_TIMESTAMP_PRINT_OFFSET=1
export LOG_TIMESTAMP_PRINT_OFFSET=true
export LOG_TIMESTAMP_PRINT_OFFSET=T
export LOG_TIMESTAMP_PRINT_OFFSET=0
export LOG_TIMESTAMP_PRINT_OFFSET=false
export LOG_TIMESTAMP_PRINT_OFFSET=F

dbt_project.yml flags:

flags:
  log_timestamp_tz_source: system
  log_timestamp_tz_source: utc
  log_timestamp_print_offset: true
  log_timestamp_print_offset: false

Rationale

The defaults would preserve backwards-compatibility.

The first config would allow using the system timezone wherever dbt is running from (whether it is a user's machine or a container orchestrated somewhere remotely).

The second config would allow the timestamps to be interpreted in an aware fashion rather than forcing either a naive interpretation or assuming UTC.

Rejected idea

One thing I considered and decided against:

The reason is that logs display a different level of granularity when emitted to the CLI versus the file logs. The first does not include fractional seconds, whereas the second does.

Also there is a different format when --log-format json vs. not. The former always emits a ISO8601-formatted string that ends in Z whereas the latter does not.

Risk reduction

If you are certain that your orchestration environment system time zone is set to UTC (and won't ever change!) or that your log ingestion can interpret localized timestamps properly (somehow), then you can set these flags within dbt_project.yml.

But to reduce risk of a non-UTC orchestration environment messing up log ingestion and parsing by a tool like Datadog, etc., you could configure these via environment variable in your local environment rather than using flags within dbt_project.yml:

export DBT_LOG_TIMESTAMP_TZ_SOURCE=system
export LOG_TIMESTAMP_PRINT_OFFSET=0
jim256 commented 7 months ago

I'd also really like to see two options (local system time versus UTC) rather than always defaulting to UTC only.

PORXavierM commented 5 months ago

As somebody find solution for this? is a problem from python or dbt himself?

dbeatty10 commented 5 months ago

@PORXavierM the timestamps are coming from the logging of dbt.

There is some of amount of configurability of the timestamps via the LOG_FORMAT and LOG_FORMAT_FILE configs. Here's a brief summary:

Log format Time zone Aware vs. naive Precision
text UTC naive timestamp second
debug system local time zone naive timestamp microsecond
json UTC aware timestamp microsecond

There's some more nitty-gritty code details discussed here.