apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.3k stars 14.09k forks source link

Support stdlib zoneinfo.ZoneInfo (new in Python 3.10) #19450

Open uranusjr opened 2 years ago

uranusjr commented 2 years ago

Body

Python 3.10 introduced zoneinfo which covers usages of pendulum.tz (for Airflow uses, from what I can tell). Airflow should support zoneinfo.ZoneInfo for all places where it currently uses pendulum.tz.Timezone. This should be a good first step to phase out Pendulum from Airflow.

The module is also backported as backports.zoneinfo to Python 3.6+, but it's probably not a priority to replace Pendulum usages with it for now.

Committer

ashb commented 1 year ago

:+1:, but it's arguably a DAG breaking change to replace pendulum, since all the dates exposed via the (template) context are Pendulum instances, and I know I've used pendulum specific methods on those data objects in python operators

potiuk commented 1 year ago

Yet another reason to start thinking about Airlfow 3 ? (just saying).

uranusjr commented 1 year ago

Yeah I don’t think replacing Pendulum is viable now (it would be a big undertaking), the most reasonable approach for the moment would be to simp[ly accept ZoneInfo as input and coerce it into a Pendulum Timezone object.

eshkinkot commented 1 year ago

first step to phase out Pendulum from Airflow

Is there any reason for this?

notatallshaw-gts commented 1 year ago

phase out Pendulum from Airflow

As a user of Airflow I would like to point out two use cases that will make transitioning off Pendulum difficult for users:

  1. Converting the time zone of a datetime macro
  2. Using the nice formatting Pendulum gives

As an example, this is a macro I commonly write (usually as part of a filepath construction or something):

{{ data_interval_end.in_tz(dag.timezone).format('YYYY_MM_DD') }}

The formatting is easy to migrate (although not as nice) but I don't think there's anything in regular Python that makes it easy to switch the time zone of datetime correctly.

uranusjr commented 1 year ago

astimezone and stftime cover those.

notatallshaw-gts commented 1 year ago

astimezone

I foresee three main hurdles with converting from in_tz to astimezone:

  1. astimezone only takes a tzinfo subclass, whereas in_tz will coerces a string, which leads to my next most common use case:
{{ data_interval_end.in_tz('America/New_York').format('YYYY_MM_DD') }}
  1. Useful timezones like "America/New_York" are not supported in the standard library until Python 3.9 with zoneinfo

  2. I have a feeling that for such a large application and datetime sensitive application like Airflow it's going to discover lots or weird edge cases (e.g. https://news.ycombinator.com/item?id=35916074) in the standard library as it's so relatively recent to the standard library and the larger Python ecosystem has historically relied on third party libraries, but hopefully the test suite is ready for it

stftime

There are definitely fractional second formatting using format that cannot be expressed using purely stftime alone: https://pendulum.eustace.io/docs/#tokens

I'm not saying migrating moving off Pendulum will be impossible, just raising that for at least some users it will not be pain free.

uranusjr commented 11 months ago

From the recent Pendulum PRs (#34744 and #34683) I’m starting to really want to remove our reliance on Pendulum-specific timezones in Airflow core. Pendulum will still be used as the implementation, but we switch to only use standard tzinfo interface instead. Want to get a feel if this is viable and/or worthwhile. Any thoughts? @Taragolis @bolkedebruin

bolkedebruin commented 11 months ago

I think Pendulum 3 is switching to using the default provided timezones, system and then pytz in that order, anyway. Note: there is no such thing as the standard tzinfo interface. Pendulum 2 by default also used pytz timezones through a package , which just wasn't updated frequently and can use the system provided ones.

I'd prefer keep using Pendulum's interface as that at least is consistent within (-: major releases.

uranusjr commented 11 months ago

there is no such thing as the standard tzinfo interface

There is a tzinfo (and timezone) interface, and it is in the stdlib. The problem is pendulum and pytz does not follow that.

bolkedebruin commented 11 months ago

yes it is in stdlib, but it does not have a standard interface in the way of 'what you see is what you get'. tzname() is a prime example. The documentation reads that it will return a timezone name, but PST is not a timezone name. So the problem is that the stdlib does not stipulate sufficiently what needs to be done.