Open uranusjr opened 2 years ago
:+1:, but it's arguably a DAG breaking change to replace pendulum, since all the dates exposed via the (template) context are Pendulum instances, and I know I've used pendulum specific methods on those data objects in python operators
Yet another reason to start thinking about Airlfow 3 ? (just saying).
Yeah I don’t think replacing Pendulum is viable now (it would be a big undertaking), the most reasonable approach for the moment would be to simp[ly accept ZoneInfo
as input and coerce it into a Pendulum Timezone object.
first step to phase out Pendulum from Airflow
Is there any reason for this?
phase out Pendulum from Airflow
As a user of Airflow I would like to point out two use cases that will make transitioning off Pendulum difficult for users:
As an example, this is a macro I commonly write (usually as part of a filepath construction or something):
{{ data_interval_end.in_tz(dag.timezone).format('YYYY_MM_DD') }}
The formatting is easy to migrate (although not as nice) but I don't think there's anything in regular Python that makes it easy to switch the time zone of datetime correctly.
astimezone
and stftime
cover those.
astimezone
I foresee three main hurdles with converting from in_tz
to astimezone
:
astimezone
only takes a tzinfo
subclass, whereas in_tz
will coerces a string, which leads to my next most common use case:{{ data_interval_end.in_tz('America/New_York').format('YYYY_MM_DD') }}
Useful timezones like "America/New_York" are not supported in the standard library until Python 3.9 with zoneinfo
I have a feeling that for such a large application and datetime sensitive application like Airflow it's going to discover lots or weird edge cases (e.g. https://news.ycombinator.com/item?id=35916074) in the standard library as it's so relatively recent to the standard library and the larger Python ecosystem has historically relied on third party libraries, but hopefully the test suite is ready for it
stftime
There are definitely fractional second formatting using format
that cannot be expressed using purely stftime
alone: https://pendulum.eustace.io/docs/#tokens
I'm not saying migrating moving off Pendulum will be impossible, just raising that for at least some users it will not be pain free.
From the recent Pendulum PRs (#34744 and #34683) I’m starting to really want to remove our reliance on Pendulum-specific timezones in Airflow core. Pendulum will still be used as the implementation, but we switch to only use standard tzinfo interface instead. Want to get a feel if this is viable and/or worthwhile. Any thoughts? @Taragolis @bolkedebruin
I think Pendulum 3 is switching to using the default provided timezones, system and then pytz in that order, anyway. Note: there is no such thing as the standard tzinfo interface. Pendulum 2 by default also used pytz timezones through a package , which just wasn't updated frequently and can use the system provided ones.
I'd prefer keep using Pendulum's interface as that at least is consistent within (-: major releases.
there is no such thing as the standard tzinfo interface
There is a tzinfo (and timezone) interface, and it is in the stdlib. The problem is pendulum and pytz does not follow that.
yes it is in stdlib, but it does not have a standard interface in the way of 'what you see is what you get'. tzname() is a prime example. The documentation reads that it will return a timezone name, but PST is not a timezone name. So the problem is that the stdlib does not stipulate sufficiently what needs to be done.
Body
Python 3.10 introduced
zoneinfo
which covers usages ofpendulum.tz
(for Airflow uses, from what I can tell). Airflow should supportzoneinfo.ZoneInfo
for all places where it currently usespendulum.tz.Timezone
. This should be a good first step to phase out Pendulum from Airflow.The module is also backported as
backports.zoneinfo
to Python 3.6+, but it's probably not a priority to replace Pendulum usages with it for now.Committer