ariebovenberg / whenever

⏰ Modern datetime library for Python, available in Rust or pure Python
https://whenever.rtfd.io
MIT License
831 stars 13 forks source link

use tzinfo instead of str to identify the TimeZone in ZonedDateTime #65

Open bxparks opened 7 months ago

bxparks commented 7 months ago

The ZonedDateTime class uses a str to identify the timezone. Please consider using the standard tzinfo class instead. Using tzinfo allows alternative timezone libraries to be used instead of the one baked into whatever (presumably zoneinfo).

I have my own timezone Python library. It provides 2 different implementations of tzinfo. Both are drop-in replacements for zoneinfo, pytz, or any other Python timezone libraries. Given the current implementation of ZonedDateTime, I cannot use my own timezone library with whatever.

Why do I want to use my own? Because zoneinfo has bugs in obscure edge cases. (pytz has even more bugs, as well as the year 2038 problem). My libraries don't.

ariebovenberg commented 7 months ago

Interesting! You'll agree though that the 'customized tzinfo class' use case is niche. My general approach to these kind of issues are to allow them if the API doesn't become too complex for the 'average' user.

What immediately comes to mind is to allow tz= argument of ZonedDateTime to be str | tzinfo. However, this will affect the API:

Thes are pretty big downsides IMHO, as established identifiers are key for interoperability.

Another potential solution is to provide a CustomDateTime class (perhaps in a separate whenever.advanced package), which has a tzinfo= argument. This class is an AwareDateTime subclass but then doesn't have a canonical string representation.

Because zoneinfo has bugs in obscure edge cases.

Pardon my ignorance, but: shouldn't these be fixed there then? Are these bugs with the Python library, or the actual IANA tz db?

Alternatively, isn't it possible to modify the zoneinfo files on your system yourself if you know better? Then the ZoneInfo package would automatically adjust to your needs.

bxparks commented 7 months ago

I'll address your questions about the zoneinfo library in a separate post. It's complicated.

I think the least desirable solution is a CustomDateTime. Your library already has too many subclasses of AwareDateTime. :-)

I agree that using a non-IANA timezone identifier may be niche, but I think the need or desire to use a different timezone library may be more common than you might think:

1) Backwards compatibility An app may want to upgrade from datetime to whenever, but keep using pytz, for backwards compatibility with previous versions of the app. It may want to migrate to zoneinfo eventually, but do the migration in phases, first datetime->whenever then to pytz->zoneinfo. 2) Cross compatibility A system may be composed of multiple Python apps, servlets, modules, etc. One part of the system may be on pytz and need to remain there for some time. A different part of the system may need to continue to use pytz for cross compatibility. 3) Deterministic Reproducibility An app may need to pin a specific version of zoneinfo or pytz for integration testing and validation. 4) Better timezone library A new timezone library that is faster, smaller, and more accurate than zoneinfo may be created in the future. 5) zoneinfo is not available In resource-constrained, bare-metal (no OS) embedded environments, the zoneinfo library may be too large and not available. Smaller timezone libraries may need to be used. (In theory, my timezone libraries should fit into something on the order of 32-96 kB, instead of the multi-megabytes required by zoneinfo).

Proposed Solution

Here is my solution that I think is compatible with your current constraints on ZonedDateTime. My solution would continue use tz: str parameter in ZonedDateTime. But instead of hardcoding the dependency to zoneinfo, we add a layer of indirection (... "all CS problems can be solved with a layer of indirection"...), whose purpose is to provide an instance of tzinfo from its str:

(Sorry for any dumb syntax errors below, I haven't done much Python coding in several years...)

import zoneinfo
from typing_extensions import Protocol
[...]

class TzProvider(Protocol):
    def gettz(name: str) -> tzinfo:
         ...

class ZoneInfoProvider(TzProvider):
    def gettz(name: str) -> tzinfo:
        return zoneinfo.ZoneInfo(name)

Then at the whenever module level, we can provide hooks to replace the default TzProvider:


_tz_provider = ZoneInfoProvider()

def set_tz_provider(provider: TzProvider) -> None:
    _tz_provider = provider

def get_tz_provider() -> TzProvider:
    """Probably useful only for testing purposes"""
    return _tz_provider

Interesting Consequences

I realize that a pluggable TzProvider infrastructure is getting into Enterprise Java territory, but sometimes, Enterprise Java uses certain techniques because it actually solves real problems.

A pluggable TzProvider achieves the decoupling of the DateTime related classes and the TimeZone related classes, as I mentioned in #30, when I was speculating that the reason that LocalDateTime/SystemDateTime was needed was because there is too much coupling between "date/time" concepts and "timezone" concepts.

One interesting consequence is that this solution is that it allows whenever to support POSIX-style timezone rules naturally:

from whenever import ZonedDateTime
from whenever import TzProvider
from whenever import set_tz_provider

class PosixProvider(TzProvider):
    def gettz(name: str):
        # ...

posix_provider = PosixProvider()
set_tz_provider(posix_provider)

tz = ZonedDateTime(tz="NZST-12NZDT,M9.5.0,M4.1.0/3")

Implications for Pickling

I don't actually understand the requirements of Python pickling, so I don't know what is required of the ZonedDateTime class to support this pickling thing. Seems to me that it would be up to the specific TzProvider instance to decide whether it can handle a special, non-IANA, name identifier.

ariebovenberg commented 7 months ago

Lots to think about.

My impression is that most issues are solved by:

zoneinfo as a tz provider

As far as I can see, zoneinfo already supports almost everything you'd need from a tz provider:

What is zoneinfo itself it too large for your system?

I'm not familiar with using Python in super-constrained environments, but the zoneinfo library itself is only about 1k lines. And there's also a compiled C version that is probably smaller.

I'd feel safe saying that if 1k lines of code make a difference, then users are probably better off sticking to the standard library.

What about other timezone libraries?

I'm not yet convinced of the need to support alternate libraries. The IANA tz db (accessible through zoneinfo) is the overwhelming worldwide standard. AFAIK there is currently no reason to choose an alternative: