JuliaTime / TimeZones.jl

IANA time zone database access for the Julia programming language
Other
86 stars 51 forks source link

RFC: Separate TZData module into separate package #359

Closed omus closed 11 months ago

omus commented 2 years ago

The Problem

The current implementation of TimeZones.jl builds tzdata into an internal representation using the following high level steps:

  1. Retrieve the IANA tzdata files using the Pkg artifacts system
  2. Parse the tzdata into VariableTimeZone and FixedTimeZone instances
  3. Serialize the VariableTimeZone and FixedTimeZone data using Julia's Seralization.jl and store the data on disk

There have been a few issues with this setup such as:

Reference

Proposal

NOTE: These details need to be expanded upon and are likely to change

omus commented 2 years ago

I'm out of time for today. Will be trying to flush out he proposal over the next few days if I have time

StefanKarpinski commented 2 years ago

Julia serialization is stable in 1.x versions in the sense that data save by Julia 1.x will be readable by Julia 1.y for x ≤ y, so it's fine to use Julia serialization for this. If the format needs to change for whatever reason, we can put compat bounds on the package versions. With that approach it seems like it would be best to pre-process the tzdata into the right format as an automated process and then just have the package consume the pre-processed artifacts.

Another potential approach would be to download the tzdata in raw form as an artifact and then use scratch spaces for the processed version. The only issue with this is that I think it would happen at package load time since I don't think we have a hook for scratch spaces as install/build time. This also means that everyone ends up doing the same work locally over and over again, whereas the other approach does it once centrally and then everyone just loads the pre-processed timezone data.

ericphanson commented 2 years ago

Julia serialization is stable in 1.x versions in the sense that data save by Julia 1.x will be readable by Julia 1.y for x ≤ y, so it's fine to use Julia serialization for this.

In practice, I don't think this is true if the code in question depends on types defined in packages, because packages can change the representation of types in a way that doesn't break their public API but does break Serialization. And one often cannot help but change the package environment slightly when changing Julia versions, because e.g. Manifest's are not compatible between Julia versions, and therefore changing Julia versions often does end up breaking serialization, at least in my experience. (And that's not even mentioning anonymous functions, though those aren't relevant in this case).

Serialize the VariableTimeZone and FixedTimeZone data using Julia's Seralization.jl and store the data on disk

FixedTimeZone uses InlineString15 which depends on InlineStrings.jl. I think it's possible InlineStrings changes the representation of InlineString15 in a non-breaking way; for example, https://github.com/JuliaStrings/InlineStrings.jl/issues/6 makes such a proposal. I believe that would break Serialization-based deserialization of InlineString15's. Therefore, I don't think TimeZones should use Serialization for this.

StefanKarpinski commented 2 years ago

What about creating a simple binary serialization format just for this data then?

omus commented 2 years ago

What about creating a simple binary serialization format just for this data then?

The plan is to use the binary tzfile format. Maybe a slightly modified version to work around some minor limitations

StefanKarpinski commented 2 years ago

It looks like the data types that need to be representable are Zone, Rule and TZSource are the major things that need to be representable. They all seem like they could have fairly simple representations that we could make easy to load into memory. The only really hard field is the on field which is an "Anonymous boolean function to determine day" but I'm assuming we could eliminate that with some work or replace it with an enumeration of specific functions.

StefanKarpinski commented 2 years ago

Ah, ok. There's already a format. That seems like a good approach.

ViralBShah commented 2 years ago

Seeing CI failures due to TZdata every once in a while. It would be really nice to get this done.

omus commented 2 years ago

Seeing CI failures due to TZdata every once in a while. It would be really nice to get this done.

I've been working on this issue during my free time as I've been unable to allocate working hours to this. The limited amount of time to work on this has made progress slow but it's still progress!

omus commented 2 years ago

I've been working on this issue during my free time as I've been unable to allocate working hours to this.

Just to clarify to avoid any misinterpretation: this is something we're dedicating time to at Beacon (it's in the current sprint, in fact). I just didn't have the time to work on it in a previous sprint due to some competing product development priorities.

omus commented 2 years ago

A quick status update. As of TimeZones 1.8.0 we have support for reading/writing to the new TZJFile serialization format (this is based off of the tzfile format but with some slight modifications for our particular use case). For the TimeZones 1.9.0 release we'll switch over to using the TZJFile format as the default and also switch over to using scratch spaces for the build directory. We'll still have a build step for now but this lets us thoroughly validate our new format and solve a few issues like #343.

omus commented 2 years ago

I'm planning on releasing TimeZones 1.9.0 early next week, ideally Monday.

StefanKarpinski commented 2 years ago

Thanks for doing this, @omus. I guess if TZJFile format works out in the future we could switch to pre-building the files and generating artifacts? Or does that not make sense?

omus commented 2 years ago

I guess if TZJFile format works out in the future we could switch to pre-building the files and generating artifacts? Or does that not make sense?

That is definitely still the plan. I originally was hoping to have this for the 1.9.0 release but I have more work to do there and I'd like to avoid holding up these changes until I get everything perfect.

omus commented 2 years ago

Did some work over the weekend to get the 1.9.0 release ready. I've just registered the release and it should be available shortly.

quinnj commented 2 years ago

Just want to pop in and say I'm excited for the direction headed here! Thanks @omus!

omus commented 11 months ago

PR https://github.com/JuliaTime/TimeZones.jl/pull/441 finally closed this by utilizing precompiled time zone artifacts provided by the TZJData.jl package. Included in TimeZones release 1.12