Closed omus closed 11 months ago
I'm out of time for today. Will be trying to flush out he proposal over the next few days if I have time
Julia serialization is stable in 1.x versions in the sense that data save by Julia 1.x will be readable by Julia 1.y for x ≤ y, so it's fine to use Julia serialization for this. If the format needs to change for whatever reason, we can put compat bounds on the package versions. With that approach it seems like it would be best to pre-process the tzdata into the right format as an automated process and then just have the package consume the pre-processed artifacts.
Another potential approach would be to download the tzdata in raw form as an artifact and then use scratch spaces for the processed version. The only issue with this is that I think it would happen at package load time since I don't think we have a hook for scratch spaces as install/build time. This also means that everyone ends up doing the same work locally over and over again, whereas the other approach does it once centrally and then everyone just loads the pre-processed timezone data.
Julia serialization is stable in 1.x versions in the sense that data save by Julia 1.x will be readable by Julia 1.y for x ≤ y, so it's fine to use Julia serialization for this.
In practice, I don't think this is true if the code in question depends on types defined in packages, because packages can change the representation of types in a way that doesn't break their public API but does break Serialization. And one often cannot help but change the package environment slightly when changing Julia versions, because e.g. Manifest's are not compatible between Julia versions, and therefore changing Julia versions often does end up breaking serialization, at least in my experience. (And that's not even mentioning anonymous functions, though those aren't relevant in this case).
Serialize the
VariableTimeZone
andFixedTimeZone
data using Julia's Seralization.jl and store the data on disk
FixedTimeZone
uses InlineString15
which depends on InlineStrings.jl. I think it's possible InlineStrings changes the representation of InlineString15
in a non-breaking way; for example, https://github.com/JuliaStrings/InlineStrings.jl/issues/6 makes such a proposal. I believe that would break Serialization-based deserialization of InlineString15
's. Therefore, I don't think TimeZones should use Serialization for this.
What about creating a simple binary serialization format just for this data then?
What about creating a simple binary serialization format just for this data then?
The plan is to use the binary tzfile format. Maybe a slightly modified version to work around some minor limitations
It looks like the data types that need to be representable are Zone
, Rule
and TZSource
are the major things that need to be representable. They all seem like they could have fairly simple representations that we could make easy to load into memory. The only really hard field is the on
field which is an "Anonymous boolean function to determine day" but I'm assuming we could eliminate that with some work or replace it with an enumeration of specific functions.
Ah, ok. There's already a format. That seems like a good approach.
Seeing CI failures due to TZdata every once in a while. It would be really nice to get this done.
Seeing CI failures due to TZdata every once in a while. It would be really nice to get this done.
I've been working on this issue during my free time as I've been unable to allocate working hours to this. The limited amount of time to work on this has made progress slow but it's still progress!
I've been working on this issue during my free time as I've been unable to allocate working hours to this.
Just to clarify to avoid any misinterpretation: this is something we're dedicating time to at Beacon (it's in the current sprint, in fact). I just didn't have the time to work on it in a previous sprint due to some competing product development priorities.
A quick status update. As of TimeZones 1.8.0 we have support for reading/writing to the new TZJFile serialization format (this is based off of the tzfile format but with some slight modifications for our particular use case). For the TimeZones 1.9.0 release we'll switch over to using the TZJFile format as the default and also switch over to using scratch spaces for the build directory. We'll still have a build step for now but this lets us thoroughly validate our new format and solve a few issues like #343.
I'm planning on releasing TimeZones 1.9.0 early next week, ideally Monday.
Thanks for doing this, @omus. I guess if TZJFile format works out in the future we could switch to pre-building the files and generating artifacts? Or does that not make sense?
I guess if TZJFile format works out in the future we could switch to pre-building the files and generating artifacts? Or does that not make sense?
That is definitely still the plan. I originally was hoping to have this for the 1.9.0 release but I have more work to do there and I'd like to avoid holding up these changes until I get everything perfect.
Did some work over the weekend to get the 1.9.0 release ready. I've just registered the release and it should be available shortly.
Just want to pop in and say I'm excited for the direction headed here! Thanks @omus!
PR https://github.com/JuliaTime/TimeZones.jl/pull/441 finally closed this by utilizing precompiled time zone artifacts provided by the TZJData.jl package. Included in TimeZones release 1.12
The Problem
The current implementation of TimeZones.jl builds tzdata into an internal representation using the following high level steps:
VariableTimeZone
andFixedTimeZone
instancesVariableTimeZone
andFixedTimeZone
data using Julia's Seralization.jl and store the data on diskThere have been a few issues with this setup such as:
deps/build.jl
triggers serial precompilation:include_lazy_artifacts
flag):Reference
Proposal
NOTE: These details need to be expanded upon and are likely to change