JuliaTime / TimeZones.jl

IANA time zone database access for the Julia programming language
Other
86 stars 51 forks source link

Use `Scratch` to hold downloads and compiled tzdata #384

Closed staticfloat closed 2 years ago

staticfloat commented 2 years ago

This should address many of the problems listed in #343. It still contains a Pkg.build() step to do the serialization, but it stores things in a scratch space, rather than in the package directory, and it marks the default tzdata as an eager artifact, rather than a lazy one, so it should download along with the rest of the artifacts during package installation.

The one thing I'm not sure about is whether the "version" that it checks against the IANA servers is properly cached; I see many different ways that the version can be queried, and I'm not sure if the "happy path" still makes a request against the IANA servers. I would prefer that it doesn't, and no network accesses outside of the normal Pkg flow occur, as that helps this package be used in restrictive environments where only an internal PkgServer is whitelisted.

codecov-commenter commented 2 years ago

Codecov Report

Merging #384 (9c49fba) into master (4cae0ca) will increase coverage by 0.12%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #384      +/-   ##
==========================================
+ Coverage   95.12%   95.24%   +0.12%     
==========================================
  Files          35       36       +1     
  Lines        1743     1746       +3     
==========================================
+ Hits         1658     1663       +5     
+ Misses         85       83       -2     
Impacted Files Coverage Δ
src/TimeZones.jl 100.00% <100.00%> (ø)
src/discovery.jl 86.51% <100.00%> (ø)
src/types/timezone.jl 97.22% <100.00%> (+0.07%) :arrow_up:
src/tzdata/TZData.jl 100.00% <100.00%> (ø)
src/tzdata/build.jl 84.61% <100.00%> (+7.69%) :arrow_up:
src/tzdata/compile.jl 95.42% <100.00%> (-0.02%) :arrow_down:
src/tzdata/download.jl 91.17% <100.00%> (+0.26%) :arrow_up:
src/winzone/WindowsTimeZoneIDs.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4cae0ca...9c49fba. Read the comment docs.

staticfloat commented 2 years ago

As a side note, I did experiment with eliminating the local serialization step completely; I used JLD2 as a substitute, which seems to work, although it does add a few KB to each serialized file.

We could have a GitHub Actions job that automatically serializes all tzdata sets and uploads them to a GitHub release or similar, and opens a pull request if a new tzdata set is added without the paired serialized artifact. If someone wants to do this, I'd be happy to guide them through it, but I can't do it myself at the moment.

omus commented 2 years ago

Thanks for this @staticfloat. I'll give this a thorough review tomorrow but I'm already sure I'd like to pair this with defaulting to the TZJFile format introduced in #380.

omus commented 2 years ago

I should call out that this could be a breaking change if people were referencing the now removed constants. I'm not too worried about this but I may use some globals defined at init-time to ensure this is non-breaking.

omus commented 2 years ago

❗ We'll need to use different directories for each tzdata version. Otherwise having multiple versions of TimeZones installed at the same time with different tzdata versions will interfere with each other – https://github.com/JuliaTime/TimeZones.jl/pull/384#discussion_r916476283

With this in mind I think it makes sense to merge #385 first as that will eliminate us having to specify the Julia version number to ensure serialization compatibility.

omus commented 2 years ago

I'll mention at first I was thinking using Ref constants for the directories was the correct approach but the use of functions reduces the chance of mutations there so I think it's the right direction

After sleeping on this there may be an advantage in the Ref approach under certain conditions. For example if we continue to allow specifying the tzdata version used via the env variable JULIA_TZ_VERSION we may want to reference the path to the tzdata version loaded into memory currently instead just reflecting the current JULIA_TZ_VERSION value. Using a Ref would allow us to update this directory under specific conditions only.

omus commented 2 years ago

Alright in order to get things moving I'll merge this as it exists now and follow up on the issues I pointed out in a follow up PR.

omus commented 2 years ago

The changes in this PR remove the constants:

TimeZones.PKG_DIR
TimeZones.DEPS_DIR
TimeZones.TZData.TZ_SOURCE_DIR
TimeZones.TZData.COMPILED_DIR
TimeZones.TZData.LATEST
TimeZones.TZData.LATEST_FILE_PATH
TimeZones.WindowsTimeZoneIDs.WINDOWS_XML_DIR
TimeZones.WindowsTimeZoneIDs.WINDOWS_XML_FILE

All of these should only be used internally by TimeZones.jl so I won't bother with trying to gracefully deprecate external usages. In the future I'll try to be more diligent in this package to mark internal contants and functions with leading underscores.

omus commented 2 years ago

This PR is also running into #386. Looking into this before merging here

omus commented 2 years ago

This PR is also running into #386. Looking into this before merging here

Failure definitely appears unrelated.