Closed omus closed 2 years ago
Merging #379 (c4f36c8) into master (5e08518) will increase coverage by
0.39%
. The diff coverage is98.42%
.:exclamation: Current head c4f36c8 differs from pull request most recent head 46a781a. Consider uploading reports for the commit 46a781a to get more accurate results
@@ Coverage Diff @@
## master #379 +/- ##
==========================================
+ Coverage 94.09% 94.48% +0.39%
==========================================
Files 30 32 +2
Lines 1540 1649 +109
==========================================
+ Hits 1449 1558 +109
Misses 91 91
Impacted Files | Coverage Δ | |
---|---|---|
src/TimeZones.jl | 100.00% <ø> (ø) |
|
src/tzfile/read.jl | 96.51% <96.51%> (ø) |
|
src/local.jl | 88.88% <100.00%> (ø) |
|
src/tzfile/utils.jl | 100.00% <100.00%> (ø) |
|
src/tzfile/write.jl | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 5e08518...46a781a. Read the comment docs.
Part of the work here was also to determine if the tzfile
format was feasible for storing our internal time zone representation. I'll call out some of my analysis was incorrect but I'll post it here anyway as this was the journey I took:
The tzfile format is for the most part compatible with the internal representation we use for VariableTimeZone
. One significant deviation is that the ttinfo
struct uses a one-byte boolean (tt_isdst
) to represent if a time zone transition is or is not DST. The TimeZones.jl package however allows the UTC offset and the DST offset to be specified independently allowing us greater flexibility in time zones we can support. When parsing the tzfile
format as a VariableTimeZone
we assume that each transition only changes the UTC offset or the DST offset but not both.
Let's validate that there exist time zones which change both of these offsets at once:
julia> using TimeZones
julia> tzif_incompatible(::FixedTimeZone) = false
tzif_incompatible (generic function with 1 method)
julia> tzif_incompatible(tz::VariableTimeZone) = tzif_incompatible_index(tz) !== nothing
tzif_incompatible (generic function with 2 methods)
julia> function tzif_incompatible_index(tz::VariableTimeZone)
t, remaining = Iterators.peel(tz.transitions)
std = t.zone.offset.std
dst = t.zone.offset.dst
for (i, t) in enumerate(remaining)
if std != t.zone.offset.std && dst != t.zone.offset.dst
return i
elseif std != t.zone.offset.std
std = t.zone.offset.std
else
dst = t.zone.offset.dst
end
end
return nothing
end
tzif_incompatible_index (generic function with 1 method)
julia> filter(tzif_incompatible, all_timezones())
154-element Vector{TimeZone}:
Africa/Algiers (UTC+1)
Africa/Casablanca (UTC+0/UTC+1)
Africa/El_Aaiun (UTC+0/UTC+1)
Africa/Tripoli (UTC+2)
⋮
US/Alaska (UTC-9/UTC-8)
US/Aleutian (UTC-10/UTC-9)
US/Indiana-Starke (UTC-6/UTC-5)
W-SU (UTC+3)
julia> tz = first(ans)
Africa/Algiers (UTC+1)
julia> i = tzif_incompatible_index(tz)
28
julia> tz.transitions[i:i + 1]
2-element Vector{TimeZones.Transition}:
1977-05-06T00:00:00 UTC+0/+1 (WEST)
1977-10-20T23:00:00 UTC+1/+0 (CET)
Since there exists transitions where both of these offsets change at once we'll need to need to represent the DST offset as a value instead of a boolean to ensure a lossless conversion.
As the tt_isdst
is stored as a one-byte value to store a boolean there are bits we can use to encode the DST offset value. As long as any reader of the tzfile format treats zero as not-DST and any non-zero value as DST it shouldn't matter what non-zero value is stored in this field. As we store seconds internally for the DST offset to match the UTC offset it would make sense to store the DST offset this way. However with one-byte we could only store up to 255 seconds which is far short of the typical 1-hour DST offset. Let's investigate what restrictions we need to adhere to:
julia> dst_negative(::FixedTimeZone) = false
dst_negative (generic function with 1 method)
julia> function dst_negative(tz::VariableTimeZone)
for t in tz.transitions
if t.zone.offset.dst < Second(0)
return true
end
end
return false
end
dst_negative (generic function with 2 methods)
julia> filter(dst_negative, all_timezones()) # Validate that negative DST offsets exist
7-element Vector{TimeZone}:
Africa/Casablanca (UTC+0/UTC+1)
Africa/El_Aaiun (UTC+0/UTC+1)
Africa/Windhoek (UTC+2)
Eire (UTC+0/UTC+1)
Europe/Bratislava (UTC+1/UTC+2)
Europe/Dublin (UTC+0/UTC+1)
Europe/Prague (UTC+1/UTC+2)
julia> dst_offsets(tz::FixedTimeZone) = Set{Second}([tz.offset.dst])
dst_offsets (generic function with 2 methods)
julia> function dst_offsets(tz::VariableTimeZone)
results = Set{Second}()
for t in tz.transitions
if t.zone.offset.dst in results
@show tz t.zone.offset.dst
end
push!(results, t.zone.offset.dst)
end
return results
end
dst_offsets (generic function with 2 methods)
julia> offsets = mapreduce(dst_offsets, union, all_timezones())
Set{Second} with 7 elements:
Second(-3600)
Second(1200)
Second(5400)
Second(1800)
Second(3600)
Second(7200)
Second(0)
julia> sort!(Minute.(offsets)) # All unique DST offsets that exist in tzdata
7-element Vector{Minute}:
-60 minutes
0 minutes
20 minutes
30 minutes
60 minutes
90 minutes
120 minutes
Sticking to the one-byte value we could use 1-minute steps to represent the DST offset which would allow us to store offsets ±2 hours or with 10-minute steps ±42 hours.
After thinking about compatibility more the tzfile format stores both the UTC and DST offset as a single value and uses the tt_isdst
boolean just to denote if this transition is considered DST. Storing the DST offset as I intended would result in typical tzfile consumers failing to notice any offset change at all as the tt_utoff
field would remain the same. Maintaining compatibility would mean we'd have to perform the following operations to convert the tzfile format into the internal representation:
dst_offset = tt_isdst
utc_offset = tt_utoff - dst_offset
Encoding the DST offset this way should allow us to extract the UTC/DST offsets separately and still maintain backwards compatibility. However, I'd need to dig into this further as we may need to have some sort of bit flag in the tzfile which lets us determine if we are interpreting the file in this new way or using our existing heuristic which attempts to extract the DST offset from past information and the tt_isdst
field (it's possible that these don't actually interfere with each other but they could).
As the tzfile also always includes the v1 in additional to the new version we can both reduce size and complexity by using a similar but custom file format. This also has the advantage letting us encode additional information such as the VariableTimeZone
cutoff and cut out the currently unused leap-second information.
Since the tzdata will be stored in a separate Julia repo TimeZones.jl depends on we can make use of the typical semver usage to ensure format compatibility.
After getting TZFile.write
functional I re-did some of this analysis and discovered that the heuristic was slightly better than I believed it to be in handling UTC/DST offset changes that occurred at the same time. As it turns out the real killer is sequential transitions where the UTC/DST offsets change and the previous DST offset and current DST offset are non-zero:
julia> using TimeZones
julia> tzif_incompatible(::FixedTimeZone) = false
tzif_incompatible (generic function with 1 method)
julia> tzif_incompatible(tz::VariableTimeZone) = tzif_incompatible_index(tz) !== nothing
tzif_incompatible (generic function with 2 methods)
julia> function tzif_incompatible_index(tz::VariableTimeZone)
t, remaining = Iterators.peel(tz.transitions)
prev_std = t.zone.offset.std
prev_dst = t.zone.offset.dst
for (i, t) in enumerate(remaining)
if prev_std != t.zone.offset.std && prev_dst != t.zone.offset.dst && !iszero(prev_dst) && !iszero(t.zone.offset.dst)
return i + 1
end
prev_std = t.zone.offset.std
prev_dst = t.zone.offset.dst
end
return nothing
end
tzif_incompatible_index (generic function with 1 method)
julia> filter(tzif_incompatible, all_timezones())
3-element Vector{TimeZone}:
Europe/Moscow (UTC+3)
Europe/Paris (UTC+1/UTC+2)
W-SU (UTC+3)
julia> i = tzif_incompatible_index(tz"Europe/Moscow")
9
julia> tz"Europe/Moscow".transitions[(i - 1):(i + 1)]
3-element Vector{TimeZones.Transition}:
1919-05-31T19:28:41 UTC+02:31:19/+2 (MDST)
1919-07-01T00:00:00 UTC+3/+1 (MSD)
1919-08-15T20:00:00 UTC+3/+0 (MSK)
In validating the read/write round-trip was easy to determine which time zones include some incompatibility between the tzfile format and our internal representation:
julia> function tzif_version(tz)
io = IOBuffer()
TZFile.write(seekstart(io), tz)
return TZFile.read(seekstart(io))(tz.name)
end
tzif_version (generic function with 1 method)
julia> tzif_tz_compatible(tz) = tzif_version(tz) == tz
tzif_tz_compatible (generic function with 1 method)
julia> filter(!tzif_tz_compatible, all_timezones())
113-element Vector{TimeZone}:
America/Argentina/Buenos_Aires (UTC-3)
America/Argentina/Catamarca (UTC-3)
America/Argentina/ComodRivadavia (UTC-3)
America/Argentina/Cordoba (UTC-3)
⋮
Israel (UTC+2/UTC+3)
Pacific/Rarotonga (UTC-10)
Portugal (UTC+0/UTC+1)
W-SU (UTC+3)
I'm going to forge ahead with this and make a release
As per #359 the plan was to use the
tzfile
format (or something similar) as the binary format we use to store time zone data as a simple binary serialization format. The changes here add that support add that support as well as change the currentread_tzfile
interface to beTZFile.read
.