JuliaTime / TimeZones.jl

IANA time zone database access for the Julia programming language
Other
86 stars 52 forks source link

Should FixedTimeZones have names? #326

Open oxinabox opened 3 years ago

oxinabox commented 3 years ago

Relating to #271 , if we removed the name field from the FixedTimeZone type then it would be isbits. And would use a bit less memory.

RIght now, as I understand the situation:

omus commented 3 years ago

As VariableTimeZones use FixedTimeZones as part of the Transitions removing the name field would result in no longer being able to access the abbreviations names commonly used in North America.

oxinabox commented 3 years ago

This is true. Though the only place this is exposed at all to the user in the public API is: show_next_transition

julia> show_next_transition(tz"America/New_York")
Transition Date:   2021-11-07
Local Time Change: 02:00 → 01:00 (Backward)
Offset Change:     UTC-5/+1 → UTC-5/+0
Transition From:   2021-11-07T01:59:59.999-04:00 (EDT)
Transition To:     2021-11-07T01:00:00.000-05:00 (EST)

would instread show up as:

julia> show_next_transition(tz"America/New_York")
Transition Date:   2021-11-07
Local Time Change: 02:00 → 01:00 (Backward)
Offset Change:     UTC-5/+1 → UTC-5/+0
Transition From:   2021-11-07T01:59:59.999-04:00
Transition To:     2021-11-07T01:00:00.000-05:00

The benifit is we would be saving 1 pointer (4 or 8 bytes depending on 32 or 64 bit) + about 16 bytes of content from every FixedTimeZone. and we would save hundreds of pointer from many VariableTimeZone.

julia> length(tz"America/New_York".transitions)
237

julia> length(tz"America/Winnipeg".transitions)
187

julia> length(tz"Australia/Perth".transitions)
20

as well as making it isbits to make the GC happy.

In contrast, making it isbits via ShortStrings as in #327 will remove the pointer from the FixedTimeZone so you only pay the 16 bytes of content. but julia will now inline its value into every ZonedDateTime and Transition it is used in. Rather than those only containing pointers, now they will contain the real object. So those 16 bytes of content do start to add up over millions of ZonedDateTimes/Transition. Though given you are still saving the pointer in the ZonedDateTime/Transition, they only add net 8 or 12 bytes (64 or 32 bit sytsem) each instance

omus commented 3 years ago

I agree we can do better but I'd rather not lose information that is part of the tzdata. There are a few alternate ways we could be storing this data. One of these is the tzfile format which TimeZones.jl already supports reading.

Additionally, the abbreviations used by transitions "Use three to six characters that are ASCII alphanumerics or '+' or '-'." – https://data.iana.org/time-zones/theory.html#abbreviations