JuliaTime / TimeZones.jl

IANA time zone database access for the Julia programming language
Other
86 stars 51 forks source link

Switch to pre-fetching model for thread-safety #382

Closed omus closed 1 year ago

omus commented 2 years ago

While working on #359 I took a look over our caching logic for compiled tzdata. We are currently using a cache per-thread to avoid situations where read/writes for the same entry would occur concurrently. This approach ended up being the fastest as it avoided using any locks but does introduce some memory bloat as we have a cache per-thread.

As the compiled tzdata isn't all that large it seems more sensible to load the entirety of the contents into memory upon package initialization and have all threads just read from a single cache. The situation where we would want to write to this cache is rather rare so this seems like a simpler approach. The one issue I can see is that the only other time we mutate the cache is when TimeZones.build is called. As this function really only used to: solve build failures or experiment with other tzdata versions it seems reasonable to mark this function as thread-unsafe and call it a day.

omus commented 2 years ago

@NHDaly let me know what you think of this alternate approach.

codecov-commenter commented 2 years ago

Codecov Report

Merging #382 (463304d) into master (804864c) will decrease coverage by 0.02%. The diff coverage is 100.00%.

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master     #382      +/-   ##
==========================================
- Coverage   95.58%   95.57%   -0.02%     
==========================================
  Files          38       38              
  Lines        1766     1762       -4     
==========================================
- Hits         1688     1684       -4     
  Misses         78       78              
Impacted Files Coverage Δ
src/TimeZones.jl 100.00% <100.00%> (ø)
src/build.jl 100.00% <100.00%> (ø)
src/types/timezone.jl 93.93% <100.00%> (-3.21%) :arrow_down:
src/tzdata/compile.jl 96.05% <100.00%> (+0.30%) :arrow_up:
omus commented 1 year ago

Single nightly failure is:

 [3956] signal (11.1): Segmentation fault: 11
in expression starting at /Users/runner/work/TimeZones.jl/TimeZones.jl/test/thread-safety.jl:1

Time to remove these tests

omus commented 1 year ago

@NHDaly let me know if you have any feedback

NHDaly commented 1 year ago

Yeah from the description that sounds excellent!

@nickrobinson251 told me that we can also construct new timezones if the user constructs a custom timezone or something? is that something we need to consider?

Otherwise, if not, i FULLY agree that preloading all of these makes sense!


One other possibility to add is that maybe you can make this an optional prefetch() / init() method that users can call, and they can optionally provide which timezones to load, if some users don't want to load all of them?

But that seems like overkill and i think i agree this sounds great to me! I haven't looked at the code yet, but the description sounds good

omus commented 1 year ago

@nickrobinson251 told me that we can also construct new timezones if the user constructs a custom timezone or something? is that something we need to consider?

It is possible that users can create custom FixedTimeZone or VariableTimeZone. That is still supported by TimeZones.jl but these time zones will no longer be automatically cached. More accurately, in older versions of TimeZones.jl if you were to specify say TimeZone("UTC+4") a FixedTimeZone would be created which would be entered into the cache. However, any instances of FixedTimeZone and VariableTimeZone created directly through their constructors wouldn't be added to the cache. Most likely these instances would be referenced via a constant and caching here is only useful for avoiding the deserialization cost which wouldn't exist for these time zones anyway.

One other possibility to add is that maybe you can make this an optional prefetch() / init() method that users can call, and they can optionally provide which timezones to load, if some users don't want to load all of them?

That's a reasonable idea if people really want to minimize the memory footprint or initialization time of the package. Since we're talking about KB here I think we can always add this functionality in later.

NHDaly commented 1 year ago

thanks, that's a helpful clarification. This looks good then! :)

omus commented 1 year ago

I've self-reviewed this code again and it seems like a better path forward for handling thread safety. I'm re-running the tests again here to verify things are working with the latest version of Julia and then will proceed with a new release.