344, by sharing TimeZones

across all thread-local caches (so there's only ever one TimeZone object instance created per process), and reduces redundant work caused by multiple concurrent Tasks starting to construct the same TimeZone at the same time.

Before this commit, multiple Tasks that try to construct the same TimeZone on different threads would have constructed the same object multiple times. Whereas now, they would share it, via the "Future".

It's difficult to show this effect, but one way to show it is by logging when a TimeZone is constructed, via this diff:

diff --git a/src/types/timezone.jl b/src/types/timezone.jl
index 25d36c3..1cea69e 100644
--- a/src/types/timezone.jl
+++ b/src/types/timezone.jl
@@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT))
     # Note: If the class `mask` does not match the time zone we'll still load the
     # information into the cache to ensure the result is consistent.
     tz, class = get!(_tz_cache(), str) do
+        Core.println("CONSTRUCTING $str")
         tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...)

         if isfile(tz_path)

Before this commit, you can see that every thread constructs the object twice - once before the clear and once after (total of 8 times):

julia> Threads.nthreads()
4

julia> TimeZones.TZData.compile();

julia> foo() = begin
           @sync for i in 1:20
               if (i == 10)
                   @info "---------clear-------"
                   TimeZones.TZData.compile()
               end
               Threads.@Spawn begin
                   TimeZone("US/Eastern", TimeZones.Class(:LEGACY))
               end
           end
           @info "done"
       end
foo (generic function with 1 method)

julia> @time foo()
[ Info: ---------clear-------
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
CONSTRUCTING US/Eastern
[ Info: done
  0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time)

After this commit, it's constructed only twice - once before the clear and once after:

julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time)

Finally, the other problem this avoids is if we ever accidentally introduce a Task yield inside the constructor, which could happen if we used @info instead of Core.println(), then without this PR, the old code could potentially do redundant work to construct the TimeZone multiple times - even on the same thread - since each Task's constructor would see that there's no TZ in the cache, start the work, and then yield to the next Task, which would do the same, until finally they all report their work into the cache, overwriting each other.

This is what happens if we use @info in the above diff, instead:

Before this commit - the results are nondeterministic, but in this run, you can see it redundantly constructed the value all 20 times!:

julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time)

After this commit, just the two we expect. 😊 :

julia> @time foo()
[ Info: ---------clear-------
[ Info: CONSTRUCTING US/Eastern
[ Info: CONSTRUCTING US/Eastern
[ Info: done
  0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time)

NHDaly commented 3 years ago

Sorry the diff is hard to read; it'll be easier after https://github.com/JuliaTime/TimeZones.jl/pull/344/ is merged.

The entirety of this PR is just in this single commit, here: bd68cf9 You can view the diff by clicking that. :)

I think this PR is maybe worth merging separately, since it's nice to see the performance difference and keep it separate from the correctness of the previous PR. Also, we'll want to measure the CPU perf impact to make sure that it's okay before merging this one. I'm not sure how you were measuring it, @omus - can you run that measurement again, for consistency? Thanks!

CC: @sacha0, @omus.

NHDaly commented 3 years ago

@omus: I think this PR is ready for review whenever you get time. if there's anyone else who you'd prefer to do the review, just let me know. 👍 🙁

NHDaly commented 2 years ago

Just wanted to call out that the TimeZone mutex changes is essentially equivalent to a readers/writers lock similar to this implementation: 6522f7a. I'm just calling this out as a possible cleanup to this implementation in the future if a well tested readers/writer lock package emerges.

FWIW, i still think this will be more performant than a reader/writer lock could be, since this kind of cache locking takes advantage of the cache being append-only, so that you don't need to lock at all on cache hits. I could imagine we could separate out this kind of thread-safe shared cache into a separate package though, so that others could benefit from this implementation 👍

omus commented 2 years ago

Kick starting CI

omus commented 2 years ago

@NHDaly sorry for not giving this the attention it deserves. Been swamped lately. I'll try to be faster on additional reviews

NHDaly commented 2 years ago

No worries at all - i fully believe that there are bugs in the implementation here.

We actually split this out into its own package, here: https://github.com/NHDaly/MultiThreadedCaches.jl

We found several bugs in the implementation while extracting it out, so the code is much cleaner now. I think this PR can wait until we scrub out all this and use that package instead! :)

It's also worth doing another comparison against a Read/Write Lock and also against a ConcurrentHashTable.

Feel free to review that package directly, if you'd like - most of the code was added in: https://github.com/NHDaly/MultiThreadedCaches.jl/pull/1, plus a couple follow ups.

Thanks for your patience @omus, sorry for the delay on my end. ALSO i'm about to go on vacation for three weeks, so 👋 we can talk about this again soon! 😅

JuliaTime / TimeZones.jl

Reduce memory & redundant work for concurrent TimeZones construction #356

344, by sharing TimeZones