JuliaTime / TimeZones.jl

IANA time zone database access for the Julia programming language
Other
86 stars 51 forks source link

Minimize allocations when unpacking `TimeZone`s from cache #423

Closed nickrobinson251 closed 1 month ago

nickrobinson251 commented 1 year ago

Even though we're careful about caching the creation of TimeZones and made FixedTimeZone an isbitstype/allocatedinline, we still allocate everytime TimeZone is called, even if just returning a FixedTimeZone from the cache!

I think this happens because the values in the cache are of type TimeZone, i.e. it's not known if we're returning a FixedTimeZone or a VariableTimeZone.

This PR proposes to instead maintain separate caches (per thread) for FixedTimeZones and VariableTimeZones, rather than a single one (per thread) for all TimeZones

Effectively this is some code duplication for a performance improvement.

VariableTimeZone still allocates once, presumably since it's not isbitstype (#271).

Micro benchmarks

FixedTimeZone

This PR

julia> str = "UTC";

julia> TimeZone(str);  # compile and cache

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
 Range (min … max):  46.733 ns … 156.619 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     53.994 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.265 ns ±   2.880 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                   ▂     █▂     █▃
  ▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂█▅▄▄▄▅██▅▆▇████▅▄▃▃▃▃▄▄▃▃▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂ ▃
  46.7 ns         Histogram: frequency by time         61.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

master (v1.9.1)

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 943 evaluations.
 Range (min … max):  100.168 ns …  1.490 μs  ┊ GC (min … max): 0.00% … 92.27%
 Time  (median):     104.277 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   107.123 ns ± 48.091 ns  ┊ GC (mean ± σ):  1.61% ±  3.33%

             ▁█▆▂
  ▂▃▃▄▃▄▄▅▅▅▇████▅▄▃▃▃▄▅▅▆▇▇▇▅▅▄▃▃▃▂▂▂▂▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  100 ns          Histogram: frequency by time          117 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.

VariableTimeZone

This PR

julia> str = "America/Winnipeg";

julia> TimeZone(str);   # compile and cache

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 946 evaluations.
 Range (min … max):   98.441 ns …  1.144 μs  ┊ GC (min … max): 0.00% … 90.99%
 Time  (median):     101.039 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   102.185 ns ± 32.794 ns  ┊ GC (mean ± σ):  1.01% ±  2.87%

                ▅▄█▆▁
  ▁▂▃▄▄▆█▇▆▄▄▄▄▇█████▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  98.4 ns         Histogram: frequency by time          108 ns <

 Memory estimate: 48 bytes, allocs estimate: 1.

master (v1.9.1)

julia> @benchmark TimeZone($str)
BenchmarkTools.Trial: 10000 samples with 909 evaluations.
 Range (min … max):  119.179 ns …  1.722 μs  ┊ GC (min … max): 0.00% … 92.34%
 Time  (median):     120.187 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   123.116 ns ± 47.667 ns  ┊ GC (mean ± σ):  1.33% ±  3.17%

  ▁▅██▇▆▄▃▂▂▁▁▁▁▂▃▃▂▃▃▂▁▁▁▁▁▁▁                                 ▂
  ████████████████████████████▇▇███▇▇██▇▇▇▇▇▇▇▇▇█▆▇▅▇▆▆▆▆▆▅▆▅▄ █
  119 ns        Histogram: log(frequency) by time       134 ns <

 Memory estimate: 64 bytes, allocs estimate: 2.
nickrobinson251 commented 1 year ago
~/r/TimeZones.jl> TZDATA_VERSION=2016j julia --project=benchmark/ -e 'using PkgBenchmark, TimeZones; export_markdown(stdout, judge(TimeZones, "origin/HEAD", verbose=false))'
PkgBenchmark: Running benchmarks...
[ Info: Installing 2016j tzdata region data
[ Info: Converting tz source files into TimeZone data
PkgBenchmark: creating benchmark tuning file /Users/nickr/repos/TimeZones.jl/benchmark/tune.json...
Tuning 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:36
Benchmarking 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:43
PkgBenchmark: Running benchmarks...
[ Info: Installing 2016j tzdata region data
[ Info: Converting tz source files into TimeZone data
PkgBenchmark: using benchmark tuning data in /Users/nickr/repos/TimeZones.jl/benchmark/tune.json
Benchmarking 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:48

Benchmark Report for ~/repos/TimeZones.jl

Job Properties

Results

A ratio greater than 1.0 denotes a possible regression (marked with :x:), while a ratio less than 1.0 denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

ID time ratio memory ratio
["ZonedDateTime", "local", "standard"] 0.84 (5%) :white_check_mark: 1.00 (1%)
["ZonedDateTime", "range", "VariableTimeZone/DatePeriod"] 0.91 (5%) :white_check_mark: 1.00 (1%)
["arithmetic", "DatePeriod"] 0.93 (5%) :white_check_mark: 1.00 (1%)
["arithmetic", "TimePeriod"] 1.09 (5%) :x: 1.00 (1%)
["arithmetic", "broadcast", "FixedTimeZone/TimePeriod"] 0.94 (5%) :white_check_mark: 1.00 (1%)
["arithmetic", "broadcast", "VariableTimeZone/DatePeriod"] 0.93 (5%) :white_check_mark: 1.00 (1%)
["interpret", "local", "ambiguous"] 0.87 (5%) :white_check_mark: 1.00 (1%)
["interpret", "local", "standard"] 0.89 (5%) :white_check_mark: 1.00 (1%)
["parse", "ISOZonedDateTimeFormat"] 0.82 (5%) :white_check_mark: 0.87 (1%) :white_check_mark:
["parse", "issue#25"] 0.92 (5%) :white_check_mark: 0.88 (1%) :white_check_mark:
["transition_range", "local", "ambiguous"] 0.83 (5%) :white_check_mark: 1.00 (1%)
["transition_range", "local", "non-existent"] 0.79 (5%) :white_check_mark: 1.00 (1%)
["transition_range", "local", "standard"] 0.81 (5%) :white_check_mark: 1.00 (1%)
["transition_range", "utc"] 1.43 (5%) :x: 1.00 (1%)
["tryparsenext_fixedtz", "+06"] 0.94 (5%) :white_check_mark: 1.00 (1%)
["tryparsenext_fixedtz", "+0600"] 0.94 (5%) :white_check_mark: 1.00 (1%)
["tryparsenext_fixedtz", "+06:00"] 0.95 (5%) :white_check_mark: 1.00 (1%)
["tryparsenext_fixedtz", "-06"] 0.93 (5%) :white_check_mark: 1.00 (1%)
["tryparsenext_fixedtz", "-0600"] 0.95 (5%) :white_check_mark: 1.00 (1%)
["tryparsenext_fixedtz", "UTC"] 0.73 (5%) :white_check_mark: 1.00 (1%)
["tryparsenext_fixedtz", "Z"] 0.56 (5%) :white_check_mark: 1.00 (1%)
["tz_data", "parse_components"] 0.89 (5%) :white_check_mark: 0.84 (1%) :white_check_mark:

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

Julia versioninfo

Target

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000 x86_64 i386
  CPU: Apple M1 Max:
                 speed         user         nice          sys         idle          irq
       #1-10  2400 MHz    1580681 s          0 s    1469385 s   30832641 s          0 s
  Memory: 64.0 GB (2506.71875 MB free)
  Uptime: 446789.0 sec
  Load Avg:  2.73779296875  2.6376953125  2.58837890625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
  Threads: 1 on 10 virtual cores

Baseline


Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  uname: Darwin 21.4.0 Darwin Kernel Version 21.4.0: Mon Feb 21 20:35:58 PST 2022; root:xnu-8020.101.4~2/RELEASE_ARM64_T6000 x86_64 i386
  CPU: Apple M1 Max:
                 speed         user         nice          sys         idle          irq
       #1-10  2400 MHz    1581658 s          0 s    1469753 s   30837878 s          0 s
  Memory: 64.0 GB (2527.17578125 MB free)
  Uptime: 446854.0 sec
  Load Avg:  2.205078125  2.5078125  2.54345703125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, westmere)
  Threads: 1 on 10 virtual cores
nickrobinson251 commented 1 year ago

Re-ran the benchmarks and the ones that are ❌ above are :white_check_mark: on re-run (and others are now not significant), so i think they're very noisy? Especially as most of these don't look like they even hit the codepaths changed in this PR?

Anyway, from this doesn't seem like we can say the two caches cause any performance issue

NHDaly commented 1 year ago

I'm not sure i understand why the VariableTimeZones still alloc when you look them up. I agree with you it is unexpected.. I'm not sure the isbits thing should matter, since we're pointing to the existing vector, not allocating a new one, and Julia is supposed to be able to stack-allocate an immutable struct that contains mutable fields, since like 1.4 or something. So i'm not sure i understand yet; it might be good to look through a profile together, once the rest of the issues in the PR are addressed.

nickrobinson251 commented 1 year ago

Okay, i've spent a couple days playing around with different options but i can't yet see how to remove the remaining VariableTimeZones allocation, so i propose leaving that as a follow-up. And likewise i don't think a switch to MultiThreadedCaches.jl is straight forward, so would prefer to get this perf improvement in and leave that refactor to a follow-up.

@NHDaly and @omus please could you take another look at this PR? Thanks!

omus commented 1 year ago

I'll try to take a look soon. I'll call out that I also want to get #382 merged soon as well.

codecov-commenter commented 1 year ago

Codecov Report

Merging #423 (3218a4e) into master (804864c) will decrease coverage by 0.47%. The diff coverage is 98.41%.

:exclamation: Current head 3218a4e differs from pull request most recent head b158178. Consider uploading reports for the commit b158178 to get more accurate results

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
- Coverage   95.58%   95.11%   -0.47%     
==========================================
  Files          38       36       -2     
  Lines        1766     1803      +37     
==========================================
+ Hits         1688     1715      +27     
- Misses         78       88      +10     
Impacted Files Coverage Δ
src/TimeZones.jl 100.00% <ø> (ø)
src/types/timezone.jl 98.64% <98.36%> (+1.50%) :arrow_up:
src/tzdata/compile.jl 94.78% <100.00%> (-0.97%) :arrow_down:

... and 9 files with indirect coverage changes

omus commented 1 year ago

I've been a bit swamped this week. Overall, I'm in favour of this change and will definitely merge this before #382

nickrobinson251 commented 1 year ago

Just following up to make sure this doesn't drop off your radar. Let me know if i can do anything to help, @omus :)

omus commented 1 year ago

It turned out #382 became urgent to merge for Julia 1.9 so the original plan had changed. If @nickrobinson251 you have time to rebase this against the latest version of the code that would be great. If not, I'll try to adapt this code myself sometime.

PallHaraldsson commented 9 months ago

Is this just a (simple) rebase away? It seems good as is, down to 0 allocations is great, already an improvement. Getting rid of the 1 remaining alloc for VariableTimeZone would be ideal, but shouldn't stop from merging as is? I didn't look closely into why it's there, maybe fixed already on 1.10?

nickrobinson251 commented 9 months ago

I think this is just a (not-very-simple) rebase away. And I would love, love, love someone to take that on, as I don't have capacity at the moment -- please feel encouraged to do so!

omus commented 1 month ago

Superseded by #451