JuliaTime / TimeZones.jl

IANA time zone database access for the Julia programming language
Other
87 stars 52 forks source link

Make FixedTimeZone isbits #354

Closed Wynand closed 3 years ago

Wynand commented 3 years ago

This switches FixedTimeZone.name to an InlineString instead of a String, and doing that makes it an isbits type

As a side effect this also limits the number of bytes in a FixedTimeZone name

Usually timezones are named after locations, and the longest location name currently is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu. This is in New Zealand, so if a fixed time zone were created for it then it would be Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu

That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well

Related issue: https://github.com/JuliaTime/TimeZones.jl/issues/271

Wynand commented 3 years ago

Depends on: https://github.com/JuliaData/WeakRefStrings.jl/pull/78

omus commented 3 years ago

Thanks, this seems promising. Out of curiosity do you also want VariableTimeZone to be an isbitstype or is your priority just FixedTimeZone?

Wynand commented 3 years ago

Thanks, this seems promising. Out of curiosity do you also want VariableTimeZone to be an isbitstype or is your priority just FixedTimeZone?

No problem!

For now I'm focusing on FixedTimeZone since VariableTimeZone still has the transitions vector, however Transition will also be an isbitstype once FixedTimeZone is an isbitstype, which should speed up garbage collection

nickrobinson251 commented 3 years ago

That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well

I think we should probably switch to InlineString127 Having every FixedTimeZone take up an extra 128 bytes unnecessarily is gonna add up (especially compared to the current size e.g. sizeof(tz"UTC") == 24)

I also wonder if there's anyway to use even shorter InlineStrings... πŸ€”

omus commented 3 years ago

For now I'm focusing on FixedTimeZone since VariableTimeZone still has the transitions vector, however Transition will also be an isbitstype once FixedTimeZone is an isbitstype, which should speed up garbage collection

Great! I think using InlineStrings is a good stepping stone towards that goal.

I also wonder if there's anyway to use even shorter InlineStrings... πŸ€”

Funny you should ask... As an experiment I implemented a TZAbbr string-type which should be minimize the storage cost for the Transition vector: https://github.com/JuliaTime/TimeZones.jl/pull/355

I still think we should proceed with using InlineString as proposed in this PR as the TZAbbr is more specialized and will take some additional work to integrate.

Wynand commented 3 years ago

Worth noting: This will cause issues with projects using CSV.jl until it's next release, since v0.8.5 doesn't support WeakRefStrings 1.2

omus commented 3 years ago

Worth noting: This will cause issues with projects using CSV.jl until it's next release, since v0.8.5 doesn't support WeakRefStrings 1.2

It's doubtful anyone will explicitly require this latest release so falling back to a slightly older TimeZones.jl should be fine

omus commented 3 years ago

WeakRefStrings.jl has a Julia minimum of 1.3. All of its dependencies are set to use Julia 1 and the PR that bumped up the minimum was: https://github.com/JuliaData/WeakRefStrings.jl/pull/73. From looking over that PR it seems like WeakRefStrings.jl could be set to use julia = "1"

Wynand commented 3 years ago

I'm looking into the performance of this change but I haven't benchmarked with Julia before, so here's what I did and the results:

First I created a test script, test.jl:

using TimeZones, BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = true

function bm_tzs()
    map( t -> TimeZone(t, TimeZones.Class(:ALL)), timezone_names())
    TimeZones._reset_tz_cache()
    GC.gc()
end

@benchmark bm_tzs()

For both master and my branch I switched to each like so:

> git checkout <branch name>
> git clean -dfX && julia --project=@. -e 'using Pkg; Pkg.build("TimeZones")'

Then I tested each with the following Julia commands:

julia --project=@. -t 1
include("test.jl")

Here are the results for master from one run:

BenchmarkTools.Trial: 9 samples with 1 evaluation.
 Range (min … max):  337.749 ms … 347.104 ms  β”Š GC (min … max): 20.89% … 22.29%
 Time  (median):     339.937 ms               β”Š GC (median):    21.09%
 Time  (mean Β± Οƒ):   340.773 ms Β±   2.992 ms  β”Š GC (mean Β± Οƒ):  21.18% Β±  0.56%

  β–ˆ  β–ˆ  β–ˆβ–ˆ      β–ˆ        β–ˆ  β–ˆ        β–ˆ                        β–ˆ  
  β–ˆβ–β–β–ˆβ–β–β–ˆβ–ˆβ–β–β–β–β–β–β–ˆβ–β–β–β–β–β–β–β–β–ˆβ–β–β–ˆβ–β–β–β–β–β–β–β–β–ˆβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ˆ ▁
  338 ms           Histogram: frequency by time          347 ms <

 Memory estimate: 45.01 MiB, allocs estimate: 843644.

And from my branch:

BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  94.937 ms … 110.278 ms  β”Š GC (min … max): 57.50% … 50.64%
 Time  (median):     97.181 ms               β”Š GC (median):    57.46%
 Time  (mean Β± Οƒ):   97.859 ms Β±   3.467 ms  β”Š GC (mean Β± Οƒ):  57.23% Β±  1.85%

         β–ƒβ–ˆ  β–ƒ                                                  
  β–‡β–‡β–β–‡β–‡β–β–β–ˆβ–ˆβ–‡β–‡β–ˆβ–β–‡β–‡β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–‡ ▁
  94.9 ms         Histogram: frequency by time          110 ms <

 Memory estimate: 9.38 MiB, allocs estimate: 54047.

I'm going to follow up and make sure they use the same number of samples, but so far this looks good 🀷

Wynand commented 3 years ago

Reran with test.jl as

using TimeZones, BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = true
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 300
BenchmarkTools.DEFAULT_PARAMETERS.samples = 100

function bm_tzs()
    map( t -> TimeZone(t, TimeZones.Class(:ALL)), timezone_names())
    TimeZones._reset_tz_cache()
    GC.gc()
end

@benchmark bm_tzs()

Master results:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  278.357 ms … 318.445 ms  β”Š GC (min … max): 23.32% … 19.60%
 Time  (median):     289.653 ms               β”Š GC (median):    22.14%
 Time  (mean Β± Οƒ):   291.489 ms Β±   8.360 ms  β”Š GC (mean Β± Οƒ):  22.02% Β±  0.65%

     ▃▁▆ β–ˆ  β–ƒ β–† ▁▃▆▁ β–ƒ   ▁  ▁▃    ▁▁ β–ƒ ▁                         
  β–„β–β–„β–ˆβ–ˆβ–ˆβ–‡β–ˆβ–β–„β–ˆβ–‡β–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–„β–„β–‡β–ˆβ–‡β–„β–ˆβ–ˆβ–‡β–‡β–„β–β–ˆβ–ˆβ–„β–ˆβ–β–ˆβ–„β–β–„β–‡β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–β–β–β–β–„ β–„
  278 ms           Histogram: frequency by time          318 ms <

 Memory estimate: 45.01 MiB, allocs estimate: 843644.

this branch's results:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  94.168 ms … 100.176 ms  β”Š GC (min … max): 58.06% … 56.36%
 Time  (median):     96.990 ms               β”Š GC (median):    57.91%
 Time  (mean Β± Οƒ):   96.926 ms Β±   1.017 ms  β”Š GC (mean Β± Οƒ):  58.00% Β±  0.62%

                     β–‚β–†  β–†  β–„β–‚   β–„ β–ˆ β–†β–‚                         
  β–„β–β–β–β–„β–β–β–β–β–β–†β–β–β–†β–β–†β–†β–„β–„β–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–β–ˆβ–ˆβ–†β–†β–ˆβ–ˆβ–ˆβ–ˆβ–„β–ˆβ–ˆβ–†β–„β–ˆβ–†β–†β–†β–„β–†β–„β–„β–„β–β–β–β–β–„β–β–„β–β–β–β–β–„ β–„
  94.2 ms         Histogram: frequency by time         99.7 ms <

 Memory estimate: 9.38 MiB, allocs estimate: 54047.
oxinabox commented 3 years ago

Usually timezones are named after locations, and the longest location name currently is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu. This is in New Zealand, so if a fixed time zone were created for it then it would beΒ Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu

That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well

I think we can do better.

VariableTimeZones are named after locations. FixedTimeZones have codes like UTC+0800 See the FIXED_TIMEZONE_REGEX.

I found before they all fit a ShortString15.

Wynand commented 3 years ago

I found before they all fit a ShortString15

Switched to InlineString15, here are the results if I test without specifying -t:

enchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):   97.166 ms … 198.882 ms  β”Š GC (min … max): 60.96% … 29.90%
 Time  (median):     105.430 ms               β”Š GC (median):    60.54%
 Time  (mean Β± Οƒ):   108.828 ms Β±  13.095 ms  β”Š GC (mean Β± Οƒ):  60.96% Β±  3.95%

  β–β–ƒβ–ˆβ–ˆβ–ƒβ–ƒβ–†β–† β–ƒ   β–† β–ƒ   ▁▁▁▃                ▁                       
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–‡β–‡β–„β–ˆβ–„β–ˆβ–‡β–‡β–„β–ˆβ–ˆβ–ˆβ–ˆβ–„β–β–„β–β–„β–β–β–„β–β–β–‡β–β–„β–β–β–„β–ˆβ–β–‡β–β–„β–‡β–β–β–„β–„β–‡β–β–β–β–„β–β–β–β–β–β–β–‡ β–„
  97.2 ms          Histogram: frequency by time          135 ms <

 Memory estimate: 5.04 MiB, allocs estimate: 53858.
Wynand commented 3 years ago

I can't think of any more code changes or cleanup needed, so if everything looks good this can be merged

iamed2 commented 3 years ago

Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.

iamed2 commented 3 years ago

I can't think of any more code changes or cleanup needed, so if everything looks good this can be merged

This should go without saying but CI should be passing before merge

Wynand commented 3 years ago

Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.

I've switched to ShortStrings instead, which does support Julia 1.0. The only quirk so far is with regex matching, but I've also only tested locally with Julia 1.6

codecov-commenter commented 3 years ago

Codecov Report

Merging #354 (76b1fdd) into master (6eb9788) will decrease coverage by 1.26%. The diff coverage is 100.00%.

:exclamation: Current head 76b1fdd differs from pull request most recent head bf015db. Consider uploading reports for the commit bf015db to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
- Coverage   93.75%   92.48%   -1.27%     
==========================================
  Files          31       31              
  Lines        1553     1477      -76     
==========================================
- Hits         1456     1366      -90     
- Misses         97      111      +14     
Impacted Files Coverage Ξ”
src/TimeZones.jl 100.00% <ΓΈ> (ΓΈ)
src/types/fixedtimezone.jl 100.00% <100.00%> (ΓΈ)
src/winzone/WindowsTimeZoneIDs.jl 0.00% <0.00%> (-100.00%) :arrow_down:
src/build.jl 83.33% <0.00%> (-16.67%) :arrow_down:
src/tzdata/version.jl 57.14% <0.00%> (-3.39%) :arrow_down:
src/arithmetic.jl 92.00% <0.00%> (-0.31%) :arrow_down:
src/tzdata/download.jl 90.62% <0.00%> (-0.29%) :arrow_down:
src/utils.jl 97.56% <0.00%> (-0.27%) :arrow_down:
src/tzdata/timeoffset.jl 94.11% <0.00%> (-0.17%) :arrow_down:
src/types/zoneddatetime.jl 96.00% <0.00%> (-0.16%) :arrow_down:
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data Powered by Codecov. Last update 6eb9788...bf015db. Read the comment docs.

Wynand commented 3 years ago

I messed up updating the project.toml, which I think is what caused the benchmarking failure

I also re-ran performance tests after switching to ShortStrings, and here are the results:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):   72.273 ms … 325.138 ms  β”Š GC (min … max): 61.92% … 17.33%
 Time  (median):      95.901 ms               β”Š GC (median):    58.95%
 Time  (mean Β± Οƒ):   101.141 ms Β±  28.220 ms  β”Š GC (mean Β± Οƒ):  55.53% Β±  6.93%

   ▁      β–„β–‡β–ˆβ–‡β–…                                                  
  β–‡β–ˆβ–…β–‡β–…β–β–β–β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–β–‡β–…β–β–…β–…β–β–‡β–…β–ˆβ–…β–β–β–β–…β–…β–β–β–β–β–β–β–β–…β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–… β–…
  72.3 ms       Histogram: log(frequency) by time        206 ms <

 Memory estimate: 5.04 MiB, allocs estimate: 53952.
iamed2 commented 3 years ago

Not sure but I think there may also be a code bug in the benchmarks based on this error: https://github.com/JuliaTime/TimeZones.jl/pull/354/checks?check_run_id=3828334686#step:5:263

oxinabox commented 3 years ago

Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.

I've switched to ShortStrings instead, which does support Julia 1.0. The only quirk so far is with regex matching, but I've also only tested locally with Julia 1.6

I would rather not, we are wanting to deprecate ShortStrings.jl for InlineStrings.

I wonder why InlineStrings doesn't support 1.0 https://github.com/JuliaData/InlineStrings.jl/issues/9

Wynand commented 3 years ago

Not sure but I think there may also be a code bug in the benchmarks based on this error: https://github.com/JuliaTime/TimeZones.jl/pull/354/checks?check_run_id=3828334686#step:5:263

My best guess is that the older version of TimeZones is being built (in the compiled folder), and that is being used when benchmarking, which causes deserializing from those files to fail. To work around this I've pushed a change that moves checkout to before package building, similar to the julia tests in the package

Wynand commented 3 years ago

This should resolve the benchmarking issue: https://github.com/JuliaTime/TimeZones.jl/pull/360

I've tested running the CI code locally, replicated the issue, and fixed it by using this code as the base. I'll add more details on how I tested it in this MR

oxinabox commented 3 years ago

@omus can we revert the commit changing to ShortStrings and drop 1.0 support instead?

omus commented 3 years ago

@omus can we revert the commit changing to ShortStrings and drop 1.0 support instead?

I'm okay with that. We should drop 1.0 support in a separate PR and be sure to remove all of the VERSION specific code.

oxinabox commented 3 years ago

I'm okay with that. We should drop 1.0 support in a separate PR and be sure to remove all of the VERSION specific code.

I will make that PR presently

Wynand commented 3 years ago

Reran the custom benchmark script

This branch:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  92.906 ms … 128.399 ms  β”Š GC (min … max): 59.50% … 54.53%
 Time  (median):     97.701 ms               β”Š GC (median):    59.53%
 Time  (mean Β± Οƒ):   98.881 ms Β±   5.541 ms  β”Š GC (mean Β± Οƒ):  59.13% Β±  0.91%

  β–‚β–ƒβ–ˆβ–ˆβ–‚ β–‚β–‚β–‚   β–ƒβ–‚  β–…β–ƒ                                            
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–„β–ˆβ–ˆβ–ˆβ–…β–„β–ˆβ–ˆβ–‡β–β–„β–‡β–„β–β–β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–„ β–„
  92.9 ms         Histogram: frequency by time          127 ms <

 Memory estimate: 5.04 MiB, allocs estimate: 53912.

master:

BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  265.216 ms … 308.959 ms  β”Š GC (min … max): 22.84% … 20.03%
 Time  (median):     275.566 ms               β”Š GC (median):    22.41%
 Time  (mean Β± Οƒ):   278.183 ms Β±   8.808 ms  β”Š GC (mean Β± Οƒ):  22.28% Β±  0.63%

        β–ˆβ–ˆ ▁▃▃ ▃▁   ▁          β–ƒ                                 
  β–„β–β–„β–β–‡β–†β–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–„β–ˆβ–ˆβ–‡β–„β–‡β–ˆβ–†β–„β–„β–‡β–‡β–†β–‡β–†β–β–„β–ˆβ–β–β–β–‡β–„β–‡β–β–‡β–β–„β–„β–β–β–β–β–β–β–β–„β–β–β–β–β–β–β–„β–β–β–β–β–„ β–„
  265 ms           Histogram: frequency by time          307 ms <

 Memory estimate: 45.07 MiB, allocs estimate: 844893.
omus commented 3 years ago

I'll be making the 1.6.0 release today