Closed Wynand closed 3 years ago
Thanks, this seems promising. Out of curiosity do you also want VariableTimeZone
to be an isbitstype
or is your priority just FixedTimeZone
?
Thanks, this seems promising. Out of curiosity do you also want
VariableTimeZone
to be anisbitstype
or is your priority justFixedTimeZone
?
No problem!
For now I'm focusing on FixedTimeZone
since VariableTimeZone
still has the transitions
vector, however Transition
will also be an isbitstype
once FixedTimeZone
is an isbitstype
, which should speed up garbage collection
That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well
I think we should probably switch to InlineString127
Having every FixedTimeZone
take up an extra 128 bytes unnecessarily is gonna add up (especially compared to the current size e.g. sizeof(tz"UTC") == 24
)
I also wonder if there's anyway to use even shorter InlineStrings... π€
For now I'm focusing on
FixedTimeZone
sinceVariableTimeZone
still has thetransitions
vector, howeverTransition
will also be anisbitstype
onceFixedTimeZone
is anisbitstype
, which should speed up garbage collection
Great! I think using InlineString
s is a good stepping stone towards that goal.
I also wonder if there's anyway to use even shorter InlineStrings... π€
Funny you should ask... As an experiment I implemented a TZAbbr
string-type which should be minimize the storage cost for the Transition
vector: https://github.com/JuliaTime/TimeZones.jl/pull/355
I still think we should proceed with using InlineString
as proposed in this PR as the TZAbbr
is more specialized and will take some additional work to integrate.
Worth noting: This will cause issues with projects using CSV.jl until it's next release, since v0.8.5 doesn't support WeakRefStrings 1.2
Worth noting: This will cause issues with projects using CSV.jl until it's next release, since v0.8.5 doesn't support WeakRefStrings 1.2
It's doubtful anyone will explicitly require this latest release so falling back to a slightly older TimeZones.jl should be fine
WeakRefStrings.jl has a Julia minimum of 1.3. All of its dependencies are set to use Julia 1 and the PR that bumped up the minimum was: https://github.com/JuliaData/WeakRefStrings.jl/pull/73. From looking over that PR it seems like WeakRefStrings.jl could be set to use julia = "1"
I'm looking into the performance of this change but I haven't benchmarked with Julia before, so here's what I did and the results:
First I created a test script, test.jl
:
using TimeZones, BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = true
function bm_tzs()
map( t -> TimeZone(t, TimeZones.Class(:ALL)), timezone_names())
TimeZones._reset_tz_cache()
GC.gc()
end
@benchmark bm_tzs()
For both master and my branch I switched to each like so:
> git checkout <branch name>
> git clean -dfX && julia --project=@. -e 'using Pkg; Pkg.build("TimeZones")'
Then I tested each with the following Julia commands:
julia --project=@. -t 1
include("test.jl")
Here are the results for master from one run:
BenchmarkTools.Trial: 9 samples with 1 evaluation.
Range (min β¦ max): 337.749 ms β¦ 347.104 ms β GC (min β¦ max): 20.89% β¦ 22.29%
Time (median): 339.937 ms β GC (median): 21.09%
Time (mean Β± Ο): 340.773 ms Β± 2.992 ms β GC (mean Β± Ο): 21.18% Β± 0.56%
β β ββ β β β β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
338 ms Histogram: frequency by time 347 ms <
Memory estimate: 45.01 MiB, allocs estimate: 843644.
And from my branch:
BenchmarkTools.Trial: 16 samples with 1 evaluation.
Range (min β¦ max): 94.937 ms β¦ 110.278 ms β GC (min β¦ max): 57.50% β¦ 50.64%
Time (median): 97.181 ms β GC (median): 57.46%
Time (mean Β± Ο): 97.859 ms Β± 3.467 ms β GC (mean Β± Ο): 57.23% Β± 1.85%
ββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
94.9 ms Histogram: frequency by time 110 ms <
Memory estimate: 9.38 MiB, allocs estimate: 54047.
I'm going to follow up and make sure they use the same number of samples, but so far this looks good π€·
Reran with test.jl
as
using TimeZones, BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = true
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 300
BenchmarkTools.DEFAULT_PARAMETERS.samples = 100
function bm_tzs()
map( t -> TimeZone(t, TimeZones.Class(:ALL)), timezone_names())
TimeZones._reset_tz_cache()
GC.gc()
end
@benchmark bm_tzs()
Master results:
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min β¦ max): 278.357 ms β¦ 318.445 ms β GC (min β¦ max): 23.32% β¦ 19.60%
Time (median): 289.653 ms β GC (median): 22.14%
Time (mean Β± Ο): 291.489 ms Β± 8.360 ms β GC (mean Β± Ο): 22.02% Β± 0.65%
βββ β β β ββββ β β ββ ββ β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
278 ms Histogram: frequency by time 318 ms <
Memory estimate: 45.01 MiB, allocs estimate: 843644.
this branch's results:
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min β¦ max): 94.168 ms β¦ 100.176 ms β GC (min β¦ max): 58.06% β¦ 56.36%
Time (median): 96.990 ms β GC (median): 57.91%
Time (mean Β± Ο): 96.926 ms Β± 1.017 ms β GC (mean Β± Ο): 58.00% Β± 0.62%
ββ β ββ β β ββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
94.2 ms Histogram: frequency by time 99.7 ms <
Memory estimate: 9.38 MiB, allocs estimate: 54047.
Usually timezones are named after locations, and the longest location name currently is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu. This is in New Zealand, so if a fixed time zone were created for it then it would beΒ Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu
That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well
I think we can do better.
VariableTimeZones are named after locations. FixedTimeZones have codes like UTC+0800 See the FIXED_TIMEZONE_REGEX.
I found before they all fit a ShortString15.
I found before they all fit a ShortString15
Switched to InlineString15, here are the results if I test without specifying -t
:
enchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min β¦ max): 97.166 ms β¦ 198.882 ms β GC (min β¦ max): 60.96% β¦ 29.90%
Time (median): 105.430 ms β GC (median): 60.54%
Time (mean Β± Ο): 108.828 ms Β± 13.095 ms β GC (mean Β± Ο): 60.96% Β± 3.95%
ββββββββ β β β ββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
97.2 ms Histogram: frequency by time 135 ms <
Memory estimate: 5.04 MiB, allocs estimate: 53858.
I can't think of any more code changes or cleanup needed, so if everything looks good this can be merged
Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.
I can't think of any more code changes or cleanup needed, so if everything looks good this can be merged
This should go without saying but CI should be passing before merge
Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.
I've switched to ShortStrings instead, which does support Julia 1.0. The only quirk so far is with regex matching, but I've also only tested locally with Julia 1.6
Merging #354 (76b1fdd) into master (6eb9788) will decrease coverage by
1.26%
. The diff coverage is100.00%
.:exclamation: Current head 76b1fdd differs from pull request most recent head bf015db. Consider uploading reports for the commit bf015db to get more accurate results
@@ Coverage Diff @@
## master #354 +/- ##
==========================================
- Coverage 93.75% 92.48% -1.27%
==========================================
Files 31 31
Lines 1553 1477 -76
==========================================
- Hits 1456 1366 -90
- Misses 97 111 +14
Impacted Files | Coverage Ξ | |
---|---|---|
src/TimeZones.jl | 100.00% <ΓΈ> (ΓΈ) |
|
src/types/fixedtimezone.jl | 100.00% <100.00%> (ΓΈ) |
|
src/winzone/WindowsTimeZoneIDs.jl | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
src/build.jl | 83.33% <0.00%> (-16.67%) |
:arrow_down: |
src/tzdata/version.jl | 57.14% <0.00%> (-3.39%) |
:arrow_down: |
src/arithmetic.jl | 92.00% <0.00%> (-0.31%) |
:arrow_down: |
src/tzdata/download.jl | 90.62% <0.00%> (-0.29%) |
:arrow_down: |
src/utils.jl | 97.56% <0.00%> (-0.27%) |
:arrow_down: |
src/tzdata/timeoffset.jl | 94.11% <0.00%> (-0.17%) |
:arrow_down: |
src/types/zoneddatetime.jl | 96.00% <0.00%> (-0.16%) |
:arrow_down: |
... and 10 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Ξ = absolute <relative> (impact)
,ΓΈ = not affected
,? = missing data
Powered by Codecov. Last update 6eb9788...bf015db. Read the comment docs.
I messed up updating the project.toml, which I think is what caused the benchmarking failure
I also re-ran performance tests after switching to ShortStrings, and here are the results:
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min β¦ max): 72.273 ms β¦ 325.138 ms β GC (min β¦ max): 61.92% β¦ 17.33%
Time (median): 95.901 ms β GC (median): 58.95%
Time (mean Β± Ο): 101.141 ms Β± 28.220 ms β GC (mean Β± Ο): 55.53% Β± 6.93%
β βββββ
βββ
ββ
ββββββββββββ
ββ
β
βββ
ββ
ββββ
β
ββββββββ
βββββββββββββββββββββββ
β
72.3 ms Histogram: log(frequency) by time 206 ms <
Memory estimate: 5.04 MiB, allocs estimate: 53952.
Not sure but I think there may also be a code bug in the benchmarks based on this error: https://github.com/JuliaTime/TimeZones.jl/pull/354/checks?check_run_id=3828334686#step:5:263
Looks like this PR effectively lower-bounds Julia to 1.3 through the InlineStrings dependency, so it seems it has the same problems as the old WeakRefStrings. This means that CI won't run as-is.
I've switched to ShortStrings instead, which does support Julia 1.0. The only quirk so far is with regex matching, but I've also only tested locally with Julia 1.6
I would rather not, we are wanting to deprecate ShortStrings.jl for InlineStrings.
I wonder why InlineStrings doesn't support 1.0 https://github.com/JuliaData/InlineStrings.jl/issues/9
Not sure but I think there may also be a code bug in the benchmarks based on this error: https://github.com/JuliaTime/TimeZones.jl/pull/354/checks?check_run_id=3828334686#step:5:263
My best guess is that the older version of TimeZones is being built (in the compiled folder), and that is being used when benchmarking, which causes deserializing from those files to fail. To work around this I've pushed a change that moves checkout to before package building, similar to the julia tests in the package
This should resolve the benchmarking issue: https://github.com/JuliaTime/TimeZones.jl/pull/360
I've tested running the CI code locally, replicated the issue, and fixed it by using this code as the base. I'll add more details on how I tested it in this MR
@omus can we revert the commit changing to ShortStrings and drop 1.0 support instead?
@omus can we revert the commit changing to ShortStrings and drop 1.0 support instead?
I'm okay with that. We should drop 1.0 support in a separate PR and be sure to remove all of the VERSION
specific code.
I'm okay with that. We should drop 1.0 support in a separate PR and be sure to remove all of the VERSION specific code.
I will make that PR presently
Reran the custom benchmark script
This branch:
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min β¦ max): 92.906 ms β¦ 128.399 ms β GC (min β¦ max): 59.50% β¦ 54.53%
Time (median): 97.701 ms β GC (median): 59.53%
Time (mean Β± Ο): 98.881 ms Β± 5.541 ms β GC (mean Β± Ο): 59.13% Β± 0.91%
βββββ βββ ββ β
β
βββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββ β
92.9 ms Histogram: frequency by time 127 ms <
Memory estimate: 5.04 MiB, allocs estimate: 53912.
master:
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min β¦ max): 265.216 ms β¦ 308.959 ms β GC (min β¦ max): 22.84% β¦ 20.03%
Time (median): 275.566 ms β GC (median): 22.41%
Time (mean Β± Ο): 278.183 ms Β± 8.808 ms β GC (mean Β± Ο): 22.28% Β± 0.63%
ββ βββ ββ β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
265 ms Histogram: frequency by time 307 ms <
Memory estimate: 45.07 MiB, allocs estimate: 844893.
I'll be making the 1.6.0 release today
This switches FixedTimeZone.name to an InlineString instead of a String, and doing that makes it an isbits type
As a side effect this also limits the number of bytes in a FixedTimeZone name
Usually timezones are named after locations, and the longest location name currently is Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu. This is in New Zealand, so if a fixed time zone were created for it then it would be
Oceania/Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu
That is still only 93 characters, so the 255 character limit should be fine for a while, and we could switch to InlineString127 if needed as well
Related issue: https://github.com/JuliaTime/TimeZones.jl/issues/271