doorisajar opened this issue 3 years ago.
Unfortunately, there are a variety of answers that could be generated when converting local `DateTime`s into `ZonedDateTime`s. You can use the `occurrence` argument of the `ZonedDateTime` constructor to work around the problem you describe, but it may not provide the answer you want:
```julia
julia> using Dates, TimeZones

julia> wpg = tz"America/Winnipeg"
America/Winnipeg (UTC-6/UTC-5)

julia> collect(ZonedDateTime(2020,11,1,wpg):Hour(1):ZonedDateTime(2020,11,1,2,wpg))
4-element Array{ZonedDateTime,1}:
2020-11-01T00:00:00-05:00
2020-11-01T01:00:00-05:00
2020-11-01T01:00:00-06:00
2020-11-01T02:00:00-06:00

julia> x = DateTime(2020,11,1):Hour(1):DateTime(2020,11,1,3)
DateTime("2020-11-01T00:00:00"):Hour(1):DateTime("2020-11-01T03:00:00")

julia> ZonedDateTime.(x, wpg)
ERROR: AmbiguousTimeError: Local DateTime 2020-11-01T01:00:00 is ambiguous within America/Winnipeg
...

julia> ZonedDateTime.(x, wpg, 1) # For an ambiguous case: select the first occurrence
4-element Array{ZonedDateTime,1}:
2020-11-01T00:00:00-05:00
2020-11-01T01:00:00-05:00
2020-11-01T02:00:00-06:00
2020-11-01T03:00:00-06:00

julia> ZonedDateTime.(x, wpg, 2) # For an ambiguous case: select the second occurrence
4-element Array{ZonedDateTime,1}:
2020-11-01T00:00:00-05:00
2020-11-01T01:00:00-06:00
2020-11-01T02:00:00-06:00
2020-11-01T03:00:00-06:00
```
Getting the output shown by the original `ZonedDateTime` range example is harder. The best option in that case is to use a range. If you can provide a concrete example, we may be able to come up with a solution.
I agree that it's not trivial, but if we know the time zone and know that the sequence is in order, we do have enough information to properly apply the variable time zone to each timestamp in the sequence.
Here's a short example that includes the spring ahead and fall back from last year. I included the spring ahead, just to make sure we don't break that while experimenting with handling fall back.
```julia
sa = vcat([DateTime("2020-03-08T00:00:00") + 30 * Minute(m) for m in 1:3], [DateTime("2020-03-08T02:30:00") + 30 * Minute(m) for m in 1:4])
fb = vcat([DateTime("2020-11-01T00:00:00") + 30 * Minute(m) for m in 1:4], [DateTime("2020-11-01T01:00:00") + 30 * Minute(m) for m in 1:4])
dts = vcat(sa, fb)

julia> dts
15-element Array{DateTime,1}:
2020-03-08T00:30:00
2020-03-08T01:00:00
2020-03-08T01:30:00
2020-03-08T03:00:00
2020-03-08T03:30:00
2020-03-08T04:00:00
2020-03-08T04:30:00
2020-11-01T00:30:00
2020-11-01T01:00:00
2020-11-01T01:30:00
2020-11-01T02:00:00
2020-11-01T01:30:00
2020-11-01T02:00:00
2020-11-01T02:30:00
2020-11-01T03:00:00
```
One could envision examples like this at any timeseries resolution, or with ragged timeseries. One option might be to use the internal API to compare `first_valid` and `last_valid` and look at the changes in that sequence:
```julia
julia> fv = TimeZones.first_valid.(dts, tz)

julia> lv = TimeZones.last_valid.(dts, tz)

julia> fv .!= lv
15-element BitArray{1}:
0
0
0
0
0
0
0
0
1
1
0
1
0
0
0
```
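As an alternative to comparing the internal `first_valid`/`last_valid`, the same ambiguity mask can be built from the public constructor alone by catching `AmbiguousTimeError`. This is a minimal sketch. `is_ambiguous` is a hypothetical helper, not part of TimeZones.jl. It rethrows any other error, such as a `NonExistentTimeError` for a local time skipped during spring ahead.

```julia
using Dates, TimeZones

# Hypothetical helper (not part of TimeZones.jl): true when the local `dt`
# occurs twice in `tz`, i.e. falls inside a "fall back" overlap.
function is_ambiguous(dt::DateTime, tz::TimeZone)
    try
        ZonedDateTime(dt, tz)   # succeeds for unambiguous local times
        return false
    catch err
        err isa AmbiguousTimeError && return true
        rethrow()               # e.g. NonExistentTimeError during spring ahead
    end
end

tz = tz"America/Winnipeg"
dts = [DateTime(2020, 11, 1, 0, 30), DateTime(2020, 11, 1, 1, 0),
       DateTime(2020, 11, 1, 1, 30), DateTime(2020, 11, 1, 2, 0)]
mask = is_ambiguous.(dts, Ref(tz))   # Ref keeps tz scalar during broadcasting
```

Applied to the 15-element series above, this should flag the same three indices (the two 01:00/01:30 readings and the later 01:30) as `fv .!= lv`.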
[edit: removed an example I thought was working, but wasn't :) ]
Whether via cumulative sums or run-length encoding (or other means), it should be possible to detect the regions of the sorted timeseries that need special treatment, and apply the appropriate conversions in the appropriate places.
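The detection step could look roughly like the following sketch. It is a plain loop, one simple form of run-length encoding, that turns a `Bool` mask such as `fv .!= lv` into index ranges. `true_runs` is a hypothetical helper name. It assumes 1-based indexing.

```julia
# Hypothetical helper: return the contiguous runs of `true` in `mask`
# as index ranges (assumes a 1-based vector such as a BitVector).
function true_runs(mask::AbstractVector{Bool})
    runs = UnitRange{Int}[]
    start = 0                          # 0 means "not currently inside a run"
    for (i, m) in enumerate(mask)
        if m && start == 0
            start = i                  # a run begins at i
        elseif !m && start != 0
            push!(runs, start:i-1)     # the run ended just before i
            start = 0
        end
    end
    start != 0 && push!(runs, start:length(mask))  # run reaching the end
    return runs
end

mask = Bool[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # the fv .!= lv output above
runs = true_runs(mask)
```

On that mask it returns `[9:10, 12:12]`. The unambiguous 02:00 at index 11 splits the ambiguous stretch into two runs, so the runs alone do not say which side of the transition each timestamp falls on. That extra information has to come from the ordering of the series.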
I'm not sure I understand the output of `TimeZones.transition_range`, but that might be an option: get the range of possible ambiguity, then do one pass along the sorted timestamps that fall within that range and apply `last_valid` to the ones that have already elapsed according to their naive (unzoned) timestamp.
I sketched out this starting point for identifying the window of ambiguity:
```julia
ranges = TimeZones.transition_range.(datetimes, tz, Local)
transitions = findall(length.(unique.(ranges)) .> 1)
fallback_window = first(transitions):last(transitions)
```
Outside `fallback_window`, we can apply `ZonedDateTime` to `datetimes`. Inside it, special logic is needed. I've tested a couple of simple approaches, but don't have something that generalizes well enough yet.
Broadcasting `ZonedDateTime` correctly converts a time series of naive `DateTime`s, unless the time zone is variable and there is a "fall back" in the time series. The optional arguments for resolving this all work correctly for single `DateTime`s, but any single argument selection will give the wrong result for at least one timestamp when broadcast over a sorted array that crosses a fall back.

There doesn't seem to be a solution for this in `TimeZones` yet, unless I'm missing something. It seems like for cases where broadcasting doesn't work (which I think are probably pretty common for users of `TimeZones`; they certainly are for me), it would be useful to have a method that can handle arrays of sorted `DateTime`s. Maybe something like `ZonedDateTime(datetimes::Array{DateTime,1}, tz::VariableTimeZone)`.

For a sorted 1D array of `DateTime`s crossing a fall back, there is enough information in the time series to resolve the ambiguity. I'd be willing to contribute to a PR on this if folks agree it would be useful.
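A minimal sketch of what such a method might do follows. It assumes the input is in strictly increasing real-time order. `zoned_sorted` is a hypothetical name, not a TimeZones.jl API. It only uses the constructor forms that appear in this thread: `ZonedDateTime(dt, tz)` and `ZonedDateTime(dt, tz, occurrence)`. For each ambiguous timestamp it keeps the earliest reading that is later than the previous converted value.

```julia
using Dates, TimeZones

# Hypothetical helper (not part of TimeZones.jl): convert local DateTimes that
# are assumed to be in chronological order into ZonedDateTimes, resolving
# "fall back" ambiguity from the ordering of the series.
function zoned_sorted(dts::AbstractVector{DateTime}, tz::TimeZone)
    out = ZonedDateTime[]
    for dt in dts
        # Every valid reading of this local time (one, or two when ambiguous).
        candidates = try
            [ZonedDateTime(dt, tz)]
        catch err
            err isa AmbiguousTimeError || rethrow()
            [ZonedDateTime(dt, tz, 1), ZonedDateTime(dt, tz, 2)]
        end
        if isempty(out)
            push!(out, first(candidates))  # assumption: first element takes the first occurrence
        else
            prev = last(out)
            # Earliest reading strictly later (in UTC) than the previous result;
            # if none is later, fall back to the last reading.
            idx = findfirst(z -> z > prev, candidates)
            push!(out, candidates[something(idx, lastindex(candidates))])
        end
    end
    return out
end

tz = tz"America/Winnipeg"
dts = DateTime.(["2020-11-01T00:00:00", "2020-11-01T01:00:00",
                 "2020-11-01T01:00:00", "2020-11-01T02:00:00"])
z = zoned_sorted(dts, tz)
```

For this hourly wall-clock series the sketch picks the offsets -05:00, -05:00, -06:00, -06:00, which matches the range output at the top of the thread.

The heuristic has limits. If a ragged series contains only one reading of an ambiguous time, nothing distinguishes it, and the sketch picks the first occurrence. If no reading is later than the previous value, the sketch falls back to the last occurrence. That happens with the 01:30 that follows 02:00 in the 15-element example.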