Open botahamec opened 2 years ago
Thanks for adding this @botahamec - and thanks @gerritsangel for the original comments - I think this is a great idea.
As you mention this will be tricky without a breaking change, so its probably a 0.5
change, one option is we could change the LocalResult
to a struct that keeps the original naive timestamp, and internally stores something like the current LocalResult
. From here we could build multiple different ways to extract a DateTime
potentially including the methodology listed above.
pub struct LocalResult<T> {
naive_local: NaiveDateTime,
zoned: PotentiallyAmbiguous<T>,
}
enum PotentiallyAmbiguous<T> {
None,
Single(T),
Ambiguous(T, T),
}
Why is this tricky without a breaking change? Why couldn't we add methods to today's definition of LocalResult
?
@djc I think we would need to modify the TimeZone
interface in order to support getting earlier and later valid times for the timezone. Something like
LocalResult::None
, because 2:30 is skipped that day (it just so happens you chose a date where DST changes)TimeZone
interface figures out when that would be@djc - because in the None
case of the current LocalResult
we have lost the original local timestamp, which we need to complete a lossy conversion to a DateTime<Tz>
@botahamec - This is an interesting approach, we could leave LocalResult
as it is and do the work by adding more functions to the TimeZone
trait, essentially have a from_local_datetime_lossy
which returns a DateTime<Tz>
rather than a LocalResult<DateTime<Tz>>
. If we can give this a default implementation then it also isn't a breaking change, so this is a pretty good reason to do this on the TimeZone
trait. We could actually build this with what we have now just by using some heuristics like adding and subtrating a few hours to get the offsets either side of the given NaiveDateTime
in the LocalResult::None
case (figuring out a good choice of hours here would be important to capture unusual transitions)
@esheppa Interesting. I think we should try to do two methods though. One to find the next valid time, and one to find the previous valid time. I think to find we'd try to add an hour until we find a valid time, and then do a funky binary search to figure out the exact time we need. For most time zones, that should be fine because we're just worried about DST. But sometimes timezones will skip ahead by entire months, or someone will write a dummy time zone which always returns None (we'll also need to handle the case where the next valid time is ambiguous, for some reason). The binary search also doesn't work for a time zone where only the 10th and 30th seconds of each hour are valid.
Maybe if the next hour doesn't work, we try the next day, then the next month, then the next year, then 100 years, then we do the maximum possible date time. That won't work for a time zone where the first second of every hour is skipped.
The more I think about it, the harder it is for me to imagine that we could find a good default implementation.
This is biting me too. I want to schedule something at time X in a given day. However it needs to be in the users time zone and I need to convert to UTC for the actual scheduling. It works fine in most cases when there is no ambiguity everything is simple, when the time occurs twice I just pick the first one (good enough for my use case). However right now if the "correct" time occurs during a jump I can't find any way to get the right time. For my purposes I would want to schedule at the jump, for example if the cock jumps from 01:59 -> 03:00 I want to run at 03:00. However I can't find any way to get this timestamp with chrono. Right now the best solution I have found is just adding a small amount of time to the date and keep trying until I get a result but this isn't great and doesn't give the best result.
@kevincox For what it's worth, I think in that specific use case, you should probably create an error message saying that time doesn't exist on that day
Now that I think about it, I'm not quite sure what I would consider to be a good use case for this API
For my use case this is a repeating event and users would rather that it occurs on this day at the closest correct time instead of erroring out and missing a day.
@kevincox Ah, that makes sense. I can imagine that causing problems too, but that's probably a fair compromise.
Yeah, it isn't perfect for every use case but it works very well for my current problem.
@kevincox - if you are using chrono-tz
then can solve most cases by using user_tz.offset().base_utc_offset()
and user_tz.offset().dst_offset()
to get the possible offsets to more easily calculate the closest valid local time, this is not perfect as it won't capture less common timezone changes but you can fall back to your existing method in cases when this doesn't work.
That being said, in 0.5
we will try to find a more ergonomic way to to this, as well as some helper functions for the most common closest-valid-time calculations.
Hi @kevincox - on a further review of the code I think my previous answer may be wrong. It looks like dst_offset
returns 0 when not in DST and the increment between the base_utc_offset
and the total DST offset when in DST. You could try instead skipping forward to the next whole hour. This should work in many cases, however AFAIK there are some timezones which regularly skip forwards more than one hour, so those could be solved by continuing to skip forward by whole hours until you reach a valid time.
This is something I was just about to create an issue for. Lossy conversion is good enough for me, but there isn't really a way to do it at all now.
I'm glad that this is considered a blocking issue for v0.5! This has caused some small trouble recently as daylight savings is approaching. To share some more thoughts and maybe spark some discussion:
02:59.999…
. So the best time in general will be the time after the jump. However it may be useful to give a time before the jump for doing math operations (such as figuring out how far the jump is by doing local_result.after_jump() - local_result.before_jump()
.In summary I see the following information as being useful:
pub enum LocalResult<T> {
None,
Single(T),
Ambiguous(T, T),
}
I think the obvious flaw here is that in the case of forward jumps we have no information. Something like None{instant_after: T, second_before: T}
may work. Although I'm not sure if it is better to store the second before or some other form of it like the jump duration in logical time. (Would usually be +/- 1h for DST). Or maybe the duration isn't required as IIUC you can subtract one second from the instant after time and the zone aware datetime will do the jump, you can then covert to UTC and compare. In that case we would just need None(T)
in which case it is sort of the same as Single
but warning that it doesn't exactly "match" the time you asked for (but is sort of the same instant). Also should we rename it to something like InGap
, EquivelentTime
, Unrepresentable
, Lossy
or something? I can't think of a name that I like more than None
even if I don't love None
.
To integrate this new data into the helper set I took a look at what we have and what could be changed.
earliest
and latest
I think that these should probably stay unchanged. Two thoughts are below but I think it is best to stay explicit and avoid changing the behaviour of existing code.
These could return the instant after the jump on the basis that it is an equal time. This would make these methods infallible But I think it is best to be explicit.
Alternatively earliest
could return the time before a forward jump as well but since that exact instant can't be represented it is better to handle that in a different method.
single
I think this method should stay unchanged. It is is for unwrapping the simple case. It could arguably return the instant after a forward jump but probably shouldn't for the reasons mentioned above.
lossy_earliest
and lossy_latest
Similar to earliest()
and latest()
except in the case of a forward jump they return the instant after the jump. This means that they can't fail, they always return a T
.
I don't love the name but can't think of anything better right now. They are "lossy" since you can't go back to the original (invalid) local time. Going back will return the end of the jump. Other names I thought of were earliest_valid
as it returns the nearest time that is valid.
lossy_single
Same as single()
except in the case of a forward jump it returns the instant after the jump. Still returns None
in the case of Ambiguous
collapse
I don't think this is the best interface but it is an interesting idea. Instead of adding lossy_*
we can just add one method collapse
that returns a type without None
. Then ValidLocalResult
can have earliest
and latest
which are infallible.
fn collapse(self) -> ValidLocalResult {
match self {
LocalResult::None(t) => ValidLocalResult::Single(t),
LocalResult::Single(t) => ValidLocalResult::Single(t),
LocalResult::Ambiguous(e, l) => ValidLocalResult::Ambiguous(e, l),
}
}
Overall I don't think it makes sense to have a separate type for this, just to avoid lossy_earliest
and lossy_latest
. But it is an interesting idea.
My current thought on the desired changes are below. But I'm not a datetime expert so would appreciate input. Notably I am assuming that the only reason that the time can be invalid is a forward jump. Are there other cases such as when near the ends of representable time? In that case we would need an extra OutOfRange
variant possibly with a nearest time parameter.
None
variant to include the instant after forward jumps.lossy_*
methods which are equivalent to existing methods but treat None(T)
the same as Single(T)
.~My suggestion is to mimic the behavior Python takes in PEP 495, which is to change None
to instead be Gap(T, T)
or Invalid(T, T)
, where the earliest and latest times are the ones where the pre-transition and post-transition rules are in effect respectively.~ I see that this discussion has been picked up in https://github.com/chronotope/chrono/issues/1448 instead.
From #687
I like what we have already (the
earliest
andlatest
methods might just be as far as we want to go), but I wanted to see what people think about this. I don't think there's anything we could do without a breaking change. One idea is to have the implementor of theTimeZone
interface provide a later or earlier time that would work.