MycroftAI / lingua-franca

Mycroft's multilingual text parsing and formatting library
Apache License 2.0
75 stars 79 forks source link

Should we require that all datetime params are tz aware #183

Closed krisgesling closed 3 years ago

krisgesling commented 3 years ago

Context: arising from #180 - I raised these points there, but it seems like a distinct question for us to ponder independent of that PR.

Currently there is no requirement for datetime object parameters to have timezone information included (be timezone aware). To put it another way, they can be tz naive.

This can be compensated for in a few ways:

I can see people making strong arguments in both directions however we get to some problems eg nice_duration_dt(dt1, dt2) - if one datetime object is aware and the other naive, do I expect the naive object to be my default tz or the same as the other object?

For me this suggest that the best option would be to require all datetime objects passed in must be tz aware and throw an exception if they aren't. Enforcing the use of tz aware datetime objects reduces the chance of perceived bugs by warning downstream developers if they are using tz naive datetime objects. It also reduces the amount of datetime modifications we are doing. This puts some additional burden on the developer, but also ensures they get back exactly what they should expect.

This doesn't necessarily negate the need for a default tz config as there could still be an option to have everything returned in the local tz. It would mean the only method that must have tz info applied to it is extract_datetime.

JarbasAl commented 3 years ago

there is no need to require date to be tz aware imho, they can be when the user really means a different timezone, but otherwise #180 ensures that internally they are tz aware, no need to put the burden on users or to break backwards compatibility, passing a tz naive datetime is valid syntax

we just need to set expectations, 99.9% of the time the user means "my timezone", i prefer to make it clear tz naive dates assume the local tz than to break everything in the wild and blame the user, "you should have passed a timezone, not my bug!" is my least favorite solution for all this timezone stuff

in regards to return types having a tz attached changes almost nothing (might break date comparisons at most), and that can indeed be considered user fault as long as we set expectations right, in this case the expectation is that returned dates always have a timezone and users can depend on that. if until now it was undefined behavior with #180 it becomes clearly defined

i dont think the assigned tz should depend on the "other" argument either, no tz means default_tz, simple and clear

from a downstream perspective, the total amount of boiler plate this will remove alone makes it worth it, imagine every single usage of LF in the wild needing to change the dates they are passing into the callers, just to do a boring chore that can be handled transparently

ChanceNCounter commented 3 years ago

The trouble, I think, and the reason there are arguments in either direction, is this:

extract_duration(<today at 0900>, <today at 2000 UTC>)

Should the first datetime be forced into UTC, yielding 11 hours? Or should it be forced into my local time zone, yielding 4 hours? What about when it's a "third" TZ? There's no solution here. You just gotta pick one and that's how LF works.

However, I agree that making downstream pass timezones is too much. Experienced programmers hate working with TZs, and Mycroft is meant to be accessible to brand noobs. I think we should run with #180, and document that "all timezone-naive datetimes will be treated as if they were in Lingua Franca's default configured time zone."

JarbasAl commented 3 years ago

The trouble, I think, and the reason there are arguments in either direction, is this:

extract_duration(<today at 0900>, <today at 2000 UTC>)

Should the first datetime be forced into UTC, yielding 11 hours? Or should it be forced into my local time zone, yielding 4 hours? What about when it's a "third" TZ? There's no solution here. You just gotta pick one and that's how LF works.

However, I agree that making downstream pass timezones is too much. Experienced programmers hate working with TZs, and Mycroft is meant to be accessible to brand noobs. I think we should run with #180, and document that "all timezone-naive datetimes will be treated as if they were in Lingua Franca's default configured time zone."

for your example i think the most correct thing to do is to add the user configured timezone for the comparison, i dont think the second argument should influence the first at all

note that comparing durations with datetimes in different timezones is not an issue in itself at all, but both dates need to have some timezone, if one of 2 args was passed without timezone we should not assume UTC, but rather the default one, 99% of times either both dates will have a tz or both will be naive, so the impact really is minimal

mixing tzs is a perfectly valid operation and nearly impossible to do by accident

krisgesling commented 3 years ago

I just realised that the to_local method is not acting the way I expected - same in mycroft-core.

https://github.com/MycroftAI/lingua-franca/blob/master/lingua_franca/time.py#L70-L82

Currently if no tzinfo is on the object, it adds UTC then converts it to the local timezone... ie a tz naive datetime object is currently assumed to be UTC.

This again highlights to me how differently developers think about tz naive objects. I know I sound like a broken record right now, but I really think we should be opinionated on this and require downstream projects to pass in tz aware objects. Dealing with dates and times is painful because we don't use a standard.

To fix to_local (if we consider it to be "broken"), it will be a breaking change either way. Changing the method to use the default_tz instead of UTC quietly changes the behaviour of any downstream package - maybe fixes things, maybe breaks things. Throwing a warning or exception to say you should change something gets in peoples faces, maybe causes them to do some more work, but in the end means their code actually works.

JarbasAl commented 3 years ago

as an end user, i will simply stop using LF if i need to handle user locale every single time i call a function, i can do it once at the top level module and whenever needed, but if i have to do it every single function call i won't be bothered

if you can require people to pass tz-aware dates and break everything in the wild, to_local is the lesser problem since it's essentially never used anyways...

i also disagree that to_local should just assume UTC for naive datetimes, but thats properly documented in the docstrs so i can live with that,

i would maybe add a deprecation warning and remove the to_XXX methods if they aren't used at all. in the end they are only there for backwards compat in case someone is importing them, if we disagree with the UTC default we can alternatively only deprecate that piece of those methods

ChanceNCounter commented 3 years ago

i also disagree that to_local should just assume UTC for naive datetimes, but thats properly documented in the docstrs so i can live with that

🤷 the functions return each others' input. If ya gotta make a call, that's a decent one to make.

JarbasAl commented 3 years ago

if there is no default_tz assuming UTC is a decent (but still arbitrary) call to make, but if we introduce a default_tz then assuming UTC no longer makes sense

in the end this discussion is not really about if LF should handle timezones (it already does), its about if it should assume UTC or allow configuring the default timezone.

ChanceNCounter commented 3 years ago

Oh, I absolutely think it should allow configuring the default timezone, I just figure that's now_local().

I figure, most of the time, if you're trying to to_local() a datetime, you're expecting that datetime to be in another time zone. If you then pass a TZ-naive datetime to that param, well, what's the best default non-local time zone?

If they're working right, shouldn't to_local(to_utc(now_local())) == now_local()?