crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.32k stars 1.61k forks source link

Time + Time::MonthSpan inaccurate around DST change #6522

Closed HCLarsen closed 2 years ago

HCLarsen commented 6 years ago

When adding or subtracting a Span or MonthSpan to a Time object that isn't UTC, the result is off by an hour. It appears that it's adding a specific number of seconds instead of days, weeks, months or years, as you would expect.

time = Time.utc(1997, 10, 21, 9, 0, 0)
time + 1.weeks  #=> 1997-10-28 09:00:00.0 UTC

time = Time.new(1997, 10, 21, 9, 0, 0)
time + 1.weeks  #=> 1997-10-28 08:00:00.0 -05:00 Local

time = Time.new(1997, 10, 25, 9, 0, 0, location: Time::Location.load("America/Toronto"))
time + 1.days  #=> 1997-10-26 08:00:00.0 -05:00 Local
asterite commented 6 years ago

I won't comment on whether this is good or not (it does look good to me, though), but yes, time math goes through seconds, always.

asterite commented 6 years ago

Actually, for months it doesn't go through seconds. So:

time = Time.new(1997, 10, 25, 9, 0, 0, location: Time::Location.load("America/Toronto"))
p time           # => 1997-10-25 09:00:00.0 -04:00 America/Toronto
p time + 1.month # => 1997-11-25 09:00:00.0 -05:00 America/Toronto

So yes, one of the behaviors should maybe be fixed, though I can imagine it being really hard to implement, specially when right now you can combine units (except months).

HCLarsen commented 6 years ago

Think about implementation though. When you're thinking of 1 week from today, you expect that to be the same hour, not an hour earlier because you set your clocks forward in the meantime.

HCLarsen commented 6 years ago

The workaround I implemented in my project is only 7 lines long. I used a comparison of DST on the receiver and the argument.

asterite commented 6 years ago

I don't know. Let's say I say "in one hour" and dst starts having effect. Then your 7 lines diff will change the hour again. Then in real life, an hour wouldn't have passed, but maybe zero or two.

Maybe with days and weeks it's different, because they are usually tied to a date.

How does Ruby's activrsupport behave regarding this?

HCLarsen commented 6 years ago

No, my workaround doesn't change the hour if the span is 1.hour.

I've been trying to figure out how Ruby's Active Support does this, but I'm not sure yet.

HCLarsen commented 6 years ago

Active Support does exactly what one would expect. A week after 9am on October 21st is 9am on October 28th.

HCLarsen commented 6 years ago

HOW they do it is more interesting. They actually construct and return a new time object constructed from the values of the old object, substituting day and week values when new values are provided.

straight-shoota commented 6 years ago

Yes, this problem has already been mentioned before (see #4556), but I don't think there has been a dedicated issue yet.

Time libraries usually solve this by having two different concepts for time span, one accurate, based on elapsed time (like Time::Span) and the other nominal based on calendrical units (like Time::MonthSpan but with higher granularity).

Some however only have one, time-based type (for example Rusts chrono Duration, Golang Duration). This obviously makes it hard to use calendrical calculations.

I guess it could also be an option to have one type with individual fields for (nano)seconds, hours, days, months, years and it can be used both ways depending on how you populate the fields (either 1 day or 24 hours). But that's probably going to be more confusing than anything else because you never know if an instance is meant to be time-based or calendrical.

They actually construct and return a new time object constructed from the values of the old object, substituting day and week values when new values are provided.

That's actually the most sane way to do this. Because it is a calendrical concept, it needs to go through the calendrical interface. That's much easier than figuring out the exact amount clock shifts to calculate an offset of elapsed time.

I would suggest to leave Time::Span as is (but maybe rename it to Time::Duration to be more concise) and add an additional Time::Period type, probably with properties nanoseconds, hours, days, months, years. The details need to be discussed, though.

HCLarsen commented 6 years ago

Well, it's an important discussion to have, to be certain. I suspect that many will get tripped up by this before it's fixed. I'm more than happy to help in any way I can.

I don't think Time::Period is the best naming, it doesn't read very well as English. Perhaps Time::DaySpan to be analogous to the MonthSpan that already works in the way you'd expect it.

As asterite pointed out, when people are talking about hours, they're going to mean the specific hours. Same goes for smaller values. This behaviour only really comes into play when people are talking about intervals of days or weeks.

asterite commented 6 years ago

The problem comes with, for example, expiration dates. You can say something expires in 3.days, for example a cookie or a session, and with that you mean 72 hours. The problem is that days is ambiguous.

Maybe we should remove those convenience methods from the standard library, just like in Go. Alternatively we can document it always refers to computed seconds.

HCLarsen commented 6 years ago

The thing about that is, 3.days should be treated differently than 72.hours. If you want to describe a span of 72 hours specifically, you're going to say 72.hours, not 3.days. On the other hand, if you want an event to repeat every 3 days, you want it happening at the same time every day. Your work schedule isn't going to shift by an hour just because of daylight savings time.

I vote against removing the convenience methods. I think they're brilliant, and very useful.

How about, expanding on straight-shoota's idea, we leave span the way it is, and add another struct for date spans. I think DateSpan or Interval would be good names for it. While the Span still operates on seconds and milliseconds, this new object would hold 3 values: seconds, days and months. That way, addition and subtraction can work in the ActiveSupport way, with days being added evenly, regardless of DST, and months can be added without worrying about leap years.

straight-shoota commented 6 years ago

The behaviour and existence of the convenience methods is actually a subordinate issue.

Foremost we need to discuss how to express the different semantics in terms of data structures.

asterite commented 6 years ago

Yeah, making days and week return a different struct, similar to MonthSpan, sounds good. Or even representing all time spans as a combination of seconds, days and months, as suggested (I don't mind it will be a bit larger, it's not like time spans are stored and used as much as time instances).

straight-shoota commented 6 years ago

it's not like time spans are stored and used as much as time instances

That depends. If you're doing a lot of time measurements, it might be quite important. In my opinion it doesn't make much sense to merge both concepts into one type. The size is one issue, but more importantly, there should be a clear distinction between a time duration (i.e. what's returned by Time.measure) and an abstract time period.

asterite commented 6 years ago

@straight-shoota How would you distinguish them when constructing them?

straight-shoota commented 6 years ago

Time::Duration.new(days: 5) vs Time::Period.new(days: 5). Let's just leave the convenience methods out for now and figure that out later.

asterite commented 6 years ago

What's wrong with treating days, weeks and months as such, instead of a fixed amount of seconds?

straight-shoota commented 6 years ago

Not sure I understand you question... there's nothing wrong with that, I actually propose adding a type for treating them individually.

asterite commented 6 years ago

I mean, 1.day would mean period, 1.hour would mean duration. Those are the constructors. We just need to fix the semantic.

HCLarsen commented 6 years ago

I know it's just semantic, but I really don't like the name Period. Elixer's Interval name sounds far better to me.

HCLarsen commented 6 years ago

Here are my thoughts on how it would work:

span = Time::Span.new(72, 0, 0)
interval = Time::Interval.new(0, 0, 3, 0, 0, 0)  #=> 3 days

time + span       #=> 2018-11-06 08:00:00.0 -05:00 Local
time + interval   #=> 2018-11-06 09:00:00.0 -05:00 Local
asterite commented 6 years ago

Having Period, Duration, Interval, or any combination of that is bad, because the names are interchangeable and almost mean the same, so it'll be confusing as hell.

I propose what you, @HCLarsen, proposed: a Time::Span should consist of:

Then there's no confusion at all. The only confusion might arise if someone does 3.days expecting that to be exactly 72 hours. For that case, we should document that one must use 72.hours and not 3.days, and that's it. I prefer that small confusion that can be explained in the docs over having two or more new types, which behave slightly different from one another, and where it's not clear what name belongs to which behavior.

HCLarsen commented 6 years ago

a Time::Span should consist of:

Actually, my idea was that the Interval class would consist of that, and the Span would remain as is, just seconds and milliseconds. Although, I recognize that this idea has validity as well.

asterite commented 6 years ago

The problem with having two types is that one can no longer do: 1.day. Is it one type or the other? So we must use the more verbose Time::Interval.new(...) or Time::Period.new(...). Or we could have 1.day(period: true) or something like that, but it's still more complex than just having 1.day as the only possibility with only one possible meaning.

Maybe those 2.days convenience methods are not that important and we could drop them, I don't know. I like them :-)

HCLarsen commented 6 years ago

Crystal seems to handle that just fine now with the Span and MonthSpan types. 1.months returns a MonthSpan, and 31.days returns a Span. In my new idea, it would work like this:

72.hours  #=> 3.00:00:00 Time::Span
3.days    #=> 3.00:00:00 Time::Interval

Technically, the same time period, but different methods return different types.

asterite commented 6 years ago

Sure, that could work.

The only inconvenience is that right now you can do:

# You can add spans
span = 1.hour + 5.seconds

# However, you can't add a span with a month span
span = 1.month + 3.hours # => doesn't compile

With just a single type holding all this information, it could work. And then if you have anything in the "days" or "months" fields, you can treat "hours" as fixed hours, so "1.day + 3.hours" after 9am would always give the next day at 12am, regardless of DST. Or maybe these fields shouldn't be combined to avoid this new confusion.

HCLarsen commented 6 years ago

You could still do that if you overloaded the + method.

struct Time::Interval
  def +(span : Time::Span) : Time::Interval
    @seconds + span.total_seconds.to_i
  end
end
straight-shoota commented 6 years ago

Please let's not worry about the utility constructors right now. This is one step ahead. They're certainly nice and I'd like to find a way to keep them with improved semantics.

@asterite Yes, having two different types with similar names can be confusing. But maybe we can find names that clearly express purpose and avoid that.

Having only one type is very much confusing, because then there's no type safety. If you have two instances, there is no way to tell by the type if they're meant to represent elapsed time or calendrical units. I'd expect this could be a source of even more confusion when applying mathematical operations. In many applications, such as benchmarking, time measurements, timeouts etc. you really just need to handle elapsed time. If the same type could also have days, month and year fields, all these use cases would have to make sure to properly handle these as well, or raise an error (but that could only happen at runtime).

I haven't found a single instance of a multi-purpose time span type in any time library. I have seen they either have two separate types or only support elapsed-time.

asterite commented 6 years ago

What about activesupport? According to OP it seems io be doing what he wants, and you only create them with the utility methods.

RX14 commented 6 years ago

Can we just look at what go provides and leave the rest of the time complexity to a shard?

RX14 commented 6 years ago

I've looked at what go provides and Go just has time.Add(time.Hours * 24) or time.AddDate(0, 0, 1) to add a day. The former returns 24 hours and the latter returns 25.

We don't need a whole new datatype for this, just a new method for adding days, weeks, and years. For the record though, go doesn't provide a time.Weeks or a time.Months, so it effectively segments the API between "hours" and shorter being a length in seconds and "days" and fewer being calendar-based.

RX14 commented 6 years ago

That being said it'd be a lot more crystally to have a new datatype, for a TimeSpan and a CalendarSpan, the former holding seconds and the latter holding a days, months, and years integer offset.

We already have a MonthSpan, we just need to rename it and add year and day offsets.

straight-shoota commented 6 years ago

@asterite Yeah, fair enough, ActiveSupport::Duration could be considered as combining both purposes. Yet the plain Ruby Time class still expresses time durations as simple numeric values. Measuring time by subtracting instances does not return Duration:

Time.now - Time.now # => -1.07e-05

But, obviously ActiveSupport::Duration can express a duration based on elapsed time as well as calendrical. That makes it different from Duration/Period in the Java Time API for example, where the former is strictly time based (stores only (nanos)seconds) and the latter strictly date-based (stores days, months and years).

Having a data type combine both concepts can effectively implement a duration as specified in ISO 8601 (5.5.4.2 Representation of time-interval by duration only).

@RX14 It would certainly be an option to only include the current duration-based Time::Span in stdlib. And leave anything beyond that for a shard. That's essentially the status quo. And there is noting wrong with it, per se. Maybe add something like Go's time.AddDate or ActiveSupport's Time#advance which would be a place for a shard to hook in. Then we should probably remove the convenience methods to let them be defined by a shard as well.

On the other hand, I'd really like to extend Crystal's Time API to support calendrical time spans. It's nice to have this included by default as it is very commonly used for time-based calculations in all kinds of applications. Having a solid implementation in stdlib would be great as people can just just use it and can have the nice convenience methods like 5.days readily available.

HCLarsen commented 6 years ago

I do like the idea of having two. Leaving Time::Span the way it is, for nanosecond-accurate passages of time, whereas the new struct would handle calendar date passages of time.

I've been working on a rough draft of my idea, so let me know what you think:

https://github.com/HCLarsen/time_ext

straight-shoota commented 6 years ago

Just to know what we're talking about, I compiled a list of time units supported by the Time API and their relations:

Without relying on general-but-not-necessarily-every-time relations this boils down to a few base units which can be used to represent others as well. With these, you can describe calendarical time spans and don't lose accuracy because of relying on implicit semantics:

A different issue are fractions of a unit. Should they be supported at all? ActiveRecord::Duration for example does. And if yes, how are they to be interpreted. For example, what does 1.5 days or 0.3 months mean?

HCLarsen commented 6 years ago

A few thoughts on this.

First, if someone is looking at calendrical time spans, will they be concerned with nanoseconds? It seems to me that an application where someone is looking at that much accuracy in the time that's passed isn't the same application where you're concerned with calendar accuracy. That's why I opted for seconds instead of nanoseconds.

I'm very hesitant to add weeks as a stored value. This would get VERY messy. Every other value, even the ones not directly stored, fit together well. Weeks are the oddball that are 7 days, but don't fit into months evenly. That's likely the reason why the ISO8601 duration value allows for either date-time or week representation, but not both together.

The hours thing seems like an extreme edge case. Should we add another stored value for this kind of extreme edge case?

straight-shoota commented 6 years ago

First, if someone is looking at calendrical time spans, will they be concerned with nanoseconds?

Probably not. But I see no reason to set an artificial limit. Bot units are just a multiplier of 1_000_000_000 apart. In some cases, higher resolution than second might actually be necessary and then you should go the entire way instead of going to milliseconds. Plus, it seems pretty logical to use nanoseconds as base unit in all time related types. The only reason that might speak against it is that it requires an 8 byte value to express seconds as nanoseconds. But the API for accessing seconds would still be Int32 probably. I wouldn't say it is definitely required, but up for discussion. I just wouldn't simply discard nanoseconds as "nobody needs that".

I agree that having a duration type that stores different units of time can be a bit confusing because the order of application can change the output. But that is true for any unit, not just weeks. I don't see any reason why weeks would be different in any form from the other units. And it is very common to express time durations in numbers of weeks. But of course, it is an absolutely rare edge case that a week does not equal 7 days. Still, I think a somewhat respectable implementation should cover that.

The same goes for hours. It is a part of the typical calendrical representation of a time instance and due to the influence of time zone offsets, the elapsed time between to instances an hour apart is not intrinsically equal to 60 seconds. Such a case is pretty rare, but a real thing, so I think it should be covered.

And the thing with time zone changes is, that they can come pretty fast with short notice. Countries can pretty much do what they want. There might be a relevant change coming up sooner than anyone might think (and faster than anyone could patch) and change to some weird offset. Who knows... politics is complicated.

straight-shoota commented 6 years ago

ActiveRecord::Duration btw. supports years, months, weeks, days, hours, minutes, seconds. This is perhaps actually the best thing (plus maybe nanoseconds) because it covers all relevant time units. Even if months and years are strictly proportional related (at least I can't think of any exception) and the potential incongruity between minutes and seconds doesn't necessarily need to bother. This form of completeness certainly has something to it. I think that's what I would prefer because it's most explicit.

HCLarsen commented 6 years ago

I agree that having a duration type that stores different units of time can be a bit confusing because the order of application can change the output.

I never said that. All along I've said that it's necessary to store different units of time, and my example makes use of that, storing seconds, days and months explicitly.

What I'm saying is that storing weeks is a bad idea, because it doesn't fit evenly into the scale. Every other unit of time fits evenly into the next larger unit, without any remainder. Weeks only fit evenly into February. The rest of the months, it's a partial week. That will make a complete mess of things. That's why I'm extremely loathe to include weeks as a stored unit.

asterite commented 6 years ago

Isn't a week just 7 days? Or what's the problem with seeing it like that?

HCLarsen commented 6 years ago

Technically, a week is the next time the same day of the week arrives. The mass majority of the time, that's 7 days, although @straight-shoota did demonstrate an example where those two weren't equal.

That being said, I'm far more in favour of looking at weeks as just 7 days.

straight-shoota commented 6 years ago

@asterite see https://github.com/crystal-lang/crystal/issues/6522#issuecomment-412666986

example:

samoa = Time::Location.load("Pacific/Apia")

a = Time.new(2011, 12, 25, 0, 0, 0, location: samoa)
b = Time.new(2012,  1,  1, 0, 0, 0, location: samoa)

a.day_of_week # => Sunday
b.day_of_week # => Sunday
(b - a).days  # => 6

@HCLarsen

That will make a complete mess of things.

How so? I don't see why this would cause any more trouble than any other unit.

ActiveRecord::Duration works with weeks.

HCLarsen commented 6 years ago

How so? I don't see why this would cause any more trouble than any other unit.

What I said in the very same paragraph that you quoted:

The rest of the months, it's a partial week.

1 year = 12 months 1 day = 24 hours 1 hour = 60 minutes 1 minute = 60 seconds 1 month = 28, 29, 30 or 31 days

All solid integers. Meanwhile, month to weeks...

1 month = 4.428571428571429 weeks

straight-shoota commented 6 years ago

Yeah, I get that the ratio between month and week is fractional. But why do you think this is a problem?

Besides, these integer multipliers only apply as long as there is no change in the time zone offset. For example, at every clock change for daylight savings, the elapsed time of a month is actually 31.04 or 30.96 days (assuming a switch of one hour back and forth in October and March).

HCLarsen commented 6 years ago

The code example I showed you don't use number of days to calculate the passage of a month.

There's no initializer for Time with a weeks parameter, so you can't directly add a weeks value the way you do with all the other values.

Further, how do you deal with overflow? When adding anything else, you can "carry" it over to the next largest unit of measure. Because the relationship between weeks and months is fractional, you can't add it to an integer without losing information.

RX14 commented 6 years ago

We shouldn't bother with weekspans, just have a Time::CalendarSpan storing integer years months days and that's it. If you want to add calendar days and hours at the same time - and thats rare - just add twice: Time.now + 5.days + 5.hours. Keep it simple then wait for someone to come up with a usecase for more. We shouldn't attempt to cover 100% of the Time complexity inside the stdlib, but we already have prescedent with MonthSpan so this is a fine, small, addition.

HCLarsen commented 6 years ago

@RX14 that's not a bad idea. Although the ISO8601 spec does allow years, months, days, hours, minutes and seconds, it would be easy enough to just do it as suggested above.

@straight-shoota one thing to note about ActiveSupport::Duration. It does store a weeks value, but when performing addition and subtraction, it converts the weeks to 7 days, and adds that to the days value. It seems that it wouldn't compensate for that Samoa edge case that you mentioned.

straight-shoota commented 6 years ago

@RX14 Yeah, that's exactly like Duration and Period in the Java Date Time API.

@HCLarsen Weeks are not an integral part of the regular calendar format, but calendar weeks are still common enough in business applications. Right now, Time doesn't support that at all, but it should be added at some point.

HCLarsen commented 6 years ago

The thing is, weeks will work differently than anything else. Every other time unit corresponds directly to one of the arguments for Time.new. The only way to calculate a proper week is to look for the next time the same day of the week occurs.