crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.42k stars 1.62k forks source link

Having a Great Time #4556

Closed straight-shoota closed 6 years ago

straight-shoota commented 7 years ago

Time in stdlib is found to be lacking in different ways and there have been some mentions of possible improvements

I've done some research on time and date solutions in other languages and as far as I can tell, the Joda Time based Date-Time API in Java 8 looks like a good source of inspiration.

A few key concepts (as far as I understand it):

I'm not familiar enough with Crystal internals and don't know if and how these ideas could be useful in Crystal, but this could be a good place to discuss plans for improvements of Crystal's time API.

Other notable resources (other items might be appended):

akzhan commented 7 years ago

Theoretically Time spans should have xxseconds and months. It cover all known (for me) cases.

See http://www.w3resource.com/sql/data-type.php#INTERVAL for details.

straight-shoota commented 7 years ago

It won't cover adding a conceptual day over switching of DST.

Say you have a datetime on the last day of DST and want to add the timespan of one conceptual day, this timespan does not equal to 60.seconds * 60 * 24.

RX14 commented 7 years ago

I think this is overcomplicating things, we should have a single Time class which represents a point in time and it should continue to use ticks since some epoch as before. Using seperate fields for each part of the gregorian calendar is wasteful of both bits and performance when manipulating times.

However nanosecond precision and storing a monotonic clock are both things that i'd like to see Time do, and I believe proper zone offsets are required before 1.0.

ysbaddaden commented 7 years ago

The goals are:

Dealing with a single ticks integer is simpler (and faster) than dealing with individual values that much each be checked for overflow on every computation for example.

Actually, having a separate Int64 (seconds) and Int32 (nanoseconds) makes computations a little more complicated (and probably slow). I wish we had support for Int128 in Crystal (I tried to add it, but it's tricky) which LLVM would optimize :)

I'm not sure about the "transparent" monotonic clock. It would require 2 system calls (1 realtime, 1 monotonic) then store these 2 values (at least 2 × long). Can the nanosecond precision of the monotonic clock really be trusted for the realtime?

asterite commented 7 years ago

@ysbaddaden I tried to add Int128 a while ago and it worked :-) I just didn't mention/commit or suggested it because I think it's not well supported in all platforms. Plus parsing of Int128 without support for Int128 is a bit tricky (but we can leave that for a later release). I can try to do it again (I lost that branch, but it was not much code) and send a PR and we can see how well it works in other platforms, if that helps with the implementation of Time.

asterite commented 7 years ago

And BTW I agree with all comments from @RX14: I'd like to keep it simple (just Time, no other types) and having monotonic clock inside Time as well.

RX14 commented 7 years ago

Can the nanosecond precision of the monotonic clock really be trusted for the realtime?

I'm not entirely sure what you mean. The monotonic clock wouldn't be used for calculations of wall time at all. I was simply suggesting that the precision of the wall clock ticks could be kept the same or reduced (not sure about that) since nanosecond precision is most likely used for calculating time spans instead of for a really accurate wall clock. NTP is only so accurate and few computers have atomic clocks installed so nanoseconds on the wall clock is very unlikely to actually mean anything.

Although I think maybe it's best just not to worry about the size of Time and have everything be nanosecond precision.

straight-shoota commented 7 years ago

I did not intend on suggesting a port of Java's DateTime API to Crystal. But the fact that such an fundamental part of the stdlib was introduced into a very mature language and is based on a 3rd party package that had been the de-facto standard prior to Java 8, makes it an interesting source of inspiration worth taking a look. They have undoubtedly put some thinking into it. Then again, of course, it is Java, they inherently need to overcomplicate things ;)

While I'm not particularly qualified for detailed reasoning about the internals, I have some ideas about the user-exposed API and its current shortcomings:

I would like to see a way to express different units of time, say to represent a date without time. This might require a different class (Date), or maybe there could be a granularity modifier on Time class? There are many cases where the time information is just a date (e.g. a birthday) and it makes no sense to have a precise time associated with it as this suggests meaning which is not there.

And some improvement could be made to time intervals. Currently, Time::Span is an accurate time-based duration with the highest unit of days. This is based on the assumption that 1.day == 24.hours which is not always the case and this will matter when time zones and DST are supported. Then there is Time::MonthSpan which is conceptual month-based: it adds exactly one month whether it has 28, 29, 30 or 31 days. Inside stdlib it is only used for Int#months and Int#years. The approach in Java's DateTime API seems to make more sense: Duration is similar to Time::Span but the biggest unit is hours. Period is similar to Time::MonthSpan but the smallest unit is days. I don't care about naming, but days should be considered as a concept instead of an accurate amount of 24 hours. Additionally, there should be a way to calculate a conceptual interval between two points in time, to precicesly calculate an age in days, months and years. With Time::Span this is not possible, because a day is not always 24 hours, a month can have different amounts of days and years as well.

ysbaddaden commented 7 years ago

I'm revising my argument (I was deep inside ticks and nanoseconds).

Before going further, I believe we should stick to dates representable as ISO8601, where 1 minute = 60 seconds, 1 hour = 60 minutes, and 1 day = 24 hours, with offset in minutes. We shouldn't have to deal with other formats, leap seconds (that's the system's job), or whatever (only how many days per month). Furthermore, this is how the underlying systems work.

An advantage of individual attributes, is that we can represent any kind of date (thought still limited by year integer size), and accessing individual fields doesn't need special computation. An ISO8601 date fits in a 128 bits struct, allowing -32768 to +32768 years —i.e. larger than the 1583-9999 range of ISO8601.

struct Time
  @year : Int16
  @month : Int8
  @day : Int8
  @hour : Int8
  @minute : Int8
  @second : Int8
  @nanosecond : Int32
  @offset : Int16
  @kind : Kind # Int8
end

A disadvantage is that adding or substracting times or a duration (Time::Span) or applying an offset, will require to compute each individual value, each time checking for overflow. It may not prove that complex, since Time::Span would follow the same representation, and accessing individual values would be simple.

I'm not saying we should absolutely follow this model, but that it stands valid. After all, we represent dates more than we add or substract dates, especially if we introduce monotonic clocks to measure elapsed time, which should also be used for timeouts.

I don't like merging a monotonic and wall time inside a single struct: I usually need to represent dates or measure elapsed time, not both. It may be nice to have both, like getting start and stop times, computing the duration (monotonic clock) and printing both start/stop times (wall clock), but this doesn't apply to all the other cases where we only need to represent a date (e.g. created_at, updated_at, deleted_at of database records) which would end up wasting space (and another syscall).

I'm more in favor to introduce solutions for measuring elapsed time (monotonic) or checking whether something did timeout, which is the only meaningful representation of monotonic clocks. For example:

duration = Time.measure { do_heavy_work } # => Time::Span
RX14 commented 7 years ago

@ysbaddaden I think users shouldn't have to think about whether they want a wall clock or a monotonic clock (they'll get it wrong) and crystal should do the right thing by default. Usually the Time.now calls themselves (only 2 syscalls) aren't the bottleneck. With the vDSO there shouldn't be any syscalls anyway.

straight-shoota commented 7 years ago

Programmers should not have to think about the specific implications, this is true. But they need to have the right tools for what they are doing. What is the so called right thing?

For measuring purposes, a monotonic clock is required. For everything related to presenting dates and times for/from users, we need a wall clock.

Go currently only supports real time because the language was initially intended for Googles production servers on which the wall clock never resets. On other systems this has led to major issues caused by leap seconds (including Cloudflare's DNS failure last year). For the next release, there has been a proposal to introduce monotonic time. It is quite long but an interesting read, especially for very detailed explanation of the differences between wall and monotonic time.

The approach is quite interesting: Because of Go 1 compatibility they were unable to modify the API of Go's stdlib. Instead they changed the representation of the Time struct to contain both monotonic and real time. This allows the user not to have to think about anything and just use the same API for both. The internal implementation choose which one they need. But it also makes things complicated on the inside. Interestingly, the memory footprint does not change (on 64bit) by only providing monotonic time for epoch ± 2^33 (1885 to 2157), outside this range there should be no need for it and these bits are used to represent bigger wall time values.

While this approach is feasible, it was primarily dictated by API compatibility. In Crystal this limitation does not exist. I think it oversimplifies things on the outside and make it complex on the inside that can lead to errors.

Measuring and representing dates and times are entirely different things. They should be separate concepts inside Crystals stdlib and provide the right APIs for their purpose so users will naturally take the right one at the right place.

To support @ysbaddaden argument, I don't think there is such a huge disadvantage in having individual attributes: Calculations concerning dates usually involve conceptual durations like 1 month or 2 days which needs to be represented as individual attributes anyway.

Ok, reading it again, he already mentioned that.

asterite commented 7 years ago

@straight-shoota I'm not sure Go did that because of backwards compatibility. They add more APIs, types and functions to newer Go releases, so they could have added some type for monotonic time exclusively.

However, I believe they put monotonic time inside their current Time type because it's more intuitive and users will be happier and less confused. Of course it's more work and a bit tedious for the programmer implementing that, but that's far fewer people that the ones how are going to use it. And if you dig into Ruby's source code and APIs you'll learn that Ruby is the same way: optimize for user happiness, not implementor happiness (Crystal tries to be the same).

Say I want to now how much time takes an operation. I have Time.now so I do:

time = Time.now
# Some operation...
puts Time.now - time

It would be very convenient if that would work out of the box and give a correct result. I use that code all of the time, and I believe many other programmers too (it's the most intuitive way to measure elapsed time). I wouldn't like to know later, somehow, that I should have used some other type because "details I don't really care about, I just want it to work".

So, at least I won't accept a solution that introduced a separate type other than Time. I'd like Time to behave like in Go, having both measures (I think the monotonic measure is included in Time.now calls).

straight-shoota commented 7 years ago

They did it because they wanted all existing code using the current API to measure elapsed time to work correctly without changes.

It looks like an easy and user-friendly solution, this is true. But I fear it can also cause trouble and lead to unexpected results when Time instances with monotonic time are mixed with or converted to ones without. Monotonic time will only be available on instances that were created by Time.now.

When time differences are calculated on monotonic time if it is available on both, this can produce the following paradoxon:

time1 = Time.now # time1 has both wall and mono time
time2 = Time.new(time1.ticks) # time2 has only wall time
time3 = Time.now

time1 == time2 # => true - compared using wall

time3 - time1 == time3 - time2 # => false - if wall and mono in differ in time1
# the previous line is effectively doing this:
time3.mono - time1.mono == time3.wall - time2.wall

I wouldn't want to be surprised in that way by the implicit behavior of Time. This can create subtle bugs similar to Time.now without a monotonic clock. But there, the reason is a bad choice of measuring tool. Here, internals would be responsible and no simple solution could prevent that.

Any programmer who expects accurate results from elapsed time calculations should know that Time.now (as its equivalents in other languages) is ambiguous and can lead to errors. I believe they can be entrusted the reasoning to decide such things.

What your are doing in your example could also be accomplished by a dedicated API for time measuring such as Benchmark.measure or Benchmark.realtime or a derivation of the API as I have proposed in #4497. It should probably not be called "benchmark" because this sounds to heavy for a simple time measure. Maybe Stopwatch or simply Clock are better. And it should of course be included by default without having to require it explicitly.

clock = Clock.now # this is guaranteed to use elapsed time
# Some operation
puts clock - Clock.now 
# or
puts clock.stop

If there is an easy and elegant interface that gets heavily promoted as the sacred method to measure elapsed time in Crystal, people will pick it up and do things this way rather than using Time.now for this.

akzhan commented 7 years ago

I think that @straight-shoota is right, and in Go would only be a monotonic time, if not compatibility issues. Realtime is secondary.

ysbaddaden commented 7 years ago

@asterite I'd like to avoid having a separate Clock too, yet I'd like to have nicer APIs for measuring elapsed time, so it could become:

# good & slim:
elapsed = Time.measure { do_something }

# bad & noisy
start = Time.now
do_something
elapsed = Time.now - start

Just like we have & recommend Benchmark.ips for example (warmup time, monotonic clock):

# excellent
Benchmark.ips { test }

# horribly bad
start = Time.now
100.times { test }
elapsed = start - Time.now

I'd like stdlib to provide a Timeout struct that would accept a Time::Span and check whether the duration has passed since the timeout started, or maybe a plain a Time::Measure struct, that would tell how much time has elapsed since it was initialized, helping with in-game clocks for games, for instance —casting a spell shouldn't be affected by wall clock, not even talking of long running actions across game saves. Something like:

clock = Time::Measure.new
clock.elapsed            # 00:00:01
clock.elapsed?(1.minute) # false
ysbaddaden commented 7 years ago

I have an issue when storing an offset: adding or substracting from a Time may cross a DST change (or sometimes a TZ change) thus change the offset, which we can't reflect. For example in Europe/Paris, the following time will have a +60 minutes offset (winter), even though April is summer time (+120 minutes):

Time.new(2017, 1, 1) + 90.days # => 2017-04-01 00:00:00 +01:00

Ruby does just that (keep the offset), but Rails behaves correctly (it recomputes the offset). It would seem they always transform a time to local time, or at least to the configured location (whatever Time.zone is set to). For example (again in Europe/Paris):

# ruby keeps the offset as-is:
Time.new(2017, 1, 1) + 90 * 86400       # => 2017-04-01 00:00:00 +01:00

require "active_support/all"

# rails recomputes the offset:
Time.new(2017, 1, 1) + 90.days          # => 2017-04-01 00:00:00 +02:00

# arbitrary offset -> local time
Time.parse("2017-01-01T00:00:00+02:00") # => 2017-12-31 23:00:00 +01:00

Does it look like an acceptable solution?

straight-shoota commented 7 years ago

I must say, I have become favourable of the idea to get rid of time zones as internal value altogether. As quoted from https://unix4lyfe.org/time/:

Timezones are a presentation-layer problem! Most of your code shouldn't be dealing with timezones or local time, it should be passing Unix time around.

Absolute time values could be exclusively stored in a standard time zone (=UTC) and offsets are only applied when they are converted to a human-readable representation. Time zones are only necessary to make time values better understandable for humans. Otherwise there is no distinction between the same instant expressed in different offsets. It just makes calculations harder, because you have to consider changes in time zone offsets such as switching from/to DST.

Time values in UTC are distinct, whereas certain values in other time zones can be ambiguous. For example, when using DST the clock is usually set back one hour once a year. So all instants of this hour in the specific time zone can relate to two different instances in UTC. The usual strategy is to arbitrarily convert to the later one, but this will always have some unexpected effects like (ambiguous_time - 1.hour).to_utc != ambiguous_time.to_utc - 1.hour.

Conversions from UTC are injective: If all zonal time values are immediately converted to UTC when entering Time class, representations of any time value for any time zone are assured to be unique.

For database systems it is considered good practice to store times exclusively in UTC. In most use cases there is no benefit from keeping the time zone which was used to determine the time value in the first place around for later use. An absolute time value will usually be presented in the local time zone of the user at the point of access, not at the point of creation.

For recording and displaying absolute points in time this approach should be sufficient in any means. It is quite easy to implement and should be the preferred way for this use case.

It does however create some trouble for calculations with conceptual time spans, because you cannot easily add an amount of 1.days without knowing the time zone in which to operate. For these operations, a time zone is required, otherwise a recurring daily event could suddenly jump from 8:00 CET to 9:00 CEST once DST is in effect.

Maybe it would be a good solution to store the time value (in UTC for the above reasons) alongside a time zone identifier which can be accessed for calculations when necessary.

ysbaddaden commented 7 years ago

Counter example:

Time.now.at_beggining_of year

In Europe/Paris this should return January 1st at 00:00 with +01:00 offset expressed as local time, not January first at 01:00 with +01:00 offset because we computed from UTC then converted to local time.

straight-shoota commented 7 years ago

This it what I was talking about with respect to conceptual time spans. Your example essentially subtracts the time spans time_of_day which refers to the beginning of the conceptual unit day and day_of_year which is conceptual in itself.

ysbaddaden commented 7 years ago

Yes, recording and later representation of time will likely be different so we should record time expressed as UTC (e.g. UNIX timestamp).

Yet time instances seem different, they operate on time (addition, substraction, arbitrary change) and it's always conceptual, as you perfectly explained, and relative to the user's representation of time (i.e. its location). Hence why I'm more keen to always transform to local time.

straight-shoota commented 7 years ago

Okay, I think we can agree, that an (absolute) time value should always have a reference to a local time zone. The question is, should internal representation be in a standard zone (UTC) and the local zone would only be applied when necessary. Or it could be stored as local time and the time zone is removed only when an operation requires a conversion. While most operations will probably be in respect to the attached time zone, there are two advantages for storing in a standardized zone: Times are directly comparable independent of zones based on their binary value. And, as stated before, conversions from UTC should always be distinct - but not the other way around. This way, handling of ambiguous times could mostly be isolated at the initialization of a time instance instead of somewhere in between some operations which require a conversion to a different zone.

And then there is the case of floating time, i.e. with no direct correlation to a specific instant. Which can be time-zone independent (like an alarm at 8:00 in the morning, in whatever time zone applies) or based on entirely different scale (like 2:31 in a cat video on Youtube). While this could theoretically be expressed with a "special" nil timezone, I feel more like treating this as something completely different through type safety.

RX14 commented 6 years ago

This really should be split into multiple issues, general issues with little actionable specifics should be avoided. A lot of this has been implemented anyway.