golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.95k stars 17.53k forks source link

proposal: time: optimize algorithms by restricting valid year range #63844

Closed normandesjr closed 3 months ago

normandesjr commented 10 months ago

TL;DR:

The proposal is to improve the performance of Go's time package using the recent algorithms developed by Neri and Schneider [1]. Their algorithms are faster than alternatives and they have been adopted by libstdc++ (the GNU implementation of the C++ Standard Library) [2], the Linux Kernel [3], .NET [4] and Firefox [5]. My benchmarks confirm performance improvements in Go as well. (See below.)

To achieve better performance we should change the supported range of dates: instead of supporting years taking any int value, they should be restricted to a much smaller range which still spans tens of thousands of years.

Check the benchmark results that I obtained after implementing Neri-Schneider algorithms in Go source code.

only-date

Implementation:

Cassio Neri and I have implemented in Go source code the 32-bits versions of Neri-Schneider algorithms. They support dates from -1,468,000/Mar/01 to 1,471,745/Feb/28. Unfortunately, the current documentation of

`func Date(year int, month Month, day, hour, min, sec, nsec int, loc *Location) Time`

doesn't explicitly state the range of supported dates but, since the argument year has type int, one might expect that any year from math.MinInt to math.MaxInt is allowed. This is further confirmed by the test TestZoneBounds which uses math.MinInt32 as a minimum year. Only this test fails when I replace the current implementations with Neri-Schneider algorithms.

The main goal of this issue is to argue that the range supported by Neri-Schneider algorithms is more than enough for all practical applications (*). We also believe that the specification of the time package should explicitly state the range of supported dates. Moreover, this range should be smaller than the one supported by Neri-Schneider algorithms just in case faster but more restrictive algorithms are found in the future. Consider, for instance, the range specified by other popular programming languages:

C++ [6]: from 1970/Jan/01 - 12,687,428 days to 1970/Jan/01 + 11,248,737 days, that is, from -32,767/Jan/01 to 32,767/Dec/31.

C# [7]: from 0001/January/01 to 9999/Dec/31.

Ecma Script (a.k.a. JavaScript) [8]: from 1970/Jan/01 - 100,000,000 days to 1970/Jan/01 + 100,000,000 days, that is, from -271,821/Apr/20 to 275,760/Sep/13

We believe that any of those ranges are more than enough for the large majority of users and, concretely, suggest to set Go to use the same range as C++. (As mentioned above, the 32-bits implementations support a much larger range.) If you agree that it is fine, then I could submit the PR showing how it was implemented.

Follow the time's package benchmark. As you can see other functions got the benefits as well.

package-time

(*) It's worth mentioning that the Gregorian calendar is still not perfectly aligned to Earth's motions and a small error is accumulating. In a few thousands of years the error should be quite noticeable. Furthermore, Earth's rotation is slowing down which amplifies the problem [9]. The point here is that it's virtually impossible for the Gregorian calendar to remain in use in the next few thousands of years, let alone in billions of years.

References:

[1] Neri, C, Schneider, L. Euclidean affine functions and their application to calendar algorithms. Softw Pract Exper. 2023; 53(4): 937–970. doi:10.1002/spe.3172 https://onlinelibrary.wiley.com/doi/full/10.1002/spe.3172

See also Cassio Neri's presentation on calendar algorithms at C++Now 2023. https://www.youtube.com/watch?v=0s9F4QWAl-E

[2] https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/std/chrono;h=10e868e5a036924177cf952a81f4bce10c497106;hb=HEAD#l1673

[3] https://github.com/torvalds/linux/blob/master/kernel/time/timeconv.c#L78 and https://github.com/torvalds/linux/blob/master/drivers/rtc/lib.c#L68

[4] https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/DateTime.cs#L1377

[5] https://searchfox.org/mozilla-central/source/js/src/jsdate.cpp#217

[6] https://eel.is/c++draft/time.cal.ymd#members-20

[7] https://learn.microsoft.com/en-us/dotnet/api/system.datetime.minvalue?view=net-7.0#remarks and https://learn.microsoft.com/en-us/dotnet/api/system.datetime.maxvalue?view=net-7.0#remarks

[8] https://262.ecma-international.org/14.0/#sec-time-values-and-time-range

[9] Duncan S. Marking Time: The Epic Quest to Invent the Perfect Calendar. 1st ed. John Wiley & Sons; 2000.

mauri870 commented 10 months ago

kindly CC @rsc (per https://dev.golang.org/owners)

AlexanderYastrebov commented 10 months ago

We also believe that the specification of the time package should explicitly state the range of supported dates.

It should be possible to split codepath into slow and fast versions depending on the year value without defining any new constraints on the year value, shouldn't it?

normandesjr commented 10 months ago

We also believe that the specification of the time package should explicitly state the range of supported dates.

It should be possible to split codepath into slow and fast versions depending on the year value without defining any new constraints on the yer value, shouldn't it?

Thanks @AlexanderYastrebov . It's possible, but I'm not sure if it's suitable. I've changed some functions, like absDate, daysSinceEpoch and Date, and we would add some complexity (and another branch) for a range of date that is unusal and that other languages already don't support. IMHO we should even add some asserts and panic if the year is outside the "new" specification. Make sense?

unixdj commented 9 months ago

@normandesjr

IMHO we should even add some asserts and panic if the year is outside the "new" specification. Make sense?

Panicking is a bit extreme, innit?

If I understand Cassio Neri's example C++ code correctly, he restricts values only for the duration of the calculation. Something like this (but also handling negative years):

    const (
        CycleLength     = SomeNumberIDK
        DaysPer400Years = 146097
        YearCycle       = 400 * CycleLength
        DayCycle        = DaysPer400Years * CycleLength
    )
    cycles, year := year / YearCycle, year % YearCycle
    days := CalculateDate(year, month, day)
    days += cycles * DayCycle

I would propose limiting timestamps to those that can be represented by int64 (i.e., handling overflow), or close to it (capping the year value to prevent overflow), and using the mechanism above to deal with calculations whose intermediate values may overflow.

Furthermore, Earth's rotation is slowing down which amplifies the problem

UTC is not the only kind of UT, so this shouldn't prevent me from printing the time when atoms started forming with nanosecond precision in my local timezone, so what if my city or my galaxy doesn't exist yet ;)

cassioneri commented 9 months ago

I wish to share my experience after having studied the Gregorian calendar and implemented myself support for it in the Linux Kernel, libstdc++, Firefox (for its Javascript engine) and also assisted others to implement them for .NET and Java (ongoing) official standard libraries and also a Rust user library.

The range of supported dates must be very clearly specified by the language as it is a contract which users can rely on. This is the first piece of information that I searched when I implemented these algorithms for any language's standard library.

What should be a reasonable range of dates selected by the language? There's no absolute answer for this question and @normandesjr's proposal cites a few choices taken by different languages. Pragmatism is required to select one since complications come from historical, geographical and astronomical aspects. To know more about the historical and geographical points, please, see my talk at C++ Now 2023 (from 27min7s) but let me elaborate on the astronomical point raised by @normandesjr and commented by @unixdj. This question is not about UTC and time zones.

Astronomers calculate that in the time of the dinosaurs, during a year Earth would spin about 400 times and not the 365 or so that it does nowadays. In other words, years were 400 days or so long. If there were people and computers around at that time they would most certainly not be using a calendar remotely similar to the Gregorian and thus for us to talk about 1st January year -65,000,000 is a bit of nonsense. Similarly, in a few million years the number of days in a year is expected to be far less than 365. I doubt the Gregorian calendar will still be used at this point. We don't need to go that far and even if Earth's movements were steady, the fact that the ratio between the duration of a day and the duration of a year is an irrational number means that the Gregorian calendar will be out of sync with Solstices and Equinoxes in a couple of thousands of years which again raises the question whether the Gregorian calendar will be used, unchanged, at this point.

In light of the above, should Golang commit itself to support a range of Gregorian dates spanning billions of years? IMHO, it should restrict the range to a more pragmatic one that suits well and efficiently the large majority of users now and in many generations to come but no more. A more restricted range allows more algorithms to be used and the best performing one (known or to be discovered) can be implemented. The point is that current users should not pay the price of using less performant algorithms that are destined to become obsolete far in the future just to support the current calendar far beyond its existence.

@normandesjr's proposal suggests picking the same range as C++ (from year -32767 to year 32767), which I believe is a very sensible choice, but anything not much larger than a few thousands of years is more than enough. Notice that changing the internal representation of time and its resolution is not proposed. Only conversion of internal representation from and to calendar dates (year, month, day) are touched.

Another advantage of a more restrictive range is that it allows a unit test to run through all possible dates and check the result. I've done that for libstdc++ and the Linux Kernel and these tests run under a second.

Finally, panicking if the input is out of range is useful. For Firefox, I added just a comment stating support for the range required by the JavaScript specification (Ecma-262). The thoughtful reviewer, Andre Bargul (towards whom I'm very grateful) asked me to assert that condition, which I did. Then a CI test crashed Firefox on this assertion revealing that there was a bug elsewhere. He then fixed the bug for the benefit of all users.

I hope this helps. Obviously I'd be glad to see my own algorithms being used in Golang but the question of precision of the specification is, in my opinion, more important than the choice of algorithm. @normandesjr's proposal tackles this point. It's a very good proposal.

unixdj commented 9 months ago

@cassioneri

The range of supported dates must be very clearly specified by the language

Definitely.

I'm inclined to make this range as broad as possible, because if you have a time.Time value, ideally you should be able to display it. Perhaps rather than restricting the range of supported days, a better idea would be to restrict the range of supported (or even "legal") Time values.

Also, see below regarding API compatibility.

This question is not about UTC and time zones.

Sure, I was not entirely serious about local time. And I was mistaken about UT, I meant TAI (International Atomic Time).

Loved your talk, BTW.

If there were people and computers around at that time they would most certainly not be using a calendar remotely similar to the Gregorian and thus for us to talk about 1st January year -65,000,000 is a bit of nonsense.

I would argue that perpetual Gregorian calendar is a fiction anyway, and astrophysicists talking about "13 billion years ago" is also a bit of nonsense, as you can't have a year without a planet.

Unless you define a year as 31,556,952 SI seconds, then it's fine I guess.

picking the same range as C++ (from year -32767 to year 32767), which I believe is a very sensible choice

The year range that tzcode (zoneinfo's code used in glibc and libc in many other Unix systems) supports is -2,147,481,748 to 2,147,485,547, which is the full range of int32_t whose 0 value means year 1900.

Finally, panicking if the input is out of range is useful.

Here I disagree. The Go standard library generally panics only on application programmer errors or if the internal state goes haywire. E.g., the time package panics on non-initialised Timer or Ticker values, non-positive intervals and nil Location values (application bugs), and on conditions indicating a bug in the standard library.

Time values may come from origins such as user input and database entries, and there may well be code out there that operates on values far away from now (e.g., for carbon dating or astrophysics), formatting the time as fmt.Sprintf(-t.Year() / 1000000, "million years ago"). Because if Go supports timestamps of almost 300 billion years from now, why roll your own?

Besides, introducing panics into date calculation code would break the Go 1 API compatibility promise. Which is another reason to support as wide range as possible.

cassioneri commented 9 months ago

@unixdj

[...] there may well be code out there that operates on values far away from now (e.g., for carbon dating or astrophysics), formatting the time as fmt.Sprintf(-t.Year() / 1000000, "million years ago").

Indeed, astrophysicists might be interested in billions of years. But this is a niche domain with clever people that are very capable of rolling their own algorithms.

Because if Go supports timestamps of almost 300 billion years from now, why roll your own?

For two reasons.

  1. Supporting billions of years rules out algorithms that require fewer instructions (save memory, especially in instruction cache), are faster and spend less energy. Hence, if Golang support billions of years, every user will bear the costs for, presumably, the benefit of a few specialists in niche domains.

  2. The clever people from these niche domains might not be interested in Time.Year() after all! Indeed, Time.Year() calculates the "year" that most of us consider in everyday live, i.e., year in the Gregorian calendar. (As you said, the "perpetual Gregorian calendar is a fiction anyway", so who uses it for calculations of billions of years?) Astronomers talk about sideral year, solar year, tropical year, vernal year, etc. They have their own time keeping system called Julian Day Number (JD) invented by Joseph Scaliger in 1583 and based on the average Julian calendar year of 365.25 days. Here is a quote from the International Astronomical Union (IAU) [1]:

Although there are several different kinds of year, the IAU regards a year as a Julian year of 365.25 days (31.5576 million seconds).

Hence, even if Time.Year() supports billion of years, for astronomers, it produces the wrong result and they won't use it. Furthermore, from the number of SI seconds since the JD epoch, they can get the number of years by a simple division by 31,557,600. In summary, for astronomers Time.Year() is wrong and slower than what they can implement.

[1] https://www.iau.org/public/themes/measuring/

unixdj commented 9 months ago

@cassioneri

Although there are several different kinds of year, the IAU regards a year as a Julian year of 365.25 days (31.5576 million seconds).

Interesting, I didn't know this.

But this is a niche domain with clever people that are very capable of rolling their own algorithms.

True. But I suspect that in many cases time.Time is probably involved at some point, which is what I meant by "why not" (it's there, might as well use it). I suspect that people working, say, with dates in Ancient Rome might wrap Time and override some methods for converting dates to use old Roman calendars before a certain date and call standard methods after, and astronomers might display the time as Julian years but convert it using Time.Unix for storage. Go enables this approach quite well, and I've done it with Time (for other reasons).

In any case, the Go core team will likely insist that the Go 1 compatibility promise is not broken.

Look, I understand your point. You never need to display the time of the formation of a galaxy as -11000000000-08-06 11:23:05.123456789 +0200 CEST. And I see that your algorithms are so fast that doing div-mod of the time by 10,000 or 100,000 years (perhaps conditionally after a comparison) will make calculations significantly slower.

Not the whole range works now, mind you. The time pkg doesn't convert dates before year -292277022399 of the perpetual Gregorian calendar, some 250 years after 1970-01-01 - 1<<63 seconds, correctly, and I expect weirdness on the other end too.

But I'm just not sure how much you can further restrict the range without breaking the compatibility promise that the Go team takes quite seriously.

And I hope you can see why panicking is not a good solution for Go, or at least for Go 1. I'm pretty sure the Go team would prefer to cap the value, return sentinel values or even document the limitation and return garbage.

Speaking of which, wouldn't a check like

if uint64(t) >= yearOneMillion {
    cycle, t = t / millionYears, t % millionYears
}
y, m, d := calculate(t)
y += cycle * 1000000

be almost as fast as this, for times within the range?

if t < lowLimit || t > highLimit {
    panic("what are you, an immortal?")
}
y, m, d := calculate(t)
AlexanderYastrebov commented 9 months ago

As I wrote in https://github.com/golang/go/issues/63844#issuecomment-1810175085 it should be possible to have two implementations conditioned on the year value. The existing implementation would need to stay anyways to verify correctness of the new code. Once proof of the pudding exists it might be easier to establish new API constraints.

AlexanderYastrebov commented 9 months ago

There is also a related https://github.com/golang/go/issues/56909 that reports year overflow above/below certain values

normandesjr commented 9 months ago

I've given an idea about panic, it was like a suggestion, the main idea here is to have the great benefit of a date algorithm faster (more than 25%) for a range that everybody needs.

gopherbot commented 9 months ago

Change https://go.dev/cl/548155 mentions this issue: time: improve performance of calendar algorithms and clarify its spec…

cassioneri commented 9 months ago

I believe there's a misunderstanding about the scope of the proposal. Let me clarify.

It does not propose limiting all functionalities of time to smaller ranges. It covers only functions based on the Gregorian calendar, namely, Date, Year, Month, Day, ISOWeek, YearDay and AddDate.

Functions that only need the number of ticks from epoch remain the same. Going back to the astronomers example, they can get IAU year by calling t.Unix() and dividing the result by 31,557,600. The proposal would not change anything on this perspective and billions of IAU years would still be supported in this way.

@AlexanderYastrebov: There's no need to add an if. The 64-bits version of our algorithms can support billions of Gregorian years. This is more effective than adding the if but still more costly than using the 32-bits version (which supports "only" millions of years.) I'd prefer avoiding extra costs altogether since supporting billions of Gregorian years, as I'm trying to say, is meaningless.

As we can see from comments on other issues, as far as time is concerned, Go is very vague on its promises. People have revoursed to reverse engineering or to look very deeply in the sources to find out supported ranges. IMHO, the current situation is quite messy and fragile. I think this proposal is a step in the right direction to fix many issues. It sets out a reasonable range of dates that is enough for the vast majority, if not all, real users and brings clarity, speed and energy savings for them.

unixdj commented 9 months ago

The 64-bits version of our algorithms can support billions of Gregorian years.

I took @normandesjr's code from https://github.com/normandesjr/go/blob/feat/date-algorithm/src/time/time.go and converted absDate and Date to use uint64. It works beautifully. Check this out (I used it outside the Go source tree, to avoid stashing my changes):

https://gist.github.com/unixdj/e31f75d5b49b5a2b3b4fbbae69368937

This is more effective than adding the if but still more costly than using the 32-bits version (which supports "only" millions of years.)

How much overhead does it add?

FWIW, a quick benchmark shows that my version of absDate runs faster. I looked at the assembly and eliminated one addition, but I don't speak amd64. First column: original, second: after changing the beginning of absDate as below, third: uint64, times for 100M runs.

        const (
                s = 3670
                K = 719468 + 146097*s
                L = 400 * s
        )
        N := uint32(abs/secondsPerDay - uint64(unixToInternal+internalToAbsolute)/secondsPerDay + K)
743.328392ms    708.183141ms    644.01476ms
718.081683ms    691.610367ms    653.935377ms
738.729335ms    701.506694ms    656.245927ms
736.034086ms    702.445939ms    641.974361ms
725.429312ms    688.144645ms    639.662721ms
722.263895ms    773.823083ms    657.371755ms
719.608441ms    696.451653ms    649.812959ms
715.551736ms    692.676515ms    647.024967ms

As we can see from comments on other issues, as far as time is concerned, Go is very vague on its promises.

True, unfortunately.

unixdj commented 9 months ago

@normandesjr Can you also change isLeap to @cassioneri's algorithm?

normandesjr commented 9 months ago

@normandesjr Can you also change isLeap to @cassioneri's algorithm?

Done.

normandesjr commented 9 months ago

I've just did an update with @cassioneri's patch, follow the new benchmark:

benchmar-2023-12-10

unixdj commented 9 months ago

Now takes more time now? That's weird.

bcmills commented 6 months ago

It does not propose limiting all functionalities of time to smaller ranges. It covers only functions based on the Gregorian calendar, namely, Date, Year, Month, Day, ISOWeek, YearDay and AddDate.

So what would the behavior of Date be for a Time value produced by calling Add with a positive duration on the maximum supported time.Time?

I think we should not only limit the range, but also limit the intermediate values so that they don't exceed that range: it would be surprising to reject a value in time.Date only to have exactly that same value returned from a call to the Date method on a time.Time variable.

rsc commented 4 months ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

gopherbot commented 4 months ago

Change https://go.dev/cl/586257 mentions this issue: time: optimize time <-> date conversions

gopherbot commented 4 months ago

Change https://go.dev/cl/586256 mentions this issue: time: add a few more benchmarks

rsc commented 4 months ago

@cassioneri, I really enjoyed reading your paper. Thanks very much for taking the time to do the research and write it up. Lots of great ideas in there.

@normandesjr, thanks for taking the time to write a PR and file this issue and start this discussion.

The year is limited to 16 bits in C++ because it is limited that way in C, and the limit in C comes from originating on 16-bit machines in circa 1969. I don't see a strong reason to adopt that limit in 2024, especially retroactively. I understand and agree with the point that times more than 32,767 years before or after now are meaningless, but since they worked in earlier versions of Go, you can bet that some code somewhere is using times with larger years as sentinels. For example something like:

var forever = time.Date(100000, January 1, 0, 0, 0, time.UTC)

Sure enough, here's a real example and another and another. Even though no one is planning for the year 100,000 or thinks the code will be running then or even thinks the Gregorian calendar will still be in use, times like that are still useful in a program. Programs create their own worlds.

Since those times can be constructed, it is important to be able to print them correctly and without panicking. It does not make sense to say they are valid but can't be printed, which is what it would mean for methods like Year or Date to panic or return incorrect results. Nor does it make sense to break a program that has defined an arbitrary future time like forever above.

The fact is that 32,767 is a fairly small number. People are very likely to reach for larger ones, as shown by the real examples. Those might as well work, and they do today. We won't take that away. Although it's not documented explicitly, package time aims to work for times with years up to the signed 64-bit second limit, give or take, which is around ±292,277,022,000. If your limit is 32,767, then yes it is important to specify exactly what it is, because 32,768 or 99,999 is a plausible number you might try, and also because if you really are near the year 32,767 now you have to worry about whether it the limit applies to the year in UTC or the year in the local time zone or both. In contrast, when the limit is 292 billion, it is less important to specify these boundaries with significant precision. For practical purposes, there is no limit: one less thing to worry about.

It may be worth noting that Go is different from languages like C and C++ in that it has only one time type that represents both a time instant and a civil time in a specific location. We did that quite intentionally, after seeing how much code gets confused by decisions like whether to use time_t or struct tm in C. While it may be defensible to impose the 32,767 constraint on struct tm, not being able to print a time.Time in Go is like not being able to print a time_t in C.

Another reason to permit a very wide range of years is for interoperability with existing systems that permit larger ranges. It is useful to be able to round-trip the times from these systems into time.Time and back losslessly. For example:

Another reason is bugs and debugging. I do from time to time make mistakes :wink:, and when I do, I appreciate being able to debug them. If I have a bug in my program that accidentally creates a time with the year 100,000, I need to be able to see that value in order to understand what I've done. Again, times that can be constructed must be printable.

The cost of a few 64-bit operations at the start of the algorithm is one we will gladly pay to keep compatibility with all the existing Go programs in the world and to keep programs debuggable. An argument was made about energy savings and the like, but that's not based on any realistic math. We are talking about a few nanoseconds per call, and there are basically no programs anywhere that spend any appreciable amount of their time in these routines. I checked our internal production profiling at Google, and time.Date and time.Time.absDate are currently taking less than 0.01% of Go cycles. One frustrating hours-long debugging session caused by not being able to print an erroneous time.Time will cancel out years of accumulated savings.

Aside from the year restriction, the code in the pull request is not written in a way that we can easily adapt as needs arise over time, and it makes different calculations in the package use different epochs. I took some time and implemented the ideas in the paper in more idiomatic Go and in a way that fits more harmoniously into the surrounding time package. I've posted that change at https://go.dev/cl/586257. I got very good performance results, copied below, even better than what @normandesjr posted.

It's clear from my change that we can reap essentially all the benefits of the improved, faster algorithms without making any kind of backwards-incompatible API changes about which years are valid. And if we don't narrow the range of valid years, then no proposal is necessary. Instead, we can close this issue as declined/retracted, and I will work on getting my change reviewed and submitted. I'm inclined to do just that, leaving the question of which years are valid to another day and a separate discussion.

I expect the change will be part of Go 1.23 or Go 1.24. Please feel free to leave comments on https://go.dev/cl/586257 if I have missed anything.

Again, it's also important to remember that despite the very large percentages in the benchmark report, the absolute savings is quite small, especially in typical programs: a few nanoseconds here and there. Realistically, it's going to take quite a while for the accumulated savings to exceed the 10s of trillions of nanoseconds I spent reading the paper and implementing this. On the other hand, I had fun, I learned some new things, and the new algorithms are more elegant than the ones I'd used before, so it's a welcome improvement.

Thank you again, @normandesjr and @cassioneri!


goos: linux
goarch: amd64
pkg: time
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ timeold.txt  │             timenew.txt              │
                        │    sec/op    │   sec/op     vs base                 │
Format-32                  156.5n ± 1%   148.1n ± 1%   -5.37% (n=125)
FormatRFC3339-32           118.5n ± 1%   112.1n ± 1%   -5.40% (n=125)
FormatRFC3339Nano-32       119.2n ± 1%   113.0n ± 1%   -5.20% (n=125)
FormatNow-32               96.88n ± 2%   97.22n ± 1%        ~ (p=0.173 n=125)
MarshalJSON-32             79.77n ± 1%   75.82n ± 1%   -4.95% (n=125)
MarshalText-32             79.25n ± 1%   76.18n ± 1%   -3.87% (p=0.000 n=125)
Parse-32                   79.80n ± 1%   78.28n ± 1%   -1.90% (p=0.000 n=125)
ParseRFC3339UTC-32         29.10n ± 1%   28.90n ± 0%        ~ (p=0.094 n=125)
ParseRFC3339UTCBytes-32    30.72n ± 1%   30.88n ± 1%        ~ (p=0.894 n=125)
ParseRFC3339TZ-32          92.29n ± 0%   90.27n ± 1%   -2.19% (p=0.000 n=125)
ParseRFC3339TZBytes-32     133.4n ± 1%   132.0n ± 1%        ~ (p=0.004 n=125)
ParseDuration-32           41.11n ± 3%   44.08n ± 2%        ~ (p=0.088 n=125)
Hour-32                    2.834n ± 0%   2.829n ± 1%        ~ (p=0.891 n=125)
Second-32                  2.811n ± 1%   2.828n ± 1%        ~ (p=0.208 n=125)
Date-32                    9.228n ± 1%   5.788n ± 0%  -37.28% (n=125)
Year-32                    6.404n ± 1%   4.673n ± 1%  -27.03% (n=125)
YearDay-32                 6.399n ± 1%   5.802n ± 0%   -9.33% (n=125)
Month-32                   9.108n ± 1%   4.700n ± 1%  -48.40% (n=125)
Day-32                     9.106n ± 1%   4.686n ± 1%  -48.54% (n=125)
ISOWeek-32                10.060n ± 0%   7.998n ± 1%  -20.50% (n=125)
GoString-32                84.59n ± 1%   83.82n ± 1%        ~ (p=0.027 n=125)
DateFunc-32                6.993n ± 0%   6.144n ± 1%  -12.14% (n=125)
UnmarshalText-32           94.78n ± 2%   89.49n ± 1%   -5.58% (n=125)

goos: darwin
goarch: arm64
pkg: time
cpu: Apple M3 Pro
                        │ timeold-m3.txt │            timenew-m3.txt            │
                        │     sec/op     │   sec/op     vs base                 │
Format-12                    152.6n ± 0%   147.4n ± 0%   -3.41% (n=125)
FormatRFC3339-12            101.50n ± 0%   92.02n ± 0%   -9.34% (n=125)
FormatRFC3339Nano-12        101.30n ± 0%   92.68n ± 0%   -8.51% (n=125)
FormatNow-12                 93.50n ± 0%   94.65n ± 0%   +1.23% (p=0.000 n=125)
MarshalJSON-12               50.06n ± 0%   48.25n ± 0%   -3.62% (n=125)
MarshalText-12               49.70n ± 0%   47.51n ± 0%   -4.41% (n=125)
Parse-12                     97.91n ± 0%   95.90n ± 0%   -2.05% (n=125)
ParseRFC3339UTC-12           36.45n ± 0%   35.78n ± 1%   -1.84% (n=125)
ParseRFC3339UTCBytes-12      38.11n ± 0%   37.42n ± 0%   -1.81% (n=125)
ParseRFC3339TZ-12           100.80n ± 1%   97.58n ± 0%   -3.19% (n=125)
ParseRFC3339TZBytes-12       111.8n ± 1%   107.4n ± 0%   -3.94% (n=125)
ParseDuration-12             52.70n ± 0%   52.84n ± 0%        ~ (p=0.028 n=125)
Hour-12                      2.657n ± 0%   2.655n ± 0%        ~ (p=0.018 n=125)
Second-12                    2.656n ± 0%   2.654n ± 0%        ~ (p=0.084 n=125)
Date-12                      8.201n ± 0%   5.055n ± 0%  -38.36% (n=125)
Year-12                      5.694n ± 0%   4.086n ± 0%  -28.24% (n=125)
YearDay-12                   5.693n ± 0%   4.828n ± 0%  -15.19% (n=125)
Month-12                     8.206n ± 0%   4.231n ± 0%  -48.44% (n=125)
Day-12                       8.199n ± 0%   4.551n ± 0%  -44.49% (n=125)
ISOWeek-12                   9.032n ± 0%   7.298n ± 0%  -19.20% (n=125)
GoString-12                  62.78n ± 0%   60.61n ± 0%   -3.46% (n=125)
DateFunc-12                  7.318n ± 0%   6.431n ± 0%  -12.12% (n=125)
UnmarshalText-12             99.66n ± 0%   95.64n ± 0%   -4.03% (n=125)
cassioneri commented 4 months ago

@rsc Thanks for your praising words to my work.

Instead, we can close this issue as declined/retracted, and I will work on getting my change reviewed and submitted.

Firstly, I wish to address a social aspect. Although you did thank @normandesjr for his contribution, IMHO, this is not enough to give the credit he deserves. If the above is done, then @normandesjr won't have any credit for his contribution in the repo's history and I find this very unfair.

It's OK to think that @normandesjr's patch is not in idiomatic Go and you can do better. When I implemented these algorithms in the Linux Kernel, in libstdc++ and in Firefox, none of my initial patches where good enough. The maintainers took their time, guiding me to improve my patches until they were all accepted and I got authorship credit. Although it required more work, I felt happy about that and keen on keep contributing. If they have dropped my patches and made theirs instead, I would never consider contributing to these projects again. Please don't get me wrong: I don't mean to imply any wrongdoing. I believe this is an oversight that needs to be properly managed.

Perhaps, you could add a tag Co-Authored-By: normandesjr if that suits him.

My technical remarks in decreasing order of importance follow:

  1. "Undocumented range".

For practical purposes, there is no limit: one less thing to worry about.

I disagree. There has been too much emphasis on ±32,767 but this is a minor point, an arbitrary choice which I personally think is good enough but if you guys disagree, then pick a larger one. But please do pick one! Don't pretend there's no limit because there's a hidden one. As @normandesjr and I have repeatedly said and I can't stress that enough, whatever the range is, it must be explicitly documented.

Although it's not documented explicitly, package time aims to work for times with years up to the signed 64-bit second limit, give or take, which is around ±292,277,022,000.

Here you acknowledge that a limit does exist. Is there any unit test that checks these extreme values in different platforms? I doubt. Actually, if I'm not mistaken, with the current design, the limit is below the one you state. Indeed, consider the specification of functions Date and Year:

func (t Time) Date() (year int, month Month, day int)
func (t Time) Year() int

Clearly, year is of type int. The documentation states that int has an implementation-specific size which can be either 32 or 64 bits. Hence, if int has 32 bits, then year must be in [-2,147,483,648, 2,147,483,647]. In particular, they can't have all values in ±292,277,022,000.

Moreover, in the implementation there are lines of code similar to:

year := int(some_64_bits_value)

Hence, if some_64_bits_value has a value larger than the range of int, then the resulting year is wrong.

@normandesjr's PR doesn't have this bug mostly because it states the limits. If the user goes outside these limits then, its their fault (not Go's). If that happens, the user will immediately be made aware through the panic. I think this behavious is better than caring on using an year value that is completely wrong. However, as @normandesjr has said panicking is just a suggestion and a minor point of the proposal.

I assume that portability is important for Go designers and users. In this case, the arguments above show that year must be within the 32-bits range. This range is too large for my taste but it's definitely one that works. The 64-bits versions of many algorithms (not only mine) support this range. (But, without panicking, they are still subject to aforementioned silent conversion errors.)

  1. Constructs like var forever = time.Date(100000, ...).

This point is very good. Preferring not breaking code using such constructs is fair and desirable and I find useful that you provided real use cases. (Rather than speculating who could use large year values.)

Nevertheless, these use cases clearly show that programmers wished to use something like Date.Max. However, the language doesn't document what the limits are, let alone provide a programmatic way of referring to them (i.e., Date.Max and Date.Min). Users are forced to work around this problem and choose some large year numbers (99999, 999999 or 40000).

  1. Interoperability with DB systems.

This is a very good point against the ±32,767 limit but not against having a well documented limit. At the contrary, you have just increased the list of computer systems that clearly specify the supported date ranges. So far we've got: C++, C#, JavaScript, PostgreSQL, CockroachDB, Cassandra. Why shouldn't Go do the same?

  1. C++'s choice.

I don't think the reason for ±32,767 is to keep compatibility with C. In any case, this value is not important. Only the existence of a reasonable limit is.

A summary of my thoughts

Please give @normandesjr the credit he deserves for his implementation.

Having a robust specification is far more important than the choice of algorithm. One can discuss how large the range might be but the existence of limits and the need for them to be documented should not be questioned. Hence, closing this issue as declined/retracted seems very wrong to me. Instead, I suggest renaming this proposal to something that in my opinion better states its main point: Restrict and document the supported range of years and Dates.

Although not my preference, one can use the 64-bits version of my algorithm (or any other) which does support billions of years. (In any case, years must fit inside the range of int32.) Once again whatever the algorithm, years and Dates have limits which must be clearly documented. (Have I say that before?)

Although I'd prefer panicking, this is yet a minor point. Drop it if you wish.

rsc commented 4 months ago

Thanks for the reply @cassioneri.

I do hear you about documenting the range of valid years in the Go code. There are other issues filed for that. This issue doesn't have to be about that. You are also right that the Date method and Date function work in terms of an int-typed year, making the potential years expressible at those moments smaller than the years that can be stored in a time.Time. For better or worse, the API is the API and we can't easily change it. (Compatibility again.)

Credit in open source is always a difficult issue, and I sympathize with wanting to have contributed code directly to Go. Any time I take anyone's code and revise it, I absolutely credit them in the commit message. That's required both ethically and legally for copyright reasons. That's not what happened here, though. I saw the linked CL 548155, but I didn't understand the code at all, and I knew from the discussion here that it contained range limitations that we couldn't accept. So I read the paper and wrote new code from scratch using only the paper, to make sure I understood it and could maintain it and also that it didn't contain any hidden assumptions about valid time ranges.

There are many approaches to open source generally, and also many approaches that can be appropriate for a specific suggested change. If a change needs minor work to get submitted, then it absolutely makes sense to work with the contributor of that change. However, if, as the author and maintainer of Go's time package, I agree with the idea but would rather implement it in a completely different way, it can make more sense and save time for everyone involved to write the code anew myself. If you read both changes, perhaps you will see what I mean about it being completely different. In some cases it's common not to know exactly what the desired form is until I've done the exercise of writing the code, because only then do I understand what's going on. There were a variety of things that needed adjusting and cleanup in package time to properly accommodate the new algorithm, and those things are better done by the maintainer than someone new to the code base.

Note that there are a variety of ways to approach open source and some projects don't even accept outside contributions at all. The Go project is not that extreme. While we do try to help contributors revise their changes whenever possible, sometimes that's simply not possible, as happened here. I apologize that it means @normandesjr will not have an Author line in the Go Git repo from this PR. What matter most is getting to the right change that fits well into the existing Go source code, is readable, and will be maintainable for the next decade.

I did update the CL 586257 commit message to credit and thank both of you. Again I very much appreciate the time and effort that got us to this significant improvement, even if the code itself was not directly used. The most important output of engineering is understanding. The code is only a vehicle to achieve that understanding. The understanding you both contributed was critical to landing this improvement. Thank you again.

rsc commented 3 months ago

The optimization CL is pending and should be in Go 1.24. We just froze the tree for Go 1.23 and it didn't make it into that. There are also potential compatibility problems that would be good to give more time to shake out.

That said, with the optimization taken care of, the only thing left in this proposal is narrowing the year range. For the reasons above, we almost certainly shouldn't do that, so I will mark this proposal as likely decline.

Even though it says likely decline, the new algorithms will land. Thanks again.

rsc commented 3 months ago

Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group

rsc commented 3 months ago

No change in consensus, so declined. — rsc for the proposal review group