JuliaGeo / CFTime.jl

Julia library for decoding time units conforming to the Climate and Forecasting (CF) netCDF conventions.
http://juliageo.org/CFTime.jl/
MIT License
6 stars 0 forks source link

DateTimeProlepticGregorian leap year in 300 #16

Closed visr closed 2 years ago

visr commented 2 years ago

Describe the bug

The year 300, which is not a leap year, has some odd behavior in the proleptic Gregorian calendar. Instead of going from February 28 to March 1, it goes from February 28 to February 29 to March 2, so March 1 is missing and replaced by a non existing leap day.

To Reproduce

using Dates, CFTime

isleapyear(300)  # false
isleapyear(304)  # true

daysinmonth(DateTimeProlepticGregorian, 300, 2)  # 28

Date(300, 2, 29)  # error: cannot create non existing day
g = DateTimeProlepticGregorian(300, 2, 29)  # can create in proleptic Gregorian
g = DateTimeProlepticGregorian(300, 3, 1)  # this creates the same day, February 29
g + Day(1)  # next day is March 2

DateTimeProlepticGregorian(2022, 1, 32)  # note that this creates February 1, not sure if desired

Expected behavior

I expect DateTimeProlepticGregorian(300, 2, 29) to error, or if desired create March 1.

Environment

CFTime v0.1.1

Alexander-Barth commented 2 years ago

I am not near a computer this week to check, but I am wondering what python's cftime does in this case. I thought that the proleptic Gregorian calendar would apply the regular rules of our current calendar befor 1582. It is not clear to me why it should be have irregularities of the year 300.

visr commented 2 years ago

Indeed it's not yet clear to me what's wrong with the year 300 specifically. @hboisgon encountered this issue, and this was the only date in a 10_000 year series that had this issue.

I tried python's cftime, there the issue is not present. It refuses to create DateTimeProlepticGregorian(300, 2, 29), and DateTimeProlepticGregorian(300, 3, 1) is indeed March 1.

The issue seems to come from the datenum_ac function, which returns the same value for Feb 29 and March 1 for year 300 (which gets scaled to 1900), and only happens for gregorian=true, in this code. But I cannot really pin it down.

Alexander-Barth commented 2 years ago

Here are some more years with this problem:

filter(y -> Dates.month(DateTimeProlepticGregorian(y, 3, 1)) !== 3,-2500:2500)

output:

7-element Vector{Int64}:
 -2101
 -1701
 -1301
  -901
  -501
  -101
   300
Alexander-Barth commented 2 years ago

I went back the to original publication "Meeus, Jean (1998) Astronomical Algorithms (2nd Edition). Willmann-Bell, Virginia. p. 63" and it claims that the argument (Z - 1867_216.25)/36524.25 is always positive:

https://github.com/JuliaGeo/CFTime.jl/blob/13519aa1dc34bd1a3cad115ce84887b8394e67c1/src/CFTime.jl#L165

However, for DateTimeProlepticGregorian(300, 3, 1) it becomes exactly -1.

Alexander-Barth commented 2 years ago

In the past python's CFTime used Meeus algorithm. But they use now a much simpler and clearer alorithm.

https://github.com/Unidata/cftime/blob/dc75368cd02bbcd1352dbecfef10404a58683f94/src/cftime/_cftime.pyx#L1678

As a test, it ported to julia. Unfortunately, it is also much slower (but it solves the issue mentioned here):

julia> @btime CFTime.datetuple_prolepticgregorian(60000) # based on python's CFTime
  6.017 μs (0 allocations: 0 bytes)
(2023, 2, 25)

julia> @btime CFTime.datetuple_prolepticgregorian_old(60000) # Meeus algorithm
  0.028 ns (0 allocations: 0 bytes)
(2023, 2, 25)

We might use different algorithms for different date ranges.

visr commented 2 years ago

Thanks for diving into this! I just tried the same benchmark, on the Julia 1.8 beta release, and the current CFTime master, with 1 small change.

@btime CFTime._Reference.datetuple_prolepticgregorian($(60000))
# 4.557 μs (0 allocations: 0 bytes)
# (2023, 2, 25)
@btime CFTime._Meeus.datetuple_prolepticgregorian($(60000))
# 12.312 ns (0 allocations: 0 bytes)
# (2023, 2, 25)

Still the Meeus algorithm is much faster, but instead of 215_000x faster, it is 370 x faster for me. The Meeus algorithm is slower in my benchmark due to interpolating 60000 to avoid compiling the constant in. And profiling the Reference algorithm I saw that passing this closure around caused dynamic dispatch, leading to allocations. So I removed that argument and called the right method directly in month_lengths. Though this issue probably didn't exist during your benchmarks since you have no allocations, and your timing was similar.

Do you know applications where 5 μs per date would be an issue? If not, to keep things simple I'd probably just stick with the Reference algorithm.

Alexander-Barth commented 2 years ago

Thanks a lot for fixing the benchmark runtime issue!

Yes, having two algorithms is not ideal. But the Meeus algorithm should be fixed now with this change:

https://github.com/JuliaGeo/CFTime.jl/blob/d8fb0e9bdbdae93446412d18b1150bdace1323a3/src/meeus_algorithm.jl#L127-L131

I tested for all dates between 1000 BC to 4000 AC, and I got now the sames dates for the reference algorithm (updated to avoid allocations) and the Meeus algorithm:

include("/path/to/test/test_time.jl");
Z = CFTime.datenum_prolepticgregorian(-1000,1,1):CFTime.datenum_prolepticgregorian(4000,1,1) # daily

MYMD = CFTime.datetuple_prolepticgregorian.(Z); # Meeus algorithm
# 0.152009 seconds (137.15 k allocations: 49.326 MiB, 8.65% gc time, 57.40% compilation time)

RYMD = Reference.datetuple_prolepticgregorian.(Z); # reference algorithm in test/
# 103.056894 seconds (188.46 k allocations: 51.930 MiB, 0.01% gc time, 0.11% compilation time)

@test MYMD == RYMD # no error

On very long time series (or very high-resolution time series), the speed of the reference algorithm could become an issue, I guess (at least I wished the reference algorithm were faster during testing 😄) . Also the runtime of the reference algorithm increase for dates future away from the start time.

In master, I updated the Meeus algorithm so that it relies entirely on integer arithmetic which makes its accuracy more predicable (overflows could still be a problem). The core-algorithms also now works with Int128 and BigInt as input.

The reference algorithm in the now used only for testing (so that there is no extract compile time for loading for typical users).

visr commented 2 years ago

Great that you managed to correct the issue inside the Meeus algorithm! That is indeed the best solution. I'll close the issue since it is now fixed. If you could tag a new release that would be much appreciated.