Open jgpallero opened 3 weeks ago
Interesting. This looks like it's somewhat about how Leap Day is handled when adding a calendar-year increment.
>> fecha1 = datetime([2020 2 29 21 0 15]);
>> fecha2 = datetime([2025 1 1 0 0 0]);
>> incy = calyears(2);
>> vec1 = fecha1:incy:fecha2
vec1 =
29-Feb-2020 21:00:15 01-Mar-2022 21:00:15 01-Mar-2024 21:00:15 01-Mar-2026 21:00:15
That last vec1(4) of 2026-03-01, higher than fecha2's 2025-01-1, is just wrong. I bet there's an off-by-one bug in the end condition there.
As for the other differences: 2020 was a leap year, so Feb 29 2020 existed. But there's no Feb 29 in 2022. My code is normalizing it up to March 1, like what happens when you pass "denormalized" out-of-range values to a datetime() constructor call.
>> datetime(2020, 2, 29)
ans =
datetime
29-Feb-2020
But it looks like when adding a calyear (or calmonth), Matlab normalizes the nonexistent Feb 29 leap day down to Feb 28 instead, on non-leap years.
I wonder if there's documentation or reference for this behavior. The Matlab doco page for `calyears says "Calendar years account for leap days when used in calendar calculations", but it doesn't say how it accounts for them. (Same for the calmonths and calendarDuration pages, to my eyes.)
Matlab datetimes and datenums are ISO calendar dates, I believe. I wonder if the ISO date/time standards specify how calendar interval arithmetic works, and how to handle leap years in them.
The other aspect of this seems to be how calendar intervals work when used as the middle operand in a "lo:d:hi" two-colon expression. It is surprising to me that vec1(3)
in your Matlab example is Feb 29. I would have thought it would be Feb 28, even though 2024 is a leap year. That's because I thought that the "lo:d:hi" two-colon operator generally worked like repeatedly adding d
to a working value initialized with lo
, copying successive results into elements of the output, until the next value would be larger than hi
. Like this:
% y = lo:d:hi - how I thought it worked (approximately)
y = repmat(lo, [0 0]);
tmp = lo;
while tmp <= hi
y(end+1) = tmp;
tmp = tmp + d;
end
In this case, once tmp
is normalized to a Feb 28 (or March 1) for a non-leap year, I'd expect it to never turn back in to Feb 29, because adding a d
of 1 calyear to a Feb 28 will always produce a Feb 28.
But instead, it looks like it's doing y(s) = lo + (d * s);
("s" for "step" here). I looked at Matlab's colon doco, and it seems that's maybe the right way.
x = j:i:k creates a regularly-spaced vector x using i as the increment between elements. The vector elements are roughly equal to [j,j+i,j+2i,...,j+mi] where m = fix((k-j)/i).
That sounds more like y(i) = lo + (s * d)
than repeatedly doing tmp = tmp + d
. (It doesn't result in a "regularly-spaced" vector in the case of datetime and a calyear/calmonth i
, but close enough.)
(Octave's documentation for colon isn't as detailed and doesn't seem to provide any guidance here.)
Fixing the upper bound bug is straightforward. I'll go ahead and do that.
I want to do a bit more reading to better understand the definition/specification here around the leap day normalization and "stride". (If you have any relevant links or books, I'd be happy to hear them.) But it sounds like Tablicious here needs two changes for Matlab compatibility: have datetime + calyears normalize down instead of up for nonexistent Feb 29s, and change the colon operator to do y(s) = s * d
instead of repeatedly adding d
to a working value.
BTW...
I don't know if this bug is related with https://github.com/apjanke/octave-tablicious/issues/134
Seems unrelated. #134 is about string parsing, and the "colon arguments" in its error message is just about using a colon with numerics like i = 1:size(x)
; it's not related to how datetime and calendarDuration implement colon support.
About the differences with Matlab in the leap day topic, I think the better option is to match the behavior of Matlab. Some developers have said sometimes that Matlab-compatibility is a goal (see, for example, the comment of Philip Nienhuis (comment number 9) in https://savannah.gnu.org/bugs/index.php?60107)
About the differences with Matlab in the leap day topic, I think the better option is to match the behavior of Matlab.
Yeah, Matlab compatibility is a main goal for Tablicious too, and that's almost certainly the right thing to do here. I just wish I knew why Matlab was doing it this way, and the issues around it in the "calendrical computations" area, so I could do a more complete fix. I like to have a decent understanding of the code I'm writing and the behavior it's trying to do. That way, I may be able to catch other related issues in that area (like, maybe leap seconds or daylight saving time have similar effects) and get them right too, maybe in an "elegant" way in terms of the classes' internal data representation, instead of fixing individual problems in a patchwork manner as they're discovered.
In particular, maybe this "leap day / add 1 calyear" behavior isn't limited to how datetime.plus
works, but somehow involves the internal behavior of how the semantics of datetime constructors, datenum()
, and calendarDuration
interact, in some maybe-undocumented manner. Like, should this behavior be accomplished in a more general manner than sticking a special-case "if date 1 is Leap Day, and the resulting date is not a leap year, subtract 1 day to make it Feb 28" bit into datetime.plus
and datetime.minus
?
Either way, I plan to make this behavior change to try to match Matlab, even if I can't find more thorough documentation on it.
Okay, I think this is actually a bigger issue than just leap days. I think it applies to any calyears or calmonths calendarDuration addition that would end up with a "denormalized" date where the day-of-month value is larger than the number of days in that month. I think Matlab's caldur behavior "caps" or rounds it down to the last day within that month, and then preserves the time-of-day part.
For example, January has 31 days. February has either 28 or 29 days. So, take 2004-01-31 and add 1 calendar month, or 1 cal year and 1 cal month. Tablicious currently does this:
>> datetime(2004, 1, 31, 3, 4, 5) + calmonths(1)
ans =
02-Mar-2004 03:04:05
>> datetime(2004, 1, 31, 3, 4, 5) + (calmonths(1) + calyears(1))
ans =
03-Mar-2005 03:04:05
>>
But I think, to match Matlab, it should produce Feb 29 2004 and Feb 28 2005 instead.
This is different from how the datenum function and datetime constructor work, where denormalized days "carry over" or advance the value into the next month. E.g., datetime(2005, 2, 31)
produces 2005-03-03.
So, I think the fix here is to change the calendarDuration addition logic to be:
dv
.And subtraction would do the same thing, except multiplying all the caldur components by -1 first.
I think this agees with the Matlab calendarDuration doco page which says, in the Tips section, "When you add a calendarDuration array that contains more than one unit to a datetime, MATLAB® always adds the larger units first.".
Note that this means that calendarDuration values do not have the normal commutative and associative math behavior that regular numbers do. (Even beyond floating-point rounding issues.) These following operations will not necessarily produce the same result.
t + calyears(1) + calmonths(2) + caldays(1)
t + caldays(3) + calyears(1) + calmonths(2)
t + (caldays(3) + calyears(1) + calmonths(2))
While I'm in here, I think Tablicious's calendarDuration structure can be refactored to simplify it. The Sign property can be removed, letting the sign of the Years/Months/Days/Time numeric values handle it. Plus, Years and Months can be combined into a single Months property, since here a calyear is defined to be 12 calendar months. Splitting them in to years, months, and quarters can be done as part of the display formatting instead. I think that'll make for simpler code in the implementation, including in the display formatting. Plus, smaller data structure using less memory, though that probably doesn't matter, because who's going to be making big calendarDuration arrays? Unless we do so as a temporary value in the colon()
implementation.
Maybe drop the IsNan property too, and just carry NaN values in the Months/Days/Time properties, normalizing them to all be NaN when any is set to NaN. They're private-access, so we can control that.
And I think we need to prohibit fractional (non-integer) values for calyears and calmonths components. Because they're variable-length things, doing fractional operations on them may not be well-defined, or if they are, may not really be sensible, or at least would be pretty difficult to implement. I think Matlab prohibits them. We should too, to match that behavior, and keep our code and behavior simpler.
Like, this here. That shouldn't be a thing.
>> calmonths(0.5)
ans =
0.5mo
>>
For example, January has 31 days. February has either 28 or 29 days. So, take 2004-01-31 and add 1 calendar month, or 1 cal year and 1 cal month. Tablicious currently does this:
>> datetime(2004, 1, 31, 3, 4, 5) + calmonths(1) ans = 02-Mar-2004 03:04:05 >> datetime(2004, 1, 31, 3, 4, 5) + (calmonths(1) + calyears(1)) ans = 03-Mar-2005 03:04:05 >>
But I think, to match Matlab, it should produce Feb 29 2004 and Feb 28 2005 instead.
In Matlab (R2024a) the results are
>> datetime(2004, 1, 31, 3, 4, 5) + calmonths(1)
ans =
datetime
29-Feb-2004 03:04:05
>> datetime(2004, 1, 31, 3, 4, 5) + (calmonths(1) + calyears(1))
ans =
datetime
28-Feb-2005 03:04:05
Like, this here. That shouldn't be a thing.
>> calmonths(0.5) ans = 0.5mo >>
calyears() and calmonths() in Matlab works only with integer values of years and months. There ir also a caldays() function
>> calyears(9)
ans =
calendarDuration
9y
>> calmonths(9)
ans =
calendarDuration
9mo
>> caldays(9)
ans =
calendarDuration
9d
>> calyears(9.1)
Error using calyears (line 38)
Input data must be a numeric array containing real, integer values. Use the CALMONTHS or CALDAYS functions to create calendar durations shorter than one year, or use the YEARS function to
create fractional fixed-length years as durations.
>> calmonths(9.1)
Error using calmonths (line 30)
Input data must be a numeric array containing real, integer values. Use the CALDAYS function to create calendar durations shorter than one month.
>> caldays(9.1)
Error using caldays (line 38)
Input data must be a numeric array containing real, integer values. Use the HOURS, MINUTES or SECONDS functions to create durations shorter than one day, or use the DAYS function to create
fractional fixed-length days as durations.
Okay, I think I've got a fix for the bogus upper-bound behavior over on branch caldur-math-fixes, in https://github.com/apjanke/octave-tablicious/commit/b26975f1c59b7c77e1dcce28cfee5f81eb353a1a. This is a partial fix for this bug report.
This fixes basically an off-by-one error in the upper-bound test and loop logic in the calendarDuration part of datetime.colon.
This also changes the datetime.colon logic for calendarDuration to multiply the inc (caldur) and add it to lo for each successive element, to get rid of "jigger" from varying-length periods. This will make it respect the month-end "capping" normalization, once that gets added to datetime.plus.
Also adds the missing mtimes method.
Here's the new behavior. Look better to you?
>> datetime([2020 2 29 21 0 15]):calyears(2):datetime([2025 1 1 0 0 0])
ans =
29-Feb-2020 21:00:15 01-Mar-2022 21:00:15 29-Feb-2024 21:00:15
>> dt = datetime([2020 2 29 21 0 15])
dt =
29-Feb-2020 21:00:15
>> (dt:calmonths(1):(dt+calmonths(14)))'
ans =
29-Feb-2020 21:00:15
29-Mar-2020 21:00:15
29-Apr-2020 21:00:15
29-May-2020 21:00:15
29-Jun-2020 21:00:15
29-Jul-2020 21:00:15
29-Aug-2020 21:00:15
29-Sep-2020 21:00:15
29-Oct-2020 21:00:15
29-Nov-2020 21:00:15
29-Dec-2020 21:00:15
29-Jan-2021 21:00:15
01-Mar-2021 21:00:15
29-Mar-2021 21:00:15
29-Apr-2021 21:00:15
>>
Notice how the 01-Mar normalization toward the end didn't "stick" for subsequent elements.
Should add some regression tests for this, but I'm short on free time.
Thank you for the fix!
The results of your examples in Matlab are:
>> datetime([2020 2 29 21 0 15]):calyears(2):datetime([2025 1 1 0 0 0])
ans =
1×3 datetime array
29-Feb-2020 21:00:15 28-Feb-2022 21:00:15 29-Feb-2024 21:00:15
>> dt = datetime([2020 2 29 21 0 15])
dt =
datetime
29-Feb-2020 21:00:15
>> (dt:calmonths(1):(dt+calmonths(14)))'
ans =
15×1 datetime array
29-Feb-2020 21:00:15
29-Mar-2020 21:00:15
29-Apr-2020 21:00:15
29-May-2020 21:00:15
29-Jun-2020 21:00:15
29-Jul-2020 21:00:15
29-Aug-2020 21:00:15
29-Sep-2020 21:00:15
29-Oct-2020 21:00:15
29-Nov-2020 21:00:15
29-Dec-2020 21:00:15
29-Jan-2021 21:00:15
28-Feb-2021 21:00:15
29-Mar-2021 21:00:15
29-Apr-2021 21:00:15
Apparently there is still present the problem of 28-Feb or 01-March when dealing with leap years and durations of a whole year
Apparently there is still present the problem of 28-Feb or 01-March when dealing with leap years and durations of a whole year
That's expected! I haven't gotten to that part yet. Need to make a modification to datetime.plus
to deal with it by doing "month-end capping". But this change to colon
means that once I've done that, colon will respect it and should produce the correct result.
Okay, I think I've got a fix for the last-day-of-month capping, plus a couple other things, in https://github.com/apjanke/octave-tablicious/commit/87f6afd5974845fb6c7a4d0fd4dcfdab42be9ba6.
What do you think of this behavior?
>> dt = datetime(2020, 2, 29)
dt =
29-Feb-2020
>> [dt:calmonths(1):datetime(2021,4,1)]'
ans =
29-Feb-2020
29-Mar-2020
29-Apr-2020
29-May-2020
29-Jun-2020
29-Jul-2020
29-Aug-2020
29-Sep-2020
29-Oct-2020
29-Nov-2020
29-Dec-2020
29-Jan-2021
28-Feb-2021
29-Mar-2021
>>
>> dt = datetime(2020, 1, 31)
dt =
31-Jan-2020
>> [dt:calmonths(1):datetime(2021,4,1)]'
ans =
31-Jan-2020
29-Feb-2020
31-Mar-2020
30-Apr-2020
31-May-2020
30-Jun-2020
31-Jul-2020
31-Aug-2020
30-Sep-2020
31-Oct-2020
30-Nov-2020
31-Dec-2020
31-Jan-2021
28-Feb-2021
31-Mar-2021
>>
Found a couple more problems while I was in there. Notably, the lo:inc:hi
behavior, when hi
is less than lo
and inc
is effectively negative, still just produces an empty array. Need to add a special case to handle that.
Looks great! Thank you very much!
See this example:
When I create a sequential vector of dates using the colon operator all works apparently well as the vector stops in an element lower than the last date.
But when I use years the result is not correct:
in Matlab the result is
the last element is always <= the greater limit. The bug is also present using calmonths()
I don't know if this bug is related with #134
Andrew's notes
Concise repro code:
TODO
hi:inc:lo
case for negativeinc
values. Currently just returns an empty array.lo:inc:hi
produces a descending sequence of values, and vice versa forhi:inc:lo
giving ascending values.