97jaz / gregor

Date and time library for Racket

Date formatting prints incorrect year in French locale #56

Open matteodelabre opened 1 year ago

matteodelabre commented 1 year ago

It looks like ~t produces incorrect years for some dates when the current locale is fr.

> (current-locale "")
> (~t (date 2023 05 31) "YYYY")
"2023"
> (current-locale "fr")
> (~t (date 2023 05 31) "YYYY")
"2024"

From what I can tell, this happens for May 29th, 30th and 31st of this year. The problem appears in the years 2016, 2017, 2021, 2022, and 2023; the output is correct for 2015, 2018, 2019, 2020, and 2024.

97jaz commented 1 year ago

Wow -- that's not good. I'll look into it. Thanks for letting me know.

97jaz commented 1 year ago

Just in case you weren't already aware: the YYYY pattern is for the year in a week-based calendar, so it's not quite the same as the calendar year of the given date. If you're just interested in the latter, you should use yyyy instead.

That said, the week-based year should only differ from the normal calendar year near calendar year boundaries, so it certainly should not differ on 31 May. There's clearly a bug here.
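
To illustrate the point about year boundaries, here is a small sketch using Python's standard datetime module rather than gregor itself. It uses the ISO 8601 week rules (Monday start, 4-day minimum week), and the helper name iso_week_year is made up for this example. It only shows where a week-based year is expected to differ from the calendar year; it says nothing about how gregor computes the value.

```python
from datetime import date

def iso_week_year(d: date) -> int:
    """ISO 8601 week-based year of d (hypothetical helper for illustration)."""
    return d.isocalendar()[0]

# 1 January 2023 was a Sunday, so it belongs to the last ISO week of 2022.
print(date(2023, 1, 1).year, iso_week_year(date(2023, 1, 1)))      # 2023 2022
# 31 December 2024 is a Tuesday in the week that contains 2 January 2025.
print(date(2024, 12, 31).year, iso_week_year(date(2024, 12, 31)))  # 2024 2025
# A mid-year date agrees with the calendar year.
print(date(2023, 5, 31).year, iso_week_year(date(2023, 5, 31)))    # 2023 2023
```

Under these rules, 31 May 2023 has the week-based year 2023, which is the result reported for the default locale above.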

97jaz commented 1 year ago

Yes, I see the bug. Thanks again! I'll have a fix out soon.

97jaz commented 1 year ago

This has turned out to be rather confusing -- not the bug in my code, which is obvious, but the determination of what the correct behavior should be. The CLDR spec has the following:

Week of Year

Values calculated for the Week of Year field range from 1 to 53 for the Gregorian calendar (they may have different ranges for other calendars). Week 1 for a year is the first week that contains at least the specified minimum number of days from that year. Weeks between week 1 of one year and week 1 of the following year are numbered sequentially from 2 to 52 or 53 (if needed). For example, January 1, 1998 was a Thursday. If the first day of the week is MONDAY and the minimum days in a week is 4 (these are the values reflecting ISO 8601 and many national standards), then week 1 of 1998 starts on December 29, 1997, and ends on January 4, 1998. However, if the first day of the week is SUNDAY, then week 1 of 1998 starts on January 4, 1998, and ends on January 10, 1998. The first three days of 1998 are then part of week 53 of 1997.

This description implies that the determination of the week on which a date falls is parameterized by two locale-specific pieces of data: the minimum number of days in a week and the first day of the week.

In the U.S., the minimum number of days in a week (according to CLDR's data) is 1, and the first day of the week is Sunday. In France the minimum number of days is 4, and the first day of the week is Monday.

Therefore, the en_US and fr_FR locales ought to be good test cases for the example mentioned in that text. In en_US, I'd expect that 1997-12-29 would be on the last week of 1997, whereas in fr_FR it would fall on the first week of 1998 (in the localized week-based calendar). However, in both locales, ICU4J reports that it falls on the first week of 1998. This suggests to me that ICU4J isn't using any locale information to render the results and is, instead, simply using ISO 8601 rules (which happen to be the same rules that the French locale uses).

So now I'm wondering if my attempt to use locale-specific data for the 'Y' pattern is just wasted effort and if I should, instead, just use the ISO 8601 week-based calendar rules. I'm not really sure if anyone ever uses a week-based calendar other than the ISO one, so it's not obvious to me that this pattern (and the related 'w' pattern) ought to be locale-sensitive.

97jaz commented 1 year ago

Oh -- also, I just noticed from your GitHub info that you're in Quebec. The fr_CA locale would use the same parameters as en_US.

97jaz commented 1 year ago

Interestingly, TwitterCLDR appears to implement the algorithm described in the CLDR spec, but doesn't have the locale-specific parameter data, so it assumes Sunday for the start-of-week and a 1-day minimal week. So it doesn't use ISO 8601 rules.

97jaz commented 1 year ago

Ugh, no -- I misinterpreted the text. ICU4J is following the CLDR spec. Well, good -- then there's no confusion about how this should behave.
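
For reference, the rule quoted from the CLDR spec can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code used by gregor or ICU4J. The function names week_one_start and week_of_year are hypothetical, and the locale parameters (Monday with a 4-day minimum for fr_FR, Sunday with a 1-day minimum for en_US) are the values mentioned earlier in this thread.

```python
from datetime import date, timedelta

# Weekday numbers follow Python's date.weekday(): Monday=0 ... Sunday=6.
MONDAY, SUNDAY = 0, 6

def week_one_start(year: int, first_day: int, min_days: int) -> date:
    """Start of week 1 of `year`: the first week that contains at least
    `min_days` days of that year."""
    jan1 = date(year, 1, 1)
    # Step back from 1 January to the most recent `first_day`.
    start = jan1 - timedelta(days=(jan1.weekday() - first_day) % 7)
    # Number of days of that week that fall inside `year`.
    days_in_year = 7 - (jan1 - start).days
    return start if days_in_year >= min_days else start + timedelta(days=7)

def week_of_year(d: date, first_day: int, min_days: int) -> tuple:
    """Return (week-based year, week number) for d."""
    # Try the latest candidate year first: a late-December date may already
    # belong to week 1 of the following year.
    for year in (d.year + 1, d.year, d.year - 1):
        start = week_one_start(year, first_day, min_days)
        if d >= start:
            return year, (d - start).days // 7 + 1
    raise AssertionError("unreachable: week 1 of the previous year starts before d")

print(week_of_year(date(1997, 12, 29), MONDAY, 4))  # (1998, 1)
print(week_of_year(date(1997, 12, 29), SUNDAY, 1))  # (1998, 1)
print(week_of_year(date(1998, 1, 1), SUNDAY, 4))    # (1997, 53)
```

With a 1-day minimum, the week containing 1 January is always week 1, so under the Sunday/1 parameters week 1 of 1998 starts on Sunday, 28 December 1997. That is why 1997-12-29 lands in week 1 of 1998 under both sets of parameters. The last line reproduces the spec's Sunday example, which keeps the 4-day minimum.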