ThreeTen / threeten

This project was the home of code used to develop a modern date and time library for JDK8. Development has moved to OpenJDK and a separate backport project, threetenbp.
http://threeten.github.io/

Agree upon suitable default internationalised formatting style #51

Closed. RichardWarburton closed this issue 11 years ago.

RichardWarburton commented 11 years ago

ISO 8601 doesn't apply to internationalised calendars. We need to agree on whether or not to print the era name.

Is the current toString() on ChronoDate suitable?

RogerRiggs commented 11 years ago

See issue #82. The default style should contain the official IDs of eras and chronologies, as extracted from Era and Chronology. Localization is not relevant to toString.

dchiba commented 11 years ago

How about [chronology name]-[era id]-[year of era]-[month]-[day of month] for toString, where chronology name is the CLDR calendar name, era id is the CLDR era id, and the rest are numbers?

For example:
ISO September 17, 2012: "gregorian-1-2012-9-17", or simply "2012-09-17" in pure ISO 8601 as an abbreviated form
Hijrah 1st of Thul-Qedar 1433: "islamic-0-1433-11-1"
Japanese September 17, Heisei 24: "japanese-235-24-9-17"
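
A minimal sketch of this dash-separated layout, assuming the released JDK 8 java.time.chrono API rather than the patch under discussion. Two differences from the examples above are worth flagging: getCalendarType() returns "iso8601" (not "gregorian") for ISO and "islamic-umalqura" for the Hijrah chronology, and Era.getValue() is java.time's own era number, not the CLDR era type (Heisei is 2 here, not 235). The class name CldrDashFormat is hypothetical.

```java
import java.time.LocalDate;
import java.time.chrono.ChronoLocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.JapaneseDate;
import java.time.temporal.ChronoField;

class CldrDashFormat {
    // Builds "[calendar type]-[era]-[year of era]-[month]-[day of month]".
    // Calendar type: Chronology.getCalendarType(), the CLDR-style name.
    // Era: java.time's Era.getValue(), which is NOT the CLDR era type number.
    static String format(ChronoLocalDate date) {
        return date.getChronology().getCalendarType()
                + "-" + date.getEra().getValue()
                + "-" + date.get(ChronoField.YEAR_OF_ERA)
                + "-" + date.get(ChronoField.MONTH_OF_YEAR)
                + "-" + date.get(ChronoField.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDate.of(2012, 9, 17)));    // iso8601-1-2012-9-17
        System.out.println(format(JapaneseDate.of(2012, 9, 17))); // japanese-2-24-9-17
        System.out.println(format(HijrahDate.of(1433, 11, 1)));   // islamic-umalqura-1-1433-11-1
    }
}
```

Producing the CLDR era numbers proposed above would need a separate mapping table; java.time does not expose them.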

Related to #46 for serialization format.

For UI localization, the formatting style is as defined in the locale support data.

dchiba commented 11 years ago

We'll be using the notation above, with the exception of using a string-based ID instead of the numeric era ID.

For instance, the current Japanese era Heisei will get 'H' for the era ID.

masa310 commented 11 years ago

How do you represent leap months of the traditional Chinese calendar with the format?

dchiba commented 11 years ago

A Chinese leap month could have a suffix on the month number. For example, a leap month that follows the regular 9th month could be "9L". The month field would then go "9", "9L", then "10".
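
The suffix idea can be sketched as a pair of helpers that render and parse month labels. This is purely hypothetical: java.time has no Chinese chronology, and LeapMonthLabel, label and parse are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;

class LeapMonthLabel {
    // Renders a month number, appending "L" when it is the leap month
    // that follows the regular month with the same number.
    static String label(int month, boolean leap) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        return leap ? month + "L" : Integer.toString(month);
    }

    // Parses "9" or "9L" back into {month, leapFlag} where leapFlag is 1 for a leap month.
    static int[] parse(String label) {
        boolean leap = label.endsWith("L");
        String digits = leap ? label.substring(0, label.length() - 1) : label;
        return new int[] {Integer.parseInt(digits), leap ? 1 : 0};
    }

    public static void main(String[] args) {
        // The month-field transition around a leap 9th month.
        List<String> sequence = Arrays.asList(label(9, false), label(9, true), label(10, false));
        System.out.println(sequence); // [9, 9L, 10]
    }
}
```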

dchiba commented 11 years ago

The CLDR path to get the shortest era name in root.xml:

/ldml/dates/calendars/calendar[@type='<calendar name>']/eras/eraNarrow
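
A sketch of evaluating that path with the JDK's built-in XPath support. The XML below is a tiny hand-written fragment shaped like root.xml, not real CLDR data; the era type "235" for Heisei is taken from the earlier comment, and CldrEraLookup/narrowEra are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class CldrEraLookup {
    // Illustrative fragment only; real CLDR files contain many more elements.
    static final String SAMPLE =
        "<ldml><dates><calendars>"
      + "<calendar type='japanese'><eras><eraNarrow>"
      + "<era type='235'>H</era>"
      + "</eraNarrow></eras></calendar>"
      + "</calendars></dates></ldml>";

    // Looks up the narrow era name for one calendar type and one era type.
    // Inputs are concatenated into the expression, so they must be trusted.
    static String narrowEra(String xml, String calendar, String eraType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String path = "/ldml/dates/calendars/calendar[@type='" + calendar + "']"
                    + "/eras/eraNarrow/era[@type='" + eraType + "']";
            // evaluate() returns the string value of the first match, or "" if nothing matches.
            return xpath.evaluate(path, doc);
        } catch (Exception e) {
            throw new IllegalStateException("lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(narrowEra(SAMPLE, "japanese", "235")); // H
    }
}
```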
RogerRiggs commented 11 years ago

The format is a bit dense with all the '-' separators.
With {chronology} {era}yyyy-mm-dd and abbreviated Era names it looks like:

           Japanese: Japanese H0024-11-01
             Hijrah: Hijrah AH1433-12-16
                ISO: 2012-11-01
             Minguo: Minguo ROC0101-11-01
       ThaiBuddhist: ThaiBuddhist BE2555-11-01

The suggestion is to remove the leading zeros on the year to improve readability.
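
The difference between the padded and unpadded year can be shown with a small formatter over plain field values. EraDateText and its parameters are hypothetical; the chronology name and era abbreviation are passed in as strings rather than derived from java.time objects.

```java
import java.util.Locale;

class EraDateText {
    // Formats "{chronology} {era}{year}-MM-dd"; padYear selects the four-digit
    // year (H0024) or the unpadded form suggested above (H24).
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    static String format(String chronology, String era, int yearOfEra,
                         int month, int day, boolean padYear) {
        String year = padYear
                ? String.format(Locale.ROOT, "%04d", yearOfEra)
                : Integer.toString(yearOfEra);
        return String.format(Locale.ROOT, "%s %s%s-%02d-%02d", chronology, era, year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(format("Japanese", "H", 24, 11, 1, true));   // Japanese H0024-11-01
        System.out.println(format("Japanese", "H", 24, 11, 1, false));  // Japanese H24-11-01
        System.out.println(format("Minguo", "ROC", 101, 11, 1, false)); // Minguo ROC101-11-01
    }
}
```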

jodastephen commented 11 years ago

Removing leading zeroes from non-ISO formats looks fine to me.

dchiba commented 11 years ago

Please use the formal CLDR chronology name.

This helps in possible interoperability scenarios.

The ROC era name "ROC" needs dots to conform to CLDR: "R.O.C." is the formal name. To get "ROC", we could ask CLDR to define the narrow era name "ROC", since "R.O.C." comes from the abbreviated name.

As we need the era IDs to be stable, we may want to specify this to be "ROC" now and guarantee that the IDs never change. If we said the era IDs are the narrow CLDR era names, then they could potentially change (although that is unlikely). How about setting every era ID in stone, with "ROC" for the Minguo calendar? I could compile the table if that helps.

dchiba commented 11 years ago

CLDR narrow era names:

type="gregorian"
    BCE
    CE
type="buddhist"
    BE
type="japanese"
    M
    T
    S
    H
type="islamicc"
    AH
type="roc"
    Before R.O.C.
    R.O.C.
type="persian"
    AP
type="hebrew"
    AM
type="ethiopic"
    ERA0
type="coptic"
    ERA0
    ERA1
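
One way to "set era IDs in stone" is a frozen lookup table keyed by calendar type and CLDR era type. This is a hypothetical sketch: FixedEraIds is an illustrative name, the Japanese era type numbers 232 to 234 are assumed to run consecutively up to Heisei's 235 (mentioned earlier), and the ROC entry assumes era type 1 with the dot-free "ROC" proposed above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class FixedEraIds {
    // Key: "<CLDR calendar type>/<CLDR era type>"; value: an era ID promised never to change.
    // Values follow the narrow names listed above, except "ROC" without dots for Minguo.
    private static final Map<String, String> IDS;
    static {
        Map<String, String> m = new HashMap<>();
        m.put("gregorian/0", "BCE");
        m.put("gregorian/1", "CE");
        m.put("japanese/232", "M");
        m.put("japanese/233", "T");
        m.put("japanese/234", "S");
        m.put("japanese/235", "H");
        m.put("roc/1", "ROC");
        IDS = Collections.unmodifiableMap(m);
    }

    // Returns the frozen ID, or throws if the pair was never assigned one.
    static String eraId(String calendarType, int cldrEraType) {
        String id = IDS.get(calendarType + "/" + cldrEraType);
        if (id == null) {
            throw new IllegalArgumentException("No fixed era ID for " + calendarType + "/" + cldrEraType);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(eraId("japanese", 235)); // H
        System.out.println(eraId("roc", 1));        // ROC
    }
}
```

Because the table is owned by the library rather than read from CLDR at runtime, a later CLDR change to a display name would not alter these IDs.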
RogerRiggs commented 11 years ago

I assume capitalization in CLDR is not significant, so it is reasonable to use Islamic instead of islamic. Should it be "islamicc", since we identified the civil calendar as the correct one?

           Japanese:     Japanese H24-11-02,   Japanese -0500-01-01
             Hijrah:  Islamicc AH1433-12-17, Islamicc Before AH1157-01-02
           Buddhist:  Buddhist BE2555-11-02,    Buddhist BE43-01-01
                ISO:             2012-11-02,            -0500-01-01
             Minguo:    ROC R.O.C.101-11-02, ROC Before R.O.C.2412-01-01

For Coptic, ERA0/ERA1 seem particularly poor choices, even if it is in CLDR.

I would change the class name ThaiBuddhistChronology to BuddhistChronology.

dchiba commented 11 years ago

Yes. Because they are used as Unicode locale extension types, CLDR calendar names are case-insensitive, while the canonical casing is all lowercase.
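
The case-insensitivity is visible in the JDK's Locale.Builder, which lower-cases Unicode extension keys and types; Chronology.ofLocale (released JDK 8 API) then picks the calendar system from the "ca" keyword. CalendarKeyword and withCalendar are hypothetical names for this sketch.

```java
import java.time.chrono.Chronology;
import java.util.Locale;

class CalendarKeyword {
    // Builds a locale carrying the Unicode "ca" (calendar) extension.
    // Locale.Builder converts keys and types to lower case, so "Japanese"
    // and "japanese" produce equal locales.
    static Locale withCalendar(String languageTag, String calendarType) {
        return new Locale.Builder()
                .setLanguageTag(languageTag)
                .setUnicodeLocaleKeyword("ca", calendarType)
                .build();
    }

    public static void main(String[] args) {
        Locale loc = withCalendar("ja-JP", "Japanese");
        System.out.println(loc.toLanguageTag());             // ja-JP-u-ca-japanese
        System.out.println(loc.getUnicodeLocaleType("ca"));  // japanese
        // Chronology.ofLocale reads the "ca" keyword to choose the calendar system.
        System.out.println(Chronology.ofLocale(loc).getId()); // Japanese
    }
}
```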

And yes we need "islamicc" not "islamic". I should have said "islamicc" for Hijrah.

I suppose we could ask the CLDR folks whether they have considered defining standard IDs for the eras. ERA0/ERA1 are not impressive as narrow display names either, so CLDR may well change them. If we had adopted them as IDs, that would put us in an undesirable situation, because following such a change could be difficult due to possible regressions.

BuddhistChronology sounds better for the class name. The description can clearly say it is the Thai calendar.

jodastephen commented 11 years ago

I reverted a number of changes in the patch as I think this discussion got sidetracked into trying to make the toString format relate more to CLDR than it needs to.

Firstly, the purpose of having two IDs in Chronology is to recognize that the CLDR form (calendar system type) is more of an internal ID that was not intended for widespread use at an application level. The calendar system type is also allowed to be null. As such, the Chronology ID is the one that Chronology.toString must return.

This also means that the ChronoDate toString forms must use the Chronology ID, not the calendar system type.
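
The two identifiers survived into the released JDK 8 API as Chronology.getId() (mandatory) and Chronology.getCalendarType() (the CLDR type, which may be null for application-defined chronologies), and toString() returns the ID. A sketch that prints both side by side; ChronologyIds and describe are hypothetical names.

```java
import java.time.chrono.Chronology;
import java.time.chrono.IsoChronology;
import java.time.chrono.JapaneseChronology;
import java.time.chrono.MinguoChronology;
import java.time.chrono.ThaiBuddhistChronology;
import java.util.Arrays;
import java.util.List;

class ChronologyIds {
    // "<chronology ID> / <CLDR calendar type>"; the type prints as "null" if undefined.
    static String describe(Chronology c) {
        return c.getId() + " / " + c.getCalendarType();
    }

    public static void main(String[] args) {
        List<Chronology> chronos = Arrays.asList(
                IsoChronology.INSTANCE, JapaneseChronology.INSTANCE,
                MinguoChronology.INSTANCE, ThaiBuddhistChronology.INSTANCE);
        for (Chronology c : chronos) {
            // toString() uses the chronology ID, not the calendar type.
            System.out.println(describe(c) + "   toString=" + c);
        }
    }
}
```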

The addition of the dots in R.O.C. is driven by formatting concerns rather than developer concerns. It is much more consistent to use the name of the enum. While I have left the abbreviation in for Japanese, I would also argue that the full Japanese era constant name should be used.

The toString() is, in general, a developer tool for understanding the state of the object. It cannot be guaranteed to look pretty. For example, if we add control of the leap year pattern into the Hijrah calendar system, then the leap year pattern will have to be specified in the toString of both the Chronology and the ChronoDate. This fulfils an intentional design in 310 that all toString() methods on immutable value types fully represent the state of the object as far as reasonably practicable. (Joda-Time didn't do this, and it caused a number of issues and complaints.)
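
For ISO dates in the released JDK 8 API, "toString represents the full state" can be checked directly: parsing the output of LocalDate.toString() yields an equal value, including for negative proleptic years. ToStringState and roundTrips are hypothetical names for this sketch.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;

class ToStringState {
    // True if parsing toString() reproduces an equal LocalDate.
    static boolean roundTrips(LocalDate date) {
        return LocalDate.parse(date.toString()).equals(date);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2012, 11, 1);
        System.out.println(d + " round-trips: " + roundTrips(d));
        // A non-ISO date's toString also includes its chronology and era.
        System.out.println(JapaneseDate.from(d));
    }
}
```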

dchiba commented 11 years ago

Please use CLDR names for interoperability. If no CLDR name was assigned (e.g. an application-defined calendar), then the non-CLDR ID can be used with some notation to indicate that it is not a standard ID. It can be important to be able to exchange date information in a non-ISO calendar, and that requires a standard ID scheme. For example, Hijrah dates should be exchanged as Hijrah dates only if it is guaranteed that the sender and the receiver apply the same deviation or leap year pattern.
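
The fallback described above can be sketched as: prefer the CLDR calendar type, and otherwise emit the chronology ID behind a marker. The "x-" prefix, ExchangeId and the "MyCompanyFiscal" ID are all hypothetical; the thread does not fix any particular notation.

```java
import java.time.chrono.Chronology;
import java.time.chrono.JapaneseChronology;

class ExchangeId {
    // Hypothetical marker for IDs that are not CLDR calendar types.
    static final String NON_STANDARD_PREFIX = "x-";

    // Prefers the CLDR calendar type; falls back to the chronology ID with a marker.
    static String exchangeId(String chronologyId, String calendarType) {
        return calendarType != null ? calendarType : NON_STANDARD_PREFIX + chronologyId;
    }

    static String exchangeId(Chronology chrono) {
        return exchangeId(chrono.getId(), chrono.getCalendarType());
    }

    public static void main(String[] args) {
        System.out.println(exchangeId(JapaneseChronology.INSTANCE)); // japanese
        System.out.println(exchangeId("MyCompanyFiscal", null));    // x-MyCompanyFiscal
    }
}
```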

I think CLDR's naming needs to be refined to deal with some of the issues with the Hijrah calendar variants. It has "islamicc" and "islamic" today. This is not an issue of 310 but CLDR. Some issues may be resolved by configuring deviations.

RogerRiggs commented 11 years ago

CLDR is the common set of names and formats used consistently across runtimes. There is no reason for 310 to make up new names and ids when they are already defined by CLDR. This is not an area where 310 can add value.

jodastephen commented 11 years ago

I thought this debate was settled in previous threads: (a) we wouldn't be limited by CLDR, as it has backwards-compatibility concerns, and (b) we would have chronology IDs separate to CLDR IDs, because the CLDR IDs are (1) incomplete (for application-defined calendars), (2) inaccurate (buddhist rather than thai) and (3) weird (ethioaa or islamicc).

This resulted in a really simple pattern - the chronology ID is the name of the Chronology class, minus the Chrono/Chronology suffix. This allows developers to easily link the ID to the relevant class. The CLDR calendar ID is a secondary ID that is supplied for those calendar systems defined by CLDR and for integration with locales.

(i.e., the added value is a sensible name for the chronology, rather than for the calendar system, directly connected to the class name for ease of developer understanding)
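
The naming pattern can be sketched by stripping the suffix from the simple class name and comparing the result with getId() in the released JDK 8 classes. IdFromClassName is a hypothetical helper. The convention is not exact in the released API: IsoChronology's ID is "ISO" and HijrahChronology's default ID is "Hijrah-umalqura", so those two are left out.

```java
import java.time.chrono.JapaneseChronology;
import java.time.chrono.MinguoChronology;
import java.time.chrono.ThaiBuddhistChronology;

class IdFromClassName {
    // Strips a trailing "Chronology" (or "Chrono") from the simple class name.
    static String idFor(Class<?> chronologyClass) {
        String name = chronologyClass.getSimpleName();
        for (String suffix : new String[] {"Chronology", "Chrono"}) {
            if (name.endsWith(suffix) && name.length() > suffix.length()) {
                return name.substring(0, name.length() - suffix.length());
            }
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(idFor(JapaneseChronology.class) + " vs " + JapaneseChronology.INSTANCE.getId());
        System.out.println(idFor(ThaiBuddhistChronology.class) + " vs " + ThaiBuddhistChronology.INSTANCE.getId());
    }
}
```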

Again I remind everyone that the toString is for developer understanding, not output. We have formatters for that. After all, no GUI is ever going to want to see either "Minguo" or "roc" as a prefix (or probably the era) to the ISO-style date - those are clearly internal formats. The output format uses the formatter and would be fully localized such that it is exactly the correct layout and set of fields for the calendar.
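
For user-facing text, the released JDK 8 API provides localized formatters that can be bound to a chronology. A sketch, assuming the LONG date style; the exact output depends on the JDK's locale data, so no expected string is shown. LocalizedOutput and forDisplay are hypothetical names.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseChronology;
import java.time.chrono.JapaneseDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocalizedOutput {
    // Display text comes from a localized formatter, not from toString().
    static String forDisplay(JapaneseDate date, Locale locale) {
        DateTimeFormatter f = DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                .withLocale(locale)
                .withChronology(JapaneseChronology.INSTANCE);
        return f.format(date);
    }

    public static void main(String[] args) {
        JapaneseDate d = JapaneseDate.from(LocalDate.of(2012, 11, 1));
        System.out.println("developer view: " + d);
        System.out.println("user view:      " + forDisplay(d, Locale.JAPAN));
    }
}
```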

As far as I can see, the only interoperability at issue here is between two Java programs, where whatever we define is absolutely OK and should in fact be as well defined as possible (CLDR is not well-defined for application calendar systems). The only other interoperability is with Locale, which is already handled.

dchiba commented 11 years ago

I look at this as an ISO 8601 like format for non-ISO chronologies that may be used in any inter-component communication. So the other party can be any non-Java technology. The 310 chronology IDs can be thought of as internal IDs, whereas CLDR IDs can be standard, public IDs. This is akin to Java character encoding names vs. IANA charset names.
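
In the released JDK 8 API, Chronology.of accepts either form, much as a charset can be found by more than one name: the chronology ID or the CLDR calendar type resolves to the same chronology. LookupByEitherId and sameChronology are hypothetical names for this sketch.

```java
import java.time.chrono.Chronology;

class LookupByEitherId {
    // Chronology.of resolves both the 310 chronology ID and the CLDR calendar type.
    static boolean sameChronology(String chronologyId, String calendarType) {
        return Chronology.of(chronologyId).equals(Chronology.of(calendarType));
    }

    public static void main(String[] args) {
        System.out.println(Chronology.of("Minguo").getId()); // Minguo
        System.out.println(Chronology.of("roc").getId());    // Minguo
        System.out.println(sameChronology("ThaiBuddhist", "buddhist"));
    }
}
```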

jodastephen commented 11 years ago

I suggest that is the wrong view to have of the toString method. Its primary purpose should be to inform the developer, with a secondary purpose of providing useful information. For immutable value types, toString must (generally) include all elements of the object's state. If it doesn't, people complain.

The purpose of the 310 ID is to provide an ID for the chronology, separate to that of CLDR. Since CLDR is an external standard, not controlled by developers, the only alternative when a user wants to define their own calendar would be something like forcing them to prefix by "user." so as to avoid future CLDR IDs. This isn't appealing, and I believe it is much clearer to most developers (who have never heard of CLDR) to have a simple ID that matches the class name.

Thus, so long as we have a chronology ID separate to the reference to the optional external CLDR ID, we should use the mandatory one in all relevant places. Thus the Chrono toString and the Chrono*Date toString refer to the Chronology ID, linked to the class name.

Developers wanting an interoperable date format have many other problems to deal with, such as ensuring that the details of the calendar system implementations actually match on each side. The toString format is the least of their worries, and much better fixed by defining a dedicated formatter. Were this a huge concern, I would support adding a method to DateTimeFormatters to provide a fixed format with the CLDR ID and CLDR era name. However, since there is no agreed standard for such a format (in CLDR or elsewhere) it seems to me that it doesn't achieve very much.

dchiba commented 11 years ago

I wish CLDR had handsome IDs as 310 does, so everything could be consistent. CLDR IDs do have issues (clarity, stability, availability (no era IDs)), and unfortunately they don't seem to work well for interoperability either. In the current circumstances, I think it is reasonable to define a fixed formatter for interoperability purposes and use the 310 IDs for toString. This would be the best solution, although it may not be ideal. Can we settle on the following?

{310 chronology} {310 era}y-mm-dd

Examples:

         ISO: 2012-11-01
    Japanese: Japanese H24-11-01
      Hijrah: Hijrah AH1433-12-16
      Minguo: Minguo ROC101-11-01
ThaiBuddhist: ThaiBuddhist BE2555-11-01

ISO needs neither chronology nor era; it defaults to ISO/CE. For BCE, the expanded representation with a proleptic year number (which can be zero or negative) is used. The 310 chronology is the clearer ID defined in 310, and the 310 era is the narrow CLDR era name. (There is a separate corner issue to resolve for Japanese.)
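
The ISO behaviour described here matches LocalDate in the released JDK 8 API: toString() uses the proleptic year (zero or negative for BCE dates), while YEAR_OF_ERA counts back from 1 BCE. IsoProlepticYear and its methods are hypothetical names for this sketch.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;

class IsoProlepticYear {
    // ISO text for a proleptic year; toString() pads to four digits and
    // keeps a leading '-' for negative years.
    static String isoText(int prolepticYear, int month, int day) {
        return LocalDate.of(prolepticYear, month, day).toString();
    }

    // Year-of-era for the same proleptic year (year 0 is 1 BCE, year -500 is 501 BCE).
    static int yearOfEra(int prolepticYear) {
        return LocalDate.of(prolepticYear, 1, 1).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate bce = LocalDate.of(-500, 1, 1);
        System.out.println(bce);                              // -0500-01-01
        System.out.println(bce.getEra());                     // BCE
        System.out.println(bce.get(ChronoField.YEAR_OF_ERA)); // 501
    }
}
```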

Besides, developers should be encouraged to know that CLDR is embedded in Java for most localization functionality. They are all aware that Java is based on Unicode for character handling; they should be similarly aware that CLDR plays a major role in localization.

jodastephen commented 11 years ago

Are you confirming that the Japanese format should use the short era code, such as "H" rather than "HEISEI"?

Japanese H24-11-01
Japanese HEISEI24-11-01

(The latter would be more consistent, but between these options it's your choice)

dchiba commented 11 years ago

Let us consult Masayoshi and come up with the details. We can only identify a handful of eras and we might need to cover more.

dchiba commented 11 years ago

Masayoshi,

Could you comment on whether Japanese eras should be printed as H/S/T/M alone or in a longer form, such as "Heisei"? I think H/S/T/M may be preferred because they are shorter, commonly seen and cover most use cases (everybody alive was born in one of these eras). JDK8 could support only Meiji and later; is this your plan? I wonder if Keio and earlier eras could just use "EnnnYy" or something similar, where nnn is the CLDR era type number uniquely assigned to each era. Keio is 231, for instance.

Japanese E231Y2-05-09   ...  May 9th in the second year of Keio
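
A sketch of the proposed "EnnnYy" text for pre-Meiji eras, built from field values that are taken as given. PreMeijiNotation is hypothetical, and no lunisolar conversion is attempted: the month and day are whatever the caller supplies.

```java
import java.util.Locale;

class PreMeijiNotation {
    // "Japanese E<CLDR era type>Y<year of era>-MM-dd", e.g. Keio (231), year 2.
    static String format(int cldrEraType, int yearOfEra, int month, int day) {
        return String.format(Locale.ROOT, "Japanese E%dY%d-%02d-%02d",
                cldrEraType, yearOfEra, month, day);
    }

    public static void main(String[] args) {
        System.out.println(format(231, 2, 5, 9)); // Japanese E231Y2-05-09
    }
}
```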
masa310 commented 11 years ago

I think that era names have to be in full spelling to avoid any future conflicts.

As I pointed out a few times before, any dates before Meiji 6 require the lunisolar calendar systems. We could use a mapping table only for Keio which has only 4 years, though. Also an old convention was to apply a new era to the last year of the previous era (立年改元), which requires another disambiguation mechanism. Strictly speaking, the last day of an era is the first day of the next era in Meiji-to-Taisho and Taisho-to-Showa (即日改元), which is not supported.
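
The released JDK 8 API behaves as described: JapaneseDate does not support dates before Meiji 6 (1873-01-01), and each ISO date maps to exactly one era, so 1912-07-30 is Taisho 1 only, never also Meiji 45. A sketch that checks the boundaries; EraBoundary and eraOf are hypothetical names.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;

class EraBoundary {
    // Returns the single era java.time assigns to an ISO date.
    static JapaneseEra eraOf(LocalDate isoDate) {
        return JapaneseDate.from(isoDate).getEra();
    }

    public static void main(String[] args) {
        System.out.println(eraOf(LocalDate.of(1912, 7, 29))); // last day of Meiji
        System.out.println(eraOf(LocalDate.of(1912, 7, 30))); // first day of Taisho
    }
}
```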

In any case, it's a LOT of work to support all the eras before Meiji correctly. I don't mind if someone can provide correct implementations of all the lunisolar calendars, though.

For your reference:

http://en.wikipedia.org/wiki/Kei%C5%8D#Events_of_the_Kei.C5.8D_era
http://ja.wikipedia.org/wiki/%E6%85%B6%E5%BF%9C
http://ja.wikipedia.org/wiki/%E6%94%B9%E5%85%83

jodastephen commented 11 years ago

Once SEIREKI is defined and released, we will be unable to add additional eras before MEIJI in future JDKs. That is because SEIREKI defines the era value -2 and MEIJI -1, so there are no available numbers in between.

Personally, I'm fine with the eras that are defined. They cover a reasonable range of dates with reasonable accuracy (so long as it is fully specified). Ultimately, this is Oracle's decision.

RogerRiggs commented 11 years ago

If SEIREKI is the catch-all for previous but unspecified eras, leaving a hole in the assigned space would be an option to defer specifying additional eras until a future release. I will follow up with more detail on the support for Japanese eras.

dchiba commented 11 years ago

There are 232 eras before Meiji. Yes; can we reserve era values for them, while leaving those eras unspecified and unsupported, at least for JDK8?

Japanese MEIJI1-01-03
Japanese TAISHO12-03-22
Japanese SHOWA41-12-10
Japanese HEISEI24-11-01
jodastephen commented 11 years ago

If we're leaving a gap, I'd suggest SEIREKI is -999, to make it obvious that it is a made up number.

DateTimeValueRange will suggest that eras can run from -999 to 1, so lots of those numbers will be invalid and require separate checking.

Should the toString format have a space in it:

Japanese MEIJI1-01-03    or
Japanese MEIJI 1-01-03
RogerRiggs commented 11 years ago

I opened a separate issue and marked it as an RFE to address the additional request so that we can close on the formatting questions raised here. The assignment of -999 to SEIREKI needs to be done as part of completing this issue.

dchiba commented 11 years ago

Can we summarize as follows? We let Seireki catch all eras prior to Meiji:

Japanese SEIREKI 645-01-03  ... Notice Gregorian year is used for the year of "Seireki" era.
Japanese MEIJI 1-01-03
Japanese TAISHO 12-03-22
Japanese SHOWA 41-12-10
Japanese HEISEI 24-11-01

For the integer era values, can we assign:

-999  Seireki 
-998  through -2  ... Unassigned 
  -1  Meiji    
   0  Taisho  
   1  Showa 
   2  Heisei  
   3, 4, 5 ...  Reserved for the future eras
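
The validation implied by this table can be sketched as a range check over the proposed values. JapaneseEraValues and its constants are hypothetical names that mirror the table; they are not taken from any released API.

```java
class JapaneseEraValues {
    static final int SEIREKI = -999;
    static final int MEIJI = -1, TAISHO = 0, SHOWA = 1, HEISEI = 2;

    // True only for assigned values: -999 (Seireki) and -1..2 (Meiji..Heisei).
    // -998..-2 are unassigned and 3 and above are reserved, so both are rejected.
    static boolean isAssigned(int eraValue) {
        return eraValue == SEIREKI || (eraValue >= MEIJI && eraValue <= HEISEI);
    }

    static int checkEraValue(int eraValue) {
        if (!isAssigned(eraValue)) {
            throw new IllegalArgumentException("Invalid era value: " + eraValue);
        }
        return eraValue;
    }

    public static void main(String[] args) {
        for (int v : new int[] {-999, -998, -2, -1, 2, 3}) {
            System.out.println(v + " assigned: " + isAssigned(v));
        }
    }
}
```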
jodastephen commented 11 years ago

Approved to make SEIREKI -999, add the necessary validation against -998 to -2, and add a space and the full enum name to the Japanese toString format.
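
The approved Japanese layout, with the full era name followed by a space, can be sketched over raw field values. JapaneseToString is hypothetical and keyed by the era values from the table above; for SEIREKI the year is the Gregorian year, as in the summary.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class JapaneseToString {
    // Full era names keyed by the era values agreed in this thread.
    private static final Map<Integer, String> ERA_NAMES = new HashMap<>();
    static {
        ERA_NAMES.put(-999, "SEIREKI");
        ERA_NAMES.put(-1, "MEIJI");
        ERA_NAMES.put(0, "TAISHO");
        ERA_NAMES.put(1, "SHOWA");
        ERA_NAMES.put(2, "HEISEI");
    }

    // "Japanese <ERA> <yearOfEra>-MM-dd"; unpadded year, two-digit month and day.
    static String format(int eraValue, int yearOfEra, int month, int day) {
        String era = ERA_NAMES.get(eraValue);
        if (era == null) {
            throw new IllegalArgumentException("Unassigned era value: " + eraValue);
        }
        return String.format(Locale.ROOT, "Japanese %s %d-%02d-%02d", era, yearOfEra, month, day);
    }

    public static void main(String[] args) {
        System.out.println(format(2, 24, 11, 1));    // Japanese HEISEI 24-11-01
        System.out.println(format(-999, 645, 1, 3)); // Japanese SEIREKI 645-01-03
    }
}
```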