97jaz / gregor

Date and time library for Racket

`parse-moment` doesn't work with CLDR pattern that includes `z`? #25

Open · mbutterick opened this issue 6 years ago

mbutterick commented 6 years ago

According to the docs, parse-moment “Parses str according to pattern, which uses the CLDR pattern syntax … If the input cannot be parsed as a moment, exn:gregor:parse is raised.”

z is part of the CLDR pattern syntax. Yet passing a pattern with z raises an error:

Works:

(require gregor)
(define pat "LLLL d, y")
;; I expect this to convert `now/moment` to a string, and then back to a moment
(parse-moment (~t (now/moment) pat) pat)

Does not work:

(require gregor)
(define pat+z "LLLL d, y z")
(parse-moment (~t (now/moment) pat+z) pat+z)

Moreover, the error raised is not of type exn:gregor:parse, but rather exn:misc:match.

mbutterick commented 6 years ago

PS Z works fine.

97jaz commented 6 years ago

Hi Matthew,

I'll give this a more thorough look in a few hours, but I strongly suspect that I deliberately didn't implement this, because (from a parsing perspective) z is hopelessly ambiguous. ("CST" might indicate "Central Standard Time," meaning "America/Chicago," or it might refer to "China Standard Time [?]," meaning "Asia/Taipei.")

That said, I could look to see how some other libraries implement this. (In the past, I've noticed that support for CLDR time zone parsing has been pretty lacking in most libraries. But this pattern, like pretty much all of CLDR, probably comes from ICU, so maybe I can figure out how they handle the z pattern.)

Even if it isn't supported, though, it should not raise that particular exception, so that's a definite bug.

mbutterick commented 6 years ago

I suppose I was hoping that ~t and parse-moment would turn out to be true inverse functions, in the sense that any moment string returned by ~t could be passed to parse-moment to get back the original moment.

I see what you mean about ambiguous input from the outside world. But is it possible for ~t to return ambiguous results? For instance, gregor does not use "CST" to represent "Asia/Taipei":

(require gregor)
(define pat "LLLL d, y z")
(~t (now/moment #:tz "America/Chicago") pat) ; "October 23, 2017 CDT"
(~t (now/moment #:tz "Asia/Taipei") pat)     ; "October 24, 2017 GMT+8"

97jaz commented 6 years ago

@mbutterick You may be right. It could be that the CLDR rules eliminate all the possible ambiguity. I'll look into it this evening. Thanks!

97jaz commented 6 years ago

@mbutterick Of course, you still wouldn't get inverse functions out of that, since more than one TZ maps to GMT+8. In this case, Gregor could either return a moment that uses a UTC offset as the TZ, or else it could use the dreadful POSIX-compatible IANA TZs. (GMT+8 would correspond, if I'm not mistaken, to the Etc/GMT-8 IANA TZ, for very weird historical reasons.)
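The sign inversion in the POSIX-style `Etc/GMT±N` zones is easy to get backwards: UTC+8 is `Etc/GMT-8`, and UTC-5 is `Etc/GMT+5`. As a minimal sketch (the helper `gmt-string->etc-zone` is hypothetical, not part of gregor), a whole-hour CLDR short localized-GMT string could be mapped to such a zone like this. It only handles the `GMT`, `GMT+8`, `GMT-5` shapes and returns `#f` for anything else, including offsets outside the range IANA defines.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical helper (not part of gregor): convert a whole-hour CLDR short
;; localized-GMT string such as "GMT+8" into the POSIX-style IANA zone name
;; for the same offset. POSIX inverts the sign, so UTC+8 is "Etc/GMT-8".
;; Returns #f for anything it does not recognize.
(define (gmt-string->etc-zone s)
  (match (regexp-match #px"^GMT(?:([+-])([0-9]{1,2}))?$" s)
    ;; Bare "GMT" means an offset of zero.
    [(list _ #f #f) "Etc/GMT"]
    [(list _ sign digits)
     (define hours (string->number digits))
     (define east? (string=? sign "+"))
     (cond
       [(zero? hours) "Etc/GMT"]
       ;; IANA only defines Etc/GMT-14 (UTC+14) through Etc/GMT+12 (UTC-12).
       [(> hours (if east? 14 12)) #f]
       ;; Flip the sign: east of Greenwich gets "-", west gets "+".
       [else (string-append "Etc/GMT" (if east? "-" "+") (number->string hours))])]
    ;; Not a localized-GMT string at all (e.g. "CST").
    [#f #f]))

(gmt-string->etc-zone "GMT+8")
```

The resulting name could be passed to gregor as a `#:tz` value, e.g. `(now/moment #:tz "Etc/GMT-8")`, but as the comment notes, such a moment still would not be `equal?` to one created with `#:tz "Asia/Taipei"`.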

mbutterick commented 6 years ago

The CLDR rules seem to acknowledge the many ambiguities and then punt entirely on specifics: when parsing, it can be “especially complicated … to figure out what the user meant.” You don’t say.

FWIW I don't mind if gregor only handles a subset of CLDR — using Z rather than z is no hardship — though it would be helpful if the docs mentioned the omissions.

97jaz commented 6 years ago

Yeah, so CLDR does say that all times should round-trip through formatting and parsing, but not that all time zones should. So we should always be able to parse back a moment that is moment=? to the original, but it will not necessarily be equal? to it.
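To make the `moment=?` vs. `equal?` distinction concrete, here is a sketch of a round trip through the `Z` pattern, which the thread reports does parse. It assumes gregor is installed and uses a fixed moment with a whole-second time, since a pattern containing `ss` would drop the nanoseconds carried by `now/moment`. We would expect the parsed moment to denote the same instant but to carry a numeric UTC offset rather than the `America/Chicago` zone ID; the exact value printed for the zone is not asserted here.

```racket
#lang racket/base
(require gregor)

;; A fixed moment: "ss" keeps whole seconds only, so a now/moment value
;; (which has nanoseconds) would not survive the round trip.
(define original (moment 2017 10 23 12 30 0 #:tz "America/Chicago"))

;; Z renders the zone as a numeric UTC offset, which parse-moment accepts.
(define pat "y-MM-dd HH:mm:ss Z")

(define formatted (~t original pat))
(define round-tripped (parse-moment formatted pat))

(printf "formatted:  ~a\n" formatted)
;; Same instant on the timeline...
(printf "moment=?:   ~a\n" (moment=? original round-tripped))
;; ...but only the offset survives, not the IANA zone ID.
(printf "zone after: ~a\n" (->timezone round-tripped))
```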

97jaz commented 6 years ago

@mbutterick Looking at the source again, there are a number of zone patterns that gregor doesn't support for parsing. In fact, it only supports the ISO formats and the zone ID formats. I should absolutely document that and raise a better exception type.

I think that I'm going to hold off on implementing the other parsing formats in Gregor. I've been working on a replacement for gregor, and I'm just about to the point where I need to handle formatting and parsing and... localization. (Sadly, my progress has been embarrassingly slow, but it's picked up a bit recently.) I'll try to implement more of the patterns for parsing in the new library.

The current CLDR docs (I'm not sure how new this is) give a sample algorithm for TZ parsing [http://unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Parsing]. I kind of hate it, since it utterly ignores the pattern variable and, despite CLDR's advice, I think lenient parsing is a horrible default, but I might be able to do something with this algorithm.

evdubs commented 4 years ago

For whatever it's worth, I have a use case for wanting to parse with z. I have a Racket application that connects to a third-party Java application, which returns the current time as a string formatted with z. I cannot change that application, so I need to handle it somehow. I am currently just chopping off the z time zone, but it would be much appreciated if this case were handled in the library.
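Until `z` parsing exists, one way to formalize the "chop off the zone" workaround is to rewrite the trailing abbreviation into a numeric offset and then parse with `Z`, which the thread reports works. This is only a sketch: the helper `abbrev->numeric-offset` and the `abbrev->offset` table are hypothetical, and because abbreviations are ambiguous (the "CST" problem raised earlier), the table has to encode whichever reading the particular data source actually uses.

```racket
#lang racket/base
(require racket/list racket/string)

;; Hypothetical, application-specific table. "CST" is ambiguous in general,
;; so each application must decide which offset its data source means.
(define abbrev->offset
  (hash "CST" "-0600"
        "CDT" "-0500"
        "UTC" "+0000"))

;; Replace the last whitespace-separated token (assumed to be a zone
;; abbreviation) with its numeric offset, so the string can be parsed with a
;; `Z` pattern instead of `z`. Returns #f if the token is not in the table.
(define (abbrev->numeric-offset str table)
  (define tokens (string-split str))
  (define offset (and (pair? tokens) (hash-ref table (last tokens) #f)))
  (and offset
       (string-join (append (drop-right tokens 1) (list offset)))))

(abbrev->numeric-offset "October 23, 2017 CDT" abbrev->offset)
```

With gregor loaded, the rewritten string could then be passed to something like `(parse-moment rewritten "LLLL d, y Z")`, assuming the rest of the pattern matches what the Java side actually emits. Note that `string-split` collapses runs of whitespace, which is harmless for a pattern that uses single spaces.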

I am willing to help work on it, but I'll need guidance.

evdubs commented 4 years ago

Quoting @97jaz: "I think that I'm going to hold off on implementing the other parsing formats in Gregor. I've been working on a replacement for gregor, and I'm just about to the point where I need to handle formatting and parsing and... localization."

How's this coming along? I periodically keep bumping into this issue.

evdubs commented 3 years ago

If you care, another user posted this link in #racket. https://pastebin.com/raw/NVffsgp3

They decided to work around the z timezone.