chakra-core / ChakraCore

ChakraCore is an open source Javascript engine with a C API.
MIT License
9.13k stars 1.2k forks source link

new Date().toLocaleDateString() invalid result #3091

Closed digitalinfinity closed 7 years ago

digitalinfinity commented 7 years ago

Mirrored from: https://github.com/nodejs/node-chakracore/issues/281

Microsoft Windows [Version 10.0.14393 { http_parser: '2.7.0', node: '7.0.0-pre8', chakra: '11.0.14393.1198', uv: '1.9.1', zlib: '1.2.8', ares: '1.10.1-DEV', modules: '48', openssl: '1.0.2h' } As per below toLocalDateString() is returning the 0x200e character instead of the space character var dd = new Date().toLocaleDateString() for(var dc of dd) console.log(dc.charCodeAt().toString(16)) 200e 36 200e 2f 200e 31 200e 2f 200e 32 30 31 37

digitalinfinity commented 7 years ago

Note: I can repro this in ch which is why I don't consider this a node-chakracore issue. cc @bobfrgit

dilijev commented 7 years ago

Just so we're on the same page, http://www.fileformat.info/info/unicode/char/200e/index.htm is a left-to-right mark (also known as a BiDi mark)

This is related to, if not a duplicate of #599 -- BiDi markers are allowed by spec within implementation-defined strings. /cc @bterlson

(Note this is WinGlob-specific and would be fixed by switching to ICU-based implementation #2919).

Also I'm not sure spaces would have ever been used in this output... Here's a modified repro:

>eshost -is test.js
## Source
let dd = new Date().toLocaleDateString('en-US');
print(dd);
print(dd.split('').map((x)=>x.charCodeAt().toString(16)).join(','));

#### d8, sm
6/1/2017
36,2f,31,2f,32,30,31,37

#### ch-1.2.3, ch-1.5.0, ch-1.3.2, ch-master, ch-1.4.4, ch-dev-rel, ch-dev-dbg
?6?/?1?/?2017
200e,36,200e,2f,200e,31,200e,2f,200e,32,30,31,37

#### node-ch
‎6‎/‎1‎/‎2017
200e,36,200e,2f,200e,31,200e,2f,200e,32,30,31,37

#### node
6/1/2017
36,2f,31,2f,32,30,31,37
bterlson commented 7 years ago

Seems like #599 (with the caveat that I agree there should be no spaces).

Perhaps @digitalinfinity can clarify what is invalid about our current semantics?

dilijev commented 7 years ago

I've added the Duplicate label but I'll leave this open for now to track discussion - it's still possible there's a unique issue being reported here.

digitalinfinity commented 7 years ago

Hah! I forgot about #599 - it looks like the same issue to me. The title of the issue was just copy-pasted from the issue that @BobFrGit opened on nodejs/node-chakracore so I'll defer to him as to why he considered it invalid. It sounds like there's no spec consensus around this (around whether a locale-specific date can have bidi markers inserted in it or not?). Personally, I think it's really dependent on what you want to do with the string. If it's going to be displayed or rendered, then the bidi markers make sense. If it's for the purpose of enumerating the characters (as @BobFrGit is doing in his snippet), then it's a little annoying to have to filter out the non-renderable characters, in which case I could see how this behavior can be considered invalid.

dilijev commented 7 years ago

IIRC current spec language intends that locale-formatted strings should be opaque (no dependency should be taken on the actual contents) and used specifically for rendering, and inspection of the underlying data should be done in another way (there may or may not currently exist a way to do this, but there is also at least one relevant spec proposal: https://github.com/maggiepint/proposal-temporal).

BobFrankston commented 7 years ago

You’re right. I was thinking of spaces because I originally was using toLocaleString() which was spaces. Thinks for the split(‘’) trick

As to the bidi – that raises a whole set of issues about the design point of Unicode but that’s another topic.

Bob Frankston

http://Frankston.com http://Frankston.com

@BobFrankston

bterlson commented 7 years ago

It sounds like there's no spec consensus around this (around whether a locale-specific date can have bidi markers inserted in it or not?).

There is consensus presently: it is allowed. However, going forward, it is not clear that this is the best approach for new APIs (see also: meeting notes from last TC39).

If it's going to be displayed or rendered, then the bidi markers make sense. If it's for the purpose of enumerating the characters (as @BobFrGit is doing in his snippet), then it's a little annoying to have to filter out the non-renderable characters, in which case I could see how this behavior can be considered invalid.

The rationale behind the current design is that the former use case is what this API is designed for (in other words, the strings coming out of Intl APIs should be considered opaque) and the latter use case is explicitly not what Intl APIs are for. However it is clear that there is a desire to use these APIs in ways that are traditionally considered "unsafe", and thus this area is worth reconsidering (and in fact new intl APIs are somewhat on hold until this question is figured out).

digitalinfinity commented 7 years ago

Makes sense- thanks for the clarification @bterlson and @dilijev!

BobFrankston commented 7 years ago

For now I'm doing defensive programming. The API documentation should mention such caveats because so much of JavaScript is leaning by example.

There is another problem (which I tried to report but maybe in the wrong thread). If I provide options such as timeZone Chakra throws an error instead of ignoring options it doesn't handle.

(BTW, in case anyone on this list is working on the date problem I'll throw in a plug for getting rid of the accursed leap second (leap second)

BobFrankston commented 7 years ago

I see the question about what I'm trying to do -- it's simple. I want to generate a date stamp. What would work best for me is the equivalent of C#'s picture formatting for dates. In fact such formatting would be wonderful for JavaScript in general and, even better, if it could be part of the tagged string mechanism.

So it's not so much that I'm parsing the string as trying to rearrange it.

dilijev commented 7 years ago

There is another problem (which I tried to report but maybe in the wrong thread). If I provide options such as timeZone Chakra throws an error instead of ignoring options it doesn't handle.

@BobFrGit could you link to the issue you mentioned (or open a new issue on ChakraCore with a repro) about that error you're seeing? That's definitely a bug, but this is the first time I'm hearing about it.

dilijev commented 7 years ago

(BTW, in case anyone on this list is working on the date problem I'll throw in a plug for getting rid of the accursed leap second (leap second)

@BobFrGit Let's have one topic per issue. But please feel free to open another issue. (Though FWIW, I don't think there's anything we can do about it as we have to support things the way they are. :P)

dilijev commented 7 years ago

For now I'm doing defensive programming. The API documentation should mention such caveats because so much of JavaScript is leaning by example.

@BobFrGit Which API documentation are you referring to? MDN? Something else? (I'd recommend filing feedback with those sources of documentation.)

The spec is essentially the final word on things like this, but we all acknowledge that it's not really in a JS-developer-consumable format (it's designed to be a standard rather than dev documentation, and implementers are the main consumers, and even for us it's sometimes difficult to read).

dilijev commented 7 years ago

So it's not so much that I'm parsing the string as trying to rearrange it.

@BobFrGit Why rearrange it? What form would you like it to be in?

BobFrankston commented 7 years ago

The reference I used is https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/toLocaleDateString.

As to the format I want for this particular purpose yyyy-MM-dd HH:MM:SS.fff (to use the C# notation).

The bidi issue reminds of going into the rabbit hole of Unicode for languages with complex graphics and issues like ligatures. So I'm actually a fan of having the Bidi markings but at this point perhaps it should be in a new family of interfaces such as toLocalePresentation to make it explicit.

JavaScript has a set of related issues for Unicode along these lines with older APIs that assumed 16 bit characters. Presentation and representation get entangled. Should a red A equal a green A?

dilijev commented 7 years ago

@bterlson ^ +1 for custom date format strings :)

@Cellule mentioned http://momentjs.com/ to me earlier today -- maybe that will help unblock you here?

dilijev commented 7 years ago

So I'm actually a fan of having the Bidi markings but at this point perhaps it should be in a new family of interfaces such as toLocalePresentation to make it explicit.

@BobFrGit: That's a good idea -- though obviously everything in the spec needs a proposal. /cc @bterlson is this covered by any current proposal? Think there's enough value add in such an idea to warrant a new proposal?

BobFrankston commented 7 years ago

I looked at moment in the past and I can reconsider it. I think I passed initially because I was doing a project without library support. Now that I appreciate presentation I do see the value in a bidi option for presentation.

As to the more general problem of formatting -- I haven't been looking at proposals so not sure what is happening in this arena.

BobFrankston commented 7 years ago

Another issue with bidi -- if I have a piece of code with a time stamp on a line and then want to replace it with an equal number of spaces so subsequent lines are continuation without their own stamp.

var stamp = new Date().toLocaleString();
var stx = ''.padStart(stamp.length);

The problem is that stamp.length is 28 not 18 because the bidi characters are part of the string. This isn't a date problem as such but rather a problem with the JavaScript string library. Oops. Like the problem of dealing with UTF-32.

Before adding the bidi marking we need to have a complete library for handling presentation strings rather than treating all strings as just strings. This isn't just bidi but applies to all sorts of scripting.

This isn't the right forum for a full discussion but rather at the intersection of two implicit agendas.

dilijev commented 7 years ago

@BobFrGit We implement the standard and sometimes standard proposals that have advanced far enough. This kind of discussion should definitely be directed to tc39 either via the IRC channel #tc39 on chat.freenode.net (slurp logs) or TC39 proposals repo.

FWIW the length of a string is the number of code units (i.e. 16-bit units of Unicode code points). In this case, each character is a single code unit, so the discrepancy in the count is because of the BiDi markers, which are (correctly) counted as characters. If you want to strip out the unprintable characters and then pad your strings based on the result, you can do that.

BobFrankston commented 7 years ago

I understand the point about code units. Can look into TC39 it does take a lot to get into such a discussion. For now I'll just work around it. Is there a function for stripping out non-printables? There doesn't seem to be an regex class mechanism in JavaScript.

dilijev commented 7 years ago

tc39 is a really good place to start a casual discussion, so I'd recommend it if you have questions about the state of the spec or whether anyone is planning something.

dilijev commented 7 years ago

Seems we're in agreement about this being a duplicate of #599 so I'm closing this and deferring to the other issue.