d3x0r / JSON6

JSON for Humans (ES6)

JSON6 Superset of JavaScript #46

Open brettz9 opened 4 years ago

brettz9 commented 4 years ago

Short of offering a parsing mode which errs upon use of JSON6-only features (or changing JSON6 to become a strict subset of JS), I would find it helpful to have the JSON6 features which are not allowed in JavaScript documented.

I changed some tests around so as to see whether the tests would unexpectedly pass or fail within JavaScript. Some details below on how I identified the missing support if of interest*.

I see the following divergences**:

  1. In JSON6, old-style octal support has not been entirely removed (from source or tests), so eval complains about this for strict mode.
  2. ES does not fail with .123e2-45, .123e3-45
  3. Not allowed in ES: Literal newline inside string, e.g.,{a:'abc\ndef'} (ok with backticks)
  4. Not allowed in ES: Back-tick quoted keys
  5. Not allowed in ES: Keys with special symbols (hyphens, backslashes, ampersand, plus, asterisk, pipe)
  6. Not allowed in ES: There aren't tests for it, but multiple minus signs
  7. Not present in ES: Console warnings with an unfinished // comment
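As a rough illustration of the list above (not the test code from the branch), the following sketch checks whether a piece of text compiles as a JavaScript expression. The helper name `compilesAsJS` and the sample texts are hypothetical, and `new Function` stands in for `eval`:

```javascript
// Hypothetical helper: does `text` compile as a JavaScript expression?
// `new Function` only compiles the body here; the function is never called.
function compilesAsJS(text) {
  try {
    new Function('return (' + text + ');');
    return true;
  } catch (e) {
    return false;
  }
}

const samples = {
  exponentThenMinus: '.123e2-45',     // item 2: read as a subtraction in JS
  literalNewline: "{a:'abc\ndef'}",   // item 3: a real newline inside the quotes
  backtickKey: '{`a`: 1}',            // item 4
  hyphenKey: '{a-b: 1}',              // item 5
  doubleMinus: '--5',                 // item 6
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name, compilesAsJS(text));
}
```

Only the first sample is expected to compile; the others are rejected by the JS parser before anything runs.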

My suggested lines of action (besides documenting any remaining differences):

  1. No. 1 appears to be a bug to be fixed (per the current README).
  2. No. 2 would be a good feature to add.
  3. No. 3, 4, 6, and 7 do not add much value, imo, to justify deviating from JS, esp. with backticks available
  4. No. 5 adds some value, but imho, it would be better to keep JS compatibility
  5. Restrict use of unescaped ${...} within backticks so that moving JSON6 to JS could not suddenly turn into variable substitution.
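To make the concern in the last item concrete, here is a small sketch of the same characters read two ways. It assumes a data-only reading keeps `${...}` literally (emulated below by slicing off the backticks); the variable names are illustrative only:

```javascript
// The same characters, read two ways.
const text = '`Hello ${name}`';

// Read as data, the content between the backticks is kept literally.
// Emulated here by slicing off the surrounding backticks.
const asData = text.slice(1, -1);

// Read as JavaScript, ${name} is substituted from the surrounding scope.
const asCode = new Function('name', 'return ' + text + ';')('World');

console.log(asData); // Hello ${name}
console.log(asCode); // Hello World
```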

No. 3-5 could even be proposed to JS for them to be supported in a future ES version, though they might want to reserve the characters for potential use or deliberately err with those characters to prevent typos.


* I created a "drop-unused-stream-reviver" branch (not meant for merging) to find these (running npm run mocha-lite or nyc run nyc-lite). This branch:

  1. Switches the lite tests (i.e., tests not in the nested folders for benchmarking but which nevertheless have 100% coverage) slightly to use eval to confirm they work in JS (except in the case of a reviver argument, whose tests I've confirmed are only necessary as far as coverage in checking the reviver).
  2. Drops the stream and reviver tests and code so can check coverage related to eval. (I also have a no-stream-coverage branch which shows a few lines still uncovered by the non-stream tests and which I'm not 100% sure are stream related, but they seem to be)

**Note that there were also two tests which were expected to fail but would not fail in JS as they have a different meaning there and should understandably not be allowed:

  1. { a : 'no quote' [1] } (because of array index)
  2. .123-45 (because of being subtraction)

d3x0r commented 4 years ago

If not allowing parsing which errs upon use of the JSON6 (or changing JSON6 to become a strict subset of JS), I would find it helpful to have documented the JSON6 features which are not allowable in JavaScript.

I concur - the one deviation I know is significant is that strings allow any unicode character until the same unescaped ending quote.

I changed some tests around so as to see whether the tests would unexpectedly pass or fail within JavaScript. Some details below on how I identified the missing support if of interest*.

I see the following divergences**:

  1. In JSON6, old-style octal support has not been entirely removed (from source or tests), so eval complains about this for strict mode.

  2. ES does not fail with .123e2-45, .123e3-45

Interesting, I wouldn't think this should either, but that's a quick fix.

  3. Not allowed in ES: Literal newline inside string, e.g.,{a:'abc\ndef'} (ok with backticks)

  4. Not allowed in ES: Back-tick quoted keys

Really? :) Why? It's just a string.

  5. Not allowed in ES: Keys with special symbols (hyphens, backslashes, ampersand, plus, asterisk, pipe)

Right; but this ends up with the parser in an 'I'm in an object, there hasn't been a colon, the next thing is a field name' state. Until the colon or whitespace, a comma is an error, a close object would be an error, open/close array is an error... so it can almost collect all of those if it actually was in a field substate.

  6. Not allowed in ES: There aren't tests for it, but multiple minus signs

I think + and - are maybe accepted? + is a no-op basically...
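The field-name substate described in the reply to item 5 can be sketched roughly as follows. This is a hypothetical illustration, not the JSON6 implementation; the function name `readFieldName` and its return shape are assumptions:

```javascript
// Hypothetical sketch, not the JSON6 implementation.
// Collects an unquoted field name starting at `start`. The name ends at a
// colon or whitespace; a comma, a close object, or an open/close array
// before that point is treated as an error.
function readFieldName(text, start) {
  let i = start;
  while (i < text.length) {
    const ch = text[i];
    if (ch === ':' || /\s/.test(ch)) break;
    if (ch === ',' || ch === '}' || ch === '[' || ch === ']') {
      throw new SyntaxError("Unexpected '" + ch + "' in field name at " + i);
    }
    i++;
  }
  return { name: text.slice(start, i), next: i };
}

console.log(readFieldName('a-b+c*d: 1}', 0).name);
```

Everything else, including hyphens, plus signs and asterisks, is simply collected into the name, which is why such keys come almost for free in this kind of state machine.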

My suggested lines of action (besides documenting any remaining differences):

  1. No. 1 appears to be a bug to be fixed (per the current README).

Sorry, where does octal support exist? In what way?

  2. No. 2 would be a good feature to add.

  3. No. 3 and 4 do not add much value, imo, to justify deviating from JS, esp. with backticks available

Then don't use it; again, this is on the stringification side and/or reading user-edited files.

Why wouldn't a user just expect a quote to continue? Why does any language absolutely break with \n as an early terminator?

  4. No. 5 adds some value, but imho, it would be better to keep JS compatibility

The more tests one adds for 'is this the right or wrong thing to do', the slower I get.

  5. Restrict use of unescaped ${...} within backticks so that moving JSON6 to JS could not suddenly turn into variable substitution.

That's a code thing; the one thing this supports is data. It's explicitly said that this transports no code.

do you expect to copy .c or .ts files into .js files and have them work?

I mean... this is really more like (in node) I do console.log( myObject ) and copy THAT into a file to read with JSON6. Chrome makes it hard to copy the string.

No. 3-5 could even be proposed to JS for them to be supported in a future ES version, though they might want to reserve the characters for potential use or deliberately err with those characters to prevent typos.

* I created a "drop-unused-stream-reviver" branch (not meant for merging) to find these (running npm run mocha-lite or nyc run nyc-lite). This branch:

  1. Switches the lite tests (i.e., tests not in the nested folders for benchmarking but which nevertheless have 100% coverage) slightly to use eval to confirm they work in JS (except in the case of a reviver argument, whose tests I've confirmed are only necessary as far as coverage in checking the reviver).
  2. Drops the stream and reviver tests and code so can check coverage related to eval. (I also have a no-stream-coverage branch which shows a few lines still uncovered by the non-stream tests and which I'm not 100% sure are stream related, but they seem to be)

**Note that there were also two tests which were expected to fail but would not fail in JS as they have a different meaning there and should understandably not be allowed:

  1. { a : 'no quote' [1] } (because of array index)
  2. .123-45 (because of being subtraction)

d3x0r commented 4 years ago

oh ya I can't parse that... that's not a valid floating point number...

where's the E?

Technically, in stream mode it would be 2 numbers.

brettz9 commented 4 years ago

If not allowing parsing which errs upon use of the JSON6 (or changing JSON6 to become a strict subset of JS), I would find it helpful to have documented the JSON6 features which are not allowable in JavaScript.

I concur - the one deviation I know is significant is that strings allow any unicode character until the same unescaped ending quote.

You mean something like \uzá ?

I changed some tests around so as to see whether the tests would unexpectedly pass or fail within JavaScript. Some details below on how I identified the missing support if of interest*. I see the following divergences**:

  1. In JSON6, old-style octal support has not been entirely removed (from source or tests), so eval complains about this for strict mode.
  2. ES does not fail with .123e2-45, .123e3-45

Interesting, I wouldn't think this should either, but that's a quick fix.

Cool. Should I file an issue?

  4. Not allowed in ES: Back-tick quoted keys

Really? :) Why? It's just a string.

I guess because back-ticks are expected to come with evaluation (`This is ${interpolated}.`). One can get dynamic properties these days with this structure:

const key = 'hello';
const Greeting = {
  [key]: 'Hi!'
};
Greeting.hello === 'Hi!';
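As a small extension of the snippet above (the names `n` and `obj` are illustrative only), the brackets accept any expression, so a template literal can be used there even though it cannot appear as a bare key:

```javascript
const n = 2;
const obj = {
  [`item-${n}`]: 'computed', // a template literal inside [ ] is allowed
  'item-3': 'quoted',        // a hyphenated key needs ordinary quotes in JS
};

console.log(obj['item-2'], obj['item-3']);
```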

One could argue (or propose) about whether backticks here may better reuse an existing syntax, but:

  1. Some might argue having a different syntax helps visually distinguish that ES6 templates in code must be values, not keys.
  2. The above is how ES is now.

  5. Not allowed in ES: Keys with special symbols (hyphens, backslashes, ampersand, plus, asterisk, pipe)

Right; but this ends up with the parser in an 'I'm in an object, there hasn't been a colon, the next thing is a field name' state. Until the colon or whitespace, a comma is an error, a close object would be an error, open/close array is an error... so it can almost collect all of those if it actually was in a field substate.

ES could, if that's what you mean, but it doesn't currently.

  6. Not allowed in ES: There aren't tests for it, but multiple minus signs

I think + and - are maybe accepted? + is a no-op basically...

Single ones are allowed, just not multiple sequential ones (in some contexts where allowed, would be interpreted as increment/decrement, but neither is allowable for numbers).
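A short sketch of the ES behaviour just described; the helper name `evalJS` is hypothetical and `new Function` is used in place of `eval`:

```javascript
// Evaluate `text` as a JavaScript expression.
const evalJS = (text) => new Function('return (' + text + ');')();

console.log(evalJS('-8'));   // -8
console.log(evalJS('- -8')); // 8 (two unary minus operators, separated by a space)

// Without the space, '--' is read as the decrement operator, which cannot
// be applied to a number literal, so the text does not compile at all.
let doubleMinusCompiles = true;
try {
  new Function('return (--8);');
} catch (e) {
  doubleMinusCompiles = false;
}
console.log(doubleMinusCompiles); // false
```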

My suggested lines of action (besides documenting any remaining differences):

  1. No. 1 appears to be a bug to be fixed (per the current README).

Sorry, where does octal support exist? In what way?

Sorry, I was mistaken, but it is still an incompatibility in ES mode. For numbers and strings, the support for allowing leading 0's, albeit as decimals, will give an error, i.e., substituting JSON6.parse with eval in these tests--when in strict mode:
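A minimal stand-in for that substitution, limited to numbers and assuming `new Function` with a "use strict" prologue behaves like a strict-mode `eval` for this purpose (the helper name `compiles` is hypothetical):

```javascript
// Compile `text` as a JavaScript expression, optionally in strict mode.
function compiles(text, strict) {
  const prologue = strict ? '"use strict"; ' : '';
  try {
    new Function(prologue + 'return (' + text + ');');
    return true;
  } catch (e) {
    return false;
  }
}

console.log(compiles('0123', false)); // sloppy mode accepts a leading zero
console.log(compiles('0123', true));  // strict mode rejects it
console.log(compiles('089', true));   // also rejected, although 089 cannot be octal
```

Note that in sloppy mode JS reads `0123` as legacy octal (83), not as the decimal 123, so the text is incompatible either way.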

  3. No. 3 and 4 do not add much value, imo, to justify deviating from JS, esp. with backticks available

Then don't use it; again, this is on the stringification side and/or reading user-edited files.

Why wouldn't a user just expect a quote to continue? Why does any language absolutely break with \n as an early terminator?

I agree it would be a reasonable feature in JS (though some might argue it either interferes with the chance to adopt the symbols in the future for other purposes or that it visually makes it appear like some other operator (e.g., subtraction, given the allowance for hyphens) when used in a JS context). But I am just reporting here the incompatibilities with current ES.

  4. No. 5 adds some value, but imho, it would be better to keep JS compatibility

The more tests one adds for 'is this the right or wrong thing to do', the slower I get.

Understood. Perhaps JS vendors would like to see the spec changed too if it may help in this regard. Insistence on optimizing speed in all possible areas might help in focused projects, and serve as good experience for systems to come to adopt, but interoperability is a huge benefit. Developers will not end up applying it in the wrong way because the syntax is already familiar, and there will not be interoperability problems with porting code (except of course if going from JS to JSON6) or with tools that might be made to work with JSON6. Even naive tool makers, whose tools inevitably become a part of popular tool chains, are less likely to have a problem in working with such a well-known format.

  5. Restrict use of unescaped ${...} within backticks so that moving JSON6 to JS could not suddenly turn into variable substitution.

That's a code thing; the one thing this supports is data. It's explicitly said that this transports no code.

do you expect to copy .c or .ts files into .js files and have them work?

I think JSON's popularity (and dethroning of XML) is in no small part due to the fact that it "just works" in JavaScript (esp. since the admitted line/paragraph separator issue has since been fixed as of ES 2019).

With JavaScript being apparently the most widely deployed programming language, as well as being accessible to beginners, and being the only real option for client-side web use, while available for server-side-use, in browser extensions, some desktop apps, etc., it makes sense for a universal data format to interoperate with it without extra hassle.

(FWIW, Just came across the proposer of JSON Superset arguing for this (as part of the now accepted proposal): https://github.com/tc39/proposal-json-superset#objections .)

However, with the opportunity given by linting to enforce dropping quotes, etc., it becomes more compelling to have something like JSON6 which will also just work--without one being forced to add double quotes everywhere.

If that is not your intent with this project, then sure, it's your call of course.

brettz9 commented 4 years ago

Another difference (inspired by https://github.com/ota-meshi/eslint-plugin-jsonc/pull/7 -- which is included in a new release of the jsonc plugin which already supports some JSON6 parsing, btw) is:

- 8 works in ES and JSON6, but + 8 does not work in JSON6.

d3x0r commented 3 years ago

re: + 8 that would complicate the parsing state, and would be +1 case to check; +_8 also doesn't work... -_8

+Infinity is also an error.. while -Infinity works. (pretty sure it's not valid); and is processed in a separate part of the state engine than numbers.

It's not hard to fix, added 1.0.9.js test; not sure how it works in a streaming mode... probably need a whitespace after the number and before a + to get the next.
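For reference on the ES side only (this says nothing about what JSON6 accepts), all four of the following texts are valid JavaScript expressions. The helper name `evaluate` is hypothetical:

```javascript
// Evaluate `text` as a JavaScript expression.
const evaluate = (text) => new Function('return (' + text + ');')();

console.log(evaluate('+ 8'));       // 8
console.log(evaluate('- 8'));       // -8
console.log(evaluate('+Infinity')); // Infinity
console.log(evaluate('-Infinity')); // -Infinity
```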

d3x0r commented 3 years ago

Although I don't think + should work, I added it... which finished breaking base64 encoding of JSOX; the upper characters were '+/' the '/' character is a control character for comments, so it requires quotes, and now + outside of processing a number also needs a quote.
Technically '+' is always an operator and is never part of the data, unlike '-'. ---+---+++ shouldn't be legal (but it is; each '-' is interpreted as negative = !negative, and '+' is ignored like '_' in numbers).

Reviewing that this was based off a huge amount of research, some of the expressions like 123+545 are expressions and not data values... a serializer would never emit that.

I am reminded of this because, in fixing (adding) import *.json6 support, a stop-gap that worked was to just add 'export default ' before whatever source .json6 file there was, because JSON6 is basically JS without any operators or functions ( )... Unlike JSOX, which starts to deviate but resembles the console.log/util.format output more than JS... and a 'transpiler' for JSOX would be no fun at all; it would basically need a decompiler for an arbitrary object to JS :) (or emit some sort of JS script instead of a real object).
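The stop-gap mentioned above amounts to something like the following sketch. The function name `toModuleSource` and the sample text are hypothetical, and this only helps for JSON6 text that is also valid JS (see the divergences listed earlier in the thread):

```javascript
// Hypothetical helper illustrating the 'export default ' stop-gap.
// It turns the text of a .json6 file into the source of an ES module.
function toModuleSource(json6Text) {
  return 'export default ' + json6Text;
}

// Sample text using unquoted keys, single quotes, a comment and a
// trailing comma, all of which are also valid in a JS object literal.
const source = "{ a: 1, b: 'two', /* comment */ c: [1, 2, 3,] }";

console.log(toModuleSource(source));
```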