adobe / json-formula

Query language for JSON documents
http://opensource.adobe.com/json-formula/
Apache License 2.0
19 stars 8 forks source link

String to number conversion corner cases #180

Open Eswcvlad opened 1 month ago

Eswcvlad commented 1 month ago

When implementing the spec, I assumed, that for a string to coerce to a number it should, basically, contain a number literal per json-formula rules. But it looks like in the implementation the JavaScript rules are used, which makes an expression like this work:

0 + "0x10" + "0o10" + "0b10" + "010" -> 36

Also there is this expression, which returns an invalid JSON value:

0 + "Infinity" -> Infinity

I stumbled upon this with "_localDate", but there it was just the leading zero.

It seems like something, that would be better to describe more thoroughly in the spec, as this could cause differences in implementations. Especially with leading zeros, which could be treated as octal markers.

JohnBrinkman commented 1 month ago

The JS implementation for toNumber() takes different paths depending on whether the radix parameter is something other than 10. Currently: toNumber("0x11") returns 17, but toNumber("0x11", 16) fails (we check for valid hex digits). We could fix that, but still, it's hard to treat these number literals consistently. e.g. in JS, parseInt("0b11") and parseInt("0b11", 2) each return 0, but "0b11" * 1 => 3

I think we should remove support for the JavaScript literals: 0x, 0b, 0o. i.e. an expression like "0b11" * 1 should fail with a TypeError. (also saves us from having to document these literals)

Eswcvlad commented 1 month ago

Well there could be other discrepancies, not only with literals prefixes:

Strings are converted by parsing them as if they contain a number literal. Parsing failure results in NaN. There are some minor differences compared to an actual number literal:

  • Leading and trailing whitespace/line terminators are ignored.
  • A leading 0 digit does not cause the number to become an octal literal (or get rejected in strict mode).
  • + and - are allowed at the start of the string to indicate its sign. (In actual code, they "look like" part of the literal, but are actually separate unary operators.) However, the sign can only appear once, and must not be followed by whitespace.
  • Infinity and -Infinity are recognized as literals. In actual code, they are global variables.
  • Empty or whitespace-only strings are converted to 0.
  • Numeric separators are not allowed.
JohnBrinkman commented 1 month ago

I will change the implementation to add regex expressions to check for valid numeric literals

JohnBrinkman commented 1 month ago

Please review https://github.com/adobe/json-formula/pull/181

Eswcvlad commented 1 month ago

Looks good.

Maybe change "Number literals may include a leading zero" to "Number literals may include leading zeros"? To be more explicit.

I'll test it a bit locally tomorrow.

Eswcvlad commented 1 month ago

Seems to be that everything is fine.