adobe / json-formula

Query language for JSON documents
http://opensource.adobe.com/json-formula/
Apache License 2.0
19 stars 8 forks source link

Handling math domain errors #130

Closed Eswcvlad closed 7 months ago

Eswcvlad commented 7 months ago

Usually, if you would call a math function with argument, which are not correct in a math sense for real numbers (ex, sqrt/log of a negative number, asin/acos outside of [-1,1]), you would get NaN or inf. But since those are not valid JSON numbers, we cannot do that for json-formula.

Behavior in such cases is not specified in the spec (except for avg and division by 0) and it looks like in the implementation it is either not handled explicitly (i.e. NaN or inf is returned and gets converted somewhere later) or it returns null.

In my opinion returning null is error-prone and might cause issues in practice. The, arguably, good thing about NaN is that as soon as it appears in a math expression, it "poisons" the whole expression and makes the whole expression return NaN, so you know, that there was an issue. But we don't have that with null. It is just silently coerced into 0 (which is, more often than not, a valid function output value) and the error is propagated further undetected.

I think it would be better, if we were consistent with division by zero and throw an evaluation error in such cases for functions as well and mention this in the spec.

Below some specifics on how it is handled now:

acos

Expression Result
acos(-1.01)|null
acos(1.01)|null

In the spec it just mentions parameter limit. Would usually have NaN here. A candidate for EvaluationError.

asin

Expression Result
asin(-1.01)|null
asin(1.01)|null

In the spec it just mentions parameter limit. Would usually have NaN here. A candidate for EvaluationError.

avg

Expression Result
avg(`null`)|0
avg(`[]`)|null

Per spec avg should return null for empty arrays. For some reason though it returns 0 for null, which is coerced to an empty array. From a math standpoint there is no mean of an empty set, so it also makes for a EvaluationError candidate.

log, log10

Expression Result
log(`null`)|null
log(`false`)|null
log(0)|null
log(-10)|null
log("uasdicyb")|null
log10(`null`)|null
log10(`false`)|null
log10(0)|null
log10(-10)|null
log10("uasdicyb")|null

In the spec it just mentions parameter limit. Would usually have NaN or +-inf here. A candidate for EvaluationError.

mod

Expression Result
mod(2, `null`)|NaN
mod(2, `false`)|NaN
mod(2, "uasdicyb")|NaN
mod(7, 0)|NaN
mod(0, 0)|NaN
mod(-7, 0)|NaN
mod(7.25, 0)|NaN
mod(-7.25, 0)|NaN

In this case it returns NaN on the test page, which is not a valid JSON number value. As well this is the closest analog to "division by zero", so EvaluationError makes great sense here.

power

Expression Result
power(0, -2)|Infinity
power(0, -2.5)|Infinity
power(-2, 2.5)|NaN
power(-2, -2.5)|NaN
power(-1.5, 2.5)|NaN
power(-1.5, -2.5)|NaN
power(`[2]`, 2)|[4]

As with mod, it returns NaN/Infinity, which are not JSON numbers.

The array case is a bit different, as in the spec only numbers are allowed, so a TypeError was expected. But in the implementation there is support for arrays, which in this case is equivalent to map(`[2]`, &power(@, 2)).

sqrt

Expression Result
sqrt(-16)|null

Would usually have NaN here. Another candidate for EvaluationError.

stdev, stdevp

Expression Result
stdev(`false`)|null
stdev(`true`)|null
stdev(13.37)|null
stdev("str")|null
stdev(`[]`)|null
stdev(`[1]`)|null
stdevp(`null`)|0
stdevp(`[]`)|null

From a math standpoint, stdev needs at least two values and stdevp needs at least one, as otherwise you will have division by zero in the calculation. For example, Excel for one element stdev also shows #DIV/0!. This also looks like a good candidate for EvaluationError.

For some reason, same as in avg, stdevp(`null`) is 0, which is different from an empty array case, while you would expect them to have the same result.

These are findings from running this WIP test suite.

JohnBrinkman commented 7 months ago

I think I can agree to all of these except avg(). Definitely need to fix the coercion so that avg(null()) gets processed as avg(`[]`) But... if we make it an error, then it would seem to open the door to also causing errors for Max(), Min()

Eswcvlad commented 7 months ago

But... if we make it an error, then it would seem to open the door to also causing errors for Max(), Min()

I would say those cases are a tiny bit different. The average of an empty set is undefined, while at the same time you could argue, that the biggest/smallest value in an empty set is "nothing" (i.e. null).

Though at the same time, if min/max were to return elements from arrays as-is, preserving types, then null can be a valid value, which could causes confusion. But I think as of now it returns number | string.

Btw, there is a missing type in the spec, that max/min only returns number, while they also can return string as it is now.

JohnBrinkman commented 7 months ago

We've discussed internally, and we're prefer to see avg(), min() and max() all behave consistently where an empty array will return an evaluation error.

Eswcvlad commented 7 months ago

Ran some tests on the latest commit and two points still remain from the issue:

  1. stdev and stdevp return null on too small arrays.
  2. power working on arrays.