adobe / json-formula

Query language for JSON documents
http://opensource.adobe.com/json-formula/
Apache License 2.0
19 stars 8 forks source link

Types in sort/min/max #150

Closed Eswcvlad closed 4 months ago

Eswcvlad commented 5 months ago

Kind of related to the casting discussion from before, but seems different enough to be mentioned separately.

sort

Per spec, it looks like the input array gets casted to an array of number or strings, and then the same array type is returned from the function. But in the implementation the element types remain as-is and just sorted as numbers or strings (which seems correct tbh):

sort([false(), 1, "str"]) -> [1, false, "str"]

It is also not clear, what should happen, if array is not just numbers or strings, but mixed or contains bool/null.

Additionally it somehow works on arrays and objects:

sort([{a: 1}, [1,2,3]]) -> [[1,2,3], {"a": 1}]

sortBy

This is very much an unlikely scenario, but from the spec it is not clear what should happen, if the key function for some elements returns numbers, but for the others it returns string.

min/max

Similarly to sort, the type of argument is a bit misleading. as per description arrays can be mixed (i.e. types are cast inside the function). Though compared to sort it looks like implementation actually casts to number or string instead of returning the original type (not quite sure, what would be more correct here).

Same as with sort, it is not quite clear, what should happen, if first element is either bool or null.

Also unexpectedly works on arrays and objects, but returns their JSON string serialization:

min([{a: 1}, [1,2,3]]) -> "[1,2,3]" max([{a: 1}, [1,2,3]]) -> "{\"a\":1}"

JohnBrinkman commented 5 months ago

sort(): The problems are that:

Assuming we adopt the rule that we do not coerce when parameters allow multiple types... then arguably we would not coerce any array values. Must be either number[] or string[] If the array contents are not homogenous, then throw a TypeError.

Or we could change the input (and return) to be: {(number|string)[]} . But then we'd still throw a TypeError if we got any booleans, objects, nulls etc. Or we could say: {any[]} -- but that just seems too open-ended -- and doesn't offer a clear rule on whether we sort as number or string.

I prefer to go the strict route. It's not hard to create a homogenous array. e.g.: map(array, &toString(@)) It's also not hard to sort arrays with mixed types and return the original array e.g. sortBy(array, &toNumber(@)).

My one hesitation is that it might be nice to allow null values in the array... But I can overlook this because it's easy to mitigate and because everything is so much more clear when we're strict.

sortBy(): The current implementation does throw a TypeError if it gets inconsistent types. (although the message needs improving). e.g. sortBy([0,1,2,3], &if (mod(@,2), @, toString(@))) throws: TypeError: sortBy expected 2, received 0

min/max: whatever we decide for sort() we should adopt the same rules.

Eswcvlad commented 5 months ago

To be fair, it might be easier to just make sort(arr) be a shorthand for sortBy(arr, &@) and inherit all the rules from there. So it will also be Array -> Array, but with a strict rule on elements, as they are also keys.

My one hesitation is that it might be nice to allow null values in the array... But I can overlook this because it's easy to mitigate and because everything is so much more clear when we're strict.

Yeah, I also was thinking of that, but more in a context of sortBy. It might not be that rare to see sortBy(arr, &@.key) not working and being forced to write sortBy(arr, &notNull(@.key, 0)).

I was also thinking on maybe lifting key restrictions and just say, that it uses a < b on elements internally, which can fail for certain types. But I don't think, that current type coercion rules for comparison preserve transitivity, so it is a no-no...

For null you could always have SQL-like rules, where null is the smallest value. Or use the first non-null type as the main type for comparison, and then coerce null keys to that in comparison.

But, yes, it would be easier to just be strict now and lift restrictions later, if necessary.

sortBy(): The current implementation does throw a TypeError if it gets inconsistent types. (although the message needs improving). e.g. sortBy([0,1,2,3], &if (mod(@,2), @, toString(@))) throws: TypeError: sortBy expected 2, received 0

Yes, I was mostly thinking of just documenting it in the spec.