adobe / json-formula

Query language for JSON documents
http://opensource.adobe.com/json-formula/
Apache License 2.0
19 stars 8 forks source link

Some string function corner cases #152

Closed Eswcvlad closed 4 months ago

Eswcvlad commented 5 months ago

Most of this is not that important (expect for the search ones, those are pretty scary) and mostly related to the implementation.

contains

contains("", "") -> false, which looks unexpected. In most languages this would be true. It is also inconsistent, as at the same time

find

find returns pretty odd results, when start is negative:

Maybe using the JS rule of clamping it to 0 would make sense here?


Another case would be find("", "bab", X) returning 0, regardless of the value of X. It seems odd, that it would return a value, less than X.

JS rules for this seems a bit odd in a different way, as it doesn't return -1, if start is greater than string length.... Compared to behavior in Python, C++, ...

left

It is not specified in the spec what happens, if elements is negative. Maybe copy from right (i.e. return null) for consistency?

mid

It is not specified in the spec what happens, if startPos or length is negative. For length behavior now is pretty weird: mid("Fluid Flow", 0, -2) -> "Fluid Fl"


There is a quote missing in the example: mid("Fluid Flow, 20, 5)

replace

It is not specified in the spec what happens, if startPos or length is negative. Also not clear, what should happen, if startPos is too big.

search

Similar to contains, search("", "") -> [] and search("*", "") -> [] which is not expected with glob.

On the other side, search("", "012", 10) -> [10, ""], which is past the string end.


Seems like there are some outside of BMP issues, as search("A", "🙂A🙃") -> [2, "A"], instead of the expected [1, "A"].


Regular expression based implementation is leaking, as search("\\d", "A1B2C3") -> [1, "1"]. I.e. beside globs, regular expression meta classes also works.


Negative startPos is not mentioned in the spec and in the implementation it leads to some very weird results... search("A?", "B1-C3-A3", -8) -> [-2,"A3"] As with find I would propose to just clamp negative values to 0.

JohnBrinkman commented 5 months ago

In all these cases, I lean toward a more strict interpretation of parameters rather than inferring behaviors from unexpected values. It's also simpler to implement.

contains("", "")

should return true

find()

We should document that it wants start >= 0. Negative values should throw an error. I don't see a real-world scenario where allowing negatives is useful. More likely just confusing.

find("", "bab", X) returning 0, regardless of the value of X.

If X is greater than the length of the source, this should return null

left(), mid(), replace(), search()

I'd prefer we be strict that integer parameters are >= 0, rather than infer behaviors that are really unlikely to be leveraged by users

replace(): if startPos is too big...

Then we shouldn't do any replacement -- as opposed to the current behavior where we append the string

search("*", "")

I think returning [] is correct. * can be zero characters.

There is a quote missing ..

thanks!

Seems like there are some outside of BMP issues

Hmm. Yes there are. Bummer. it really means that the implementation likely can't leverage regex

Eswcvlad commented 5 months ago

search("*", "")

I think returning [] is correct. * can be zero characters.

Well, yes, that's why I would expect [0, ""] here :) . Similarly, if find("", "") -> 0, then you would expect search("", "") -> [0, ""] and same for search("*", ""), which is a superset of the previous call.

Agreed on everything else.

JohnBrinkman commented 5 months ago

I would expect [0, ""] ... for search("*", "")

Hmm. Given: =SEARCH("*", "") and =SEARCH("", "") Excel fails to find a match in either case.

I suspect the general rule is that we never find anything in an empty string.

Eswcvlad commented 5 months ago

Oh, it is from OpenFormula. I assumed it was based on glob/regex... There you can find empty string in anything, usually.

JohnBrinkman commented 4 months ago

Cleaning up the remaining items from this list.

Eswcvlad commented 4 months ago

@JohnBrinkman Do we leave search with the empty string case as is? Just want to confirm.

Btw, I've did some checking, and it looks like there are corner case differences in OpenFormula implementations as well. In Excel =SEARCH("", "AB") is 1, but in LibreOffice Calc =SEARCH(""; "AB") is #VALUE!.