JSONPath limited to non-array types

gitblit commented 3 weeks ago

I know this is a documented limitation, but it would be nice to test array types and not take the first element.

Ideally, this zzapi test would be written as:

$.data.phoneNumbers[*].type: [ mobile, home ]

vasan-agrostar commented 3 weeks ago

We had discussed this and for some reason keeping the array as is interfered with something. We must think and recollect whether other changes have overcome that original interference.

Varun0157 commented 3 weeks ago

If I recall: the issue was primarily that jp.query returns an array of matching results, and if there are multiple matches, we would not know which one to run all of the tests against (unless we run them against all, in which case some are bound to fail). Thus, we stuck to jp.value, which returns the first match, and said that any other ones have to be explicitly stated.

We could assert against all matches in cases like the above where there are no object tests, but that would mean treating this one type of test differently from the rest.

gitblit commented 2 weeks ago

if there are multiple matches, we would not know which one to run

Shouldn't this be one equality test against the array? Expected array === actual array.

As a user, I would also expect this to work, but it is consistent with the first example and only takes the first array element.

$.data.phoneNumbers[*].type: { $eq: [ mobile, home ] }

Varun0157 commented 2 weeks ago

Ah yes, I agree. Another possible reason might have been that the matches would be non-intuitive in certain cases, such as cases where there is only a single match. If we always ask the user for a list of matches, then cases such as $.data.address.streetAddress would require the user to specify ["naist street"] instead of just "naist street", for example.

I'm fairly certain this isn't the reason we decided to adopted jp.value in the first place though, I'll have to try to recall the exact reason for that. If we can remember that then we can conclusively move on to jp.query.

gitblit commented 2 weeks ago

Ah. That seems to be the case. I agree wrapping the value to make it array is awful.

How about if the expected value is a scalar, then this requires the jp.query result to have a single element.

Varun0157 commented 2 weeks ago

@vasan-agrostar thoughts? This makes sense to me.

vasan-agrostar commented 2 weeks ago

Yeah, makes sense. But the expected value has to be a non-array (scalar, or an object) for us to use the first value. If the expected value is an array, then we keep the array returned from JSONpath. @Varun0157 can you look through the code and see if this will cover everything? We may also need to check if the returned array has more than one element and treat it as an array else pick the first/only element.

Varun0157 commented 2 weeks ago

From a code perspective, it would work - but I think this change could still cause some confusion to the user, right?

I see two options based on the discussion above:

If the expected value is a scalar, then there should be a single element received (we check for this), and we do a direct comparison. The potential confusion I see with this is that objects (for example) would still need to be wrapped in an array. So if I'm only expecting { a : b }, I still have to write it as [ { a : b } ]. Even if we extend this from scalars to non-arrays, the same issue arises: if I'm expecting [a, b], I would have to write it as [[a, b]], but would not have to do this for any other case.
If the result of jp.query has a single element (i.e., the received value), then the expected value should not be wrapped in an array. This makes more sense to me, where if there is a single result we can write it on it's own, else, we need to write them as an array, but I'm not sure if that's less intuitive.

Are these valid options, and are there others? If not, which of the above is better?

gitblit commented 2 weeks ago

Objects

I'm confused by your point 1 on object checks. If we are expecting an object and jp.query returns an array, wouldn't we be comparing the expected object to the first element of the array just like the scalar value approach?

Arrays

I think we can avoid the [][] issue with a proper JSONpath expression, we just need to clearly document that tests are always written using JSONpath expressions.

`$.phoneNumbers`	`$.phoneNumbers[:]` or `$.phoneNumbers[*]`

vasan-agrostar commented 2 weeks ago

Different options

Essentially, the decision to be made is this: when to use the entire array result of jp.query and when to pick the first element. Here are the options I can think of:

Implictly understand what to do a. Based on expected value (RHS) type b. Based on the number of elements in the return of jp.query c. Based on both d. Based on either
Let the user explicitly specify how to match b. Differentiate based on { $eq: [] } vs plain [] in the RHS a. Add a pseudo operator $multi

Document example

[
    {
        "name": "John",
        "phoneNumbers": [
            {
                "type": "iPhone",
                "available": [7,22]
            },
            {
                "type": "home",
                "available": [21]
            }
        ]
    },
    {
        "name": "Mary",
        "phoneNumbers": [
            {
                "type": "Android",
                "available": [7,22]
            }
        ]
    }
]

1a. Implict, based on RHS type

Rule: if the RHS is an array, use the jp.query array as is, else pick the first element to compare with.

  $.[?(@.name=="John")].phoneNumbers.0.type:         { $eq: home, $type: string }
  $.[?(@.name=="John")].phoneNumbers.0.available:    { $eq: [7, 22], $type: array } # FAIL
  $.[?(@.name=="John")].phoneNumbers.0.available:    { $eq: [[7, 22]], $type: array } # Pass 1
  $.[?(@.name=="John")].phoneNumbers.0.available[*]: { $eq: [7, 22], $type: array } # Pass 2

The second test is the natural way to write it. But it will fail because jp.query will return [[7,22]]. You will have to use the third or fourth forms.

1b. Implicit, based on `jp.query` result

Rule: If the result of jp.query has more than one item, use the entire array to compare with, else, use the first and only element to compare with.

  $.[?(@.name=="John")].phoneNumbers[*].type: { $eq: [ iPhone, home ] }
  $.[?(@.name=="Mary")].phoneNumbers[*].type: { $eq: [ Android ] } # FAIL
  $.[?(@.name=="Mary")].phoneNumbers[*].type: { $eq: Android } # Pass

The second test is the natural way to write it, but because since the result of jp.query has only one element, we will use that for comparison. You have to remember to use the third form, depending on the number of elements you are expecting.

1c. Both the above

Rule: Both conditions have to be satisfied. RHS has to be an array, plus, jp.query should return multiple values.

Result is same as option 1b. The second test will fail.

1b. One of the above

Rule: either the RHS is an array OR jp.query returns multiple values.

The result is the same as option 1a. The second test will fail.

2a. Explicit based on spec

We don't allow this form:

  $.something: [a, b, c]

That's because we insist on a scalar at the RHS. Otherwise, it is interpreted as an object with multiple assertions. We didn't think through an array. But it is OK to use this form, and based on this, we can differentiate between how to use the jp.query result.

$.something: [ a, b, c ] can compare the jp.query result as is. Whereas $.something: { $eq: [ a, b, c ]} can compare the first element (and we expect only one element).

But then, you'd like to think $.something: X is the same as $something: {$eq: X, ... } with some more assertions/options. Making them behave differently breaks the intuitiveness.

2b. Explicit with an option a-la `skip`

  $.something: { $eq: [ a, b, c] } # test jp.query()[0] === [ a, b, c ]
  $.something: { $eq: [ a, b, c], multi: true } # test jp.query() === [ a, b, c ]

We can come up with better names for multi. Maybe all or pattern or query.

Evaluation

I think being explicit makes sense. Further, it gives us an opportunity to fix false positives (undetected failures) which are far worse than false negatives. In most of the other options, there is a possibility of a false positive. The API response can be in a form that matches the expectation, especially when we rely on the results to tell us what to do.

Few more things:

Our current implementation can have false positives. The output can have multiple elements that match the query, but we compare against only one. It is an improvement to explictly expect multiple matches vs. a single match. There is no way to assert that the given path should only have one match. Option 2b gives us the opportunity to fix that. If jp.query is not expected to return multiple values and it does, we can make it fail.
We can think of allowing $.something: [ a, b, c ] rather than force the user to use the $eq operator. We cannot allow $.something: { type: 'Android' } etc, because when we detect an object in the RHS, we interpret it as a spec that has operators like $gt. But an array can surely be allowed.
We should probably remove the $ in the $skip option. In $.something: { $sw: Y, $gt: Z } all operators start with a $. The other stuff like multi and skip are options, not operators. They don't need to start with a $. We really don't need the $ for operators either, but we kept it to match the familiarity with MongoDB filter operators.

vasan-agrostar commented 2 weeks ago

Ah, and one more thing. We should potentially be able to check the type of each nested sub-element in an array.

  $.[*].phoneNumbers[*].number: { multi: true, $type: string }

gitblit commented 2 weeks ago

🤪 wow. Thank you for putting all that thought into the scenarios. It's challenging to keep track of the permutations.

Sidebar: I don't understand the implications of multi, but I do think asserting the type of an array element is interesting, but perhaps that could be handled without overloading type.

Here are some thoughts:

Clearly document that ZZApi is a JSONPath testing toolkit and typical expression results are expected. Convenience syntax/shortcuts are provided for scalar value assertions.
Always use jp.query
If RHS is scalar, assert jp.query.length === 1, take first element & compare
If RHS is array, assert jp.query deepEquals RHS as-is. Require the user to specify the appropriate JSONPath expression and report actual/jp.query accurately, on failure, so they can correct their expression. (Supported by point 1) Maybe we can detect this and show URL on the console to point them to the appropriate documentation.
If RHS is an object, the assertions may be based on an expected scalar or array. We need to understand the object and choose point 3 or point 4. The object may not specify $eq. It may specify $type or $size so there is some sort of decision tree to determine which evaluation path to take.

In support of these changes, assuming some proposal moves forward, can we leverage Jest to validate whatever implementation is selected? agrostar/zzapi-vscode#18 is a good example of something that would have been caught with some test automation and these changes will be difficult to validate if the project keeps moving ahead with manual QA.

vasan-agrostar commented 2 weeks ago

I am not aligned with this, which is essentially my option 1a. There are no ambiguities here I agree. But:

The mental model I expect users to have is that the path results in a single value. That value can be an object, an array, or a scalar, expected to be treated uniformly. And this is what most people will be asserting against. The fact that JSONPath returns an array is not intuitive to me. But this is the best library we found. Forcing users to understand this vagary about JSONPath is an unnecessary burden except for the advanced or expert users.
I expect the multiple-value comparison to be a rare use case. I would rather make rarer use cases harder to specify and common use cases simpler.
I have test cases in my repo where the RHS is an array, a single return value from JSONPath query. Just like the available: { $eq: [7,21] } case. Now, if we implement this change, it will break my tests. And others' tests too, if they have used it like this. We will have to rewrite them like { $eq: [[7,21]] } which is quite unintuitive to me.
Saying what is expected explicitly is simpler to understand, though it may be harder to specify in some cases.

So I am proposing:

By default, we expect the jp.query result to be a single value. If it returns multiple values, the assertion will fail (with a helpful hint that you should use the multi option, if that is what you want).
If the expected result of jp.query is an array of multiple matches to the expression, you will have to explicitly say so, eg, $.data.phoneNumbers[*].type: { $eq: [ mobile, home ], multi: true }
And in this case, the entire result will be passed as is to the $tests operator, if any. So you could write:

$.data.phoneNumbers[*].type:
  multi: true
  $tests:
    $.0: mobile
    $.1: home

Further, unrelated to this change we will also do the following:

Allow the RHS to be an array without needing to put it under { $eq: ...}, like this: available: [7,21]
Deprecate $skip and use skip. (All options are without the $, all operators are with the $`)

vasan-agrostar commented 2 weeks ago

In support of these changes, assuming some proposal moves forward, can we leverage Jest to validate whatever implementation is selected? https://github.com/agrostar/zzapi-vscode/issues/18 is a good example of something that would have been caught with some test automation and these changes will be difficult to validate if the project keeps moving ahead with manual QA.

Yes, we do have automated tests in zzapi-cli. I am not sure why it didn't get caught.

I will fork this off as a new issue.

Varun0157 commented 2 weeks ago

In support of these changes, assuming some proposal moves forward, can we leverage Jest to validate whatever implementation is selected? https://github.com/agrostar/zzapi-vscode/issues/18 is a good example of something that would have been caught with some test automation and these changes will be difficult to validate if the project keeps moving ahead with manual QA.

Yes, we do have automated tests in zzapi-cli. I am not sure why it didn't get caught.

I will fork this off as a new issue.

The tests aren't extensive, I just wrote something introductory so when we planned on a testing method we could add it with ease.

All it does is assert that the expected number of tests are failing in tests-bundle.zzb, and then assert that the expected tests are failing among this number. The above case didn't change which tests were succeeding / failing so it didn't catch that.

I'll take some time out to write tests with better coverage for each of the cli, extension and node module soon.

gitblit commented 1 week ago

I have test cases in my repo where the RHS is an array, a single return value from JSONPath query. Just like the available: { $eq: [7,21] } case. Now, if we implement this change, it will break my tests. And others' tests too, if they have used it like this. We will have to rewrite them like { $eq: [[7,21]] } which is quite unintuitive to me.

Tests like this one (actual result is [ [7,22], [18,22] ]) will break no matter which implementation is chosen.

I expect the multiple-value comparison to be a rare use case. I would rather make rarer use cases harder to specify and common use cases simpler.

I generally agree with this take & your proposal satisfies my needs.

agrostar / zzapi-vscode