decentralized-identity / presentation-exchange

Specification that codifies an inter-related pair of data formats for defining proof presentations (Presentation Definition) and subsequent proof submissions (Presentation Submission)
https://identity.foundation/presentation-exchange
Apache License 2.0

Clarify filtering examples #420

Closed: jmandel closed this issue 1 year ago

jmandel commented 1 year ago

Section 5 introduces filtering examples.

The "Filter by Credential Type" example includes:

"path": [
  "$.type"
],
"filter": {
  "type": "string",
  "pattern": "<the type of VC e.g. degree certificate>"
}

Typically, a verifiable credential has an array of types, starting with the base type (https://www.w3.org/2018/credentials#VerifiableCredential). But if I am reading the processing algorithm correctly, this filter expects the "type" property to evaluate to a single string. Does the example need to be rewritten as a filter with "type": "array", "contains": { ... } to handle the typical case?
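To make the mismatch concrete, here is a minimal Python sketch. It hand-rolls a tiny subset of JSON Schema (type, pattern, contains) plus a stand-in for JSONPath that only understands simple "$.a.b" paths; the helpers matches, resolve and field_matches are hypothetical and not part of any DIF library. Assuming the processing algorithm takes the first path that resolves and validates that value against the filter, the spec-style string filter rejects a real VC's type array, while an "array" + "contains" filter accepts it.

```python
import re


def matches(value, schema):
    """Check value against a tiny JSON Schema subset: type (string/array), pattern, contains.

    Not a full validator; any other keyword is ignored.
    """
    expected = schema.get("type")
    if expected == "string" and not isinstance(value, str):
        return False
    if expected == "array" and not isinstance(value, list):
        return False
    # "pattern" only constrains strings; JSON Schema patterns are not anchored.
    if isinstance(value, str) and "pattern" in schema:
        if re.search(schema["pattern"], value) is None:
            return False
    # "contains" only constrains arrays: at least one item must match the subschema.
    if isinstance(value, list) and "contains" in schema:
        if not any(matches(item, schema["contains"]) for item in value):
            return False
    return True


def resolve(doc, path):
    """Resolve a simple "$.a.b" path; a stand-in for real JSONPath evaluation."""
    node = doc
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def field_matches(doc, field):
    """Take the first path that resolves and test its value against the filter."""
    for path in field["path"]:
        value = resolve(doc, path)
        if value is not None:
            return matches(value, field["filter"])
    return False


# Excerpt of Example 1 from the VC Data Model specification.
credential = {
    "id": "http://example.edu/credentials/1872",
    "type": ["VerifiableCredential", "AlumniCredential"],
}

# The filter as a developer would adapt it from the "Filter by Credential Type" example.
scalar_field = {
    "path": ["$.type"],
    "filter": {"type": "string", "pattern": "AlumniCredential"},
}

# The same intent, written for the array that real VCs carry.
array_field = {
    "path": ["$.type"],
    "filter": {
        "type": "array",
        "contains": {"type": "string", "pattern": "^AlumniCredential$"},
    },
}

print(field_matches(credential, scalar_field))  # False: $.type is a list, not a string
print(field_matches(credential, array_field))   # True
```

The contains subschema is anchored with ^...$ so that only the exact type name matches; an unanchored pattern like the original would also accept any longer string that happens to include it.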

More generally, many fields in the VC data model can be represented either as single values or as arrays. To write robust filter logic, I suppose it becomes necessary to write filters that can match either a single value or an array.
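One way to express "a single value or an array containing it" in JSON Schema is anyOf over the item schema and an array-contains-item schema. The sketch below assumes that approach; scalar_or_array and matches are hypothetical helpers, and matches hand-rolls only the keywords used here (type, pattern, contains, anyOf) rather than calling a real validator.

```python
import re


def scalar_or_array(item_schema):
    """Accept either a lone value matching item_schema or an array containing one."""
    return {"anyOf": [item_schema, {"type": "array", "contains": item_schema}]}


def matches(value, schema):
    """Check value against a tiny JSON Schema subset: type, pattern, contains, anyOf."""
    expected = schema.get("type")
    if expected == "string" and not isinstance(value, str):
        return False
    if expected == "array" and not isinstance(value, list):
        return False
    # anyOf: at least one subschema must accept the value.
    if "anyOf" in schema and not any(matches(value, sub) for sub in schema["anyOf"]):
        return False
    # "pattern" only constrains strings.
    if isinstance(value, str) and "pattern" in schema:
        if re.search(schema["pattern"], value) is None:
            return False
    # "contains" only constrains arrays.
    if isinstance(value, list) and "contains" in schema:
        if not any(matches(item, schema["contains"]) for item in value):
            return False
    return True


alumni = scalar_or_array({"type": "string", "pattern": "^AlumniCredential$"})

print(matches("AlumniCredential", alumni))                            # True
print(matches(["VerifiableCredential", "AlumniCredential"], alumni))  # True
print(matches(["VerifiableCredential"], alumni))                      # False
```

The item schema should pin its own "type". Because "pattern" only applies to strings, a bare {"pattern": ...} branch would accept every array vacuously, and the anyOf would then match anything.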

csuwildcat commented 1 year ago

The filter object is quite literally any valid JSON Schema object, so you should be able to write a JSON Schema definition that tests values just about any way one can imagine. In JSON Schema you can test arrays to make sure they contain certain elements, that their items are of certain data types or match a regexp, etc., so I believe your case is covered. I'm curious what your exact test is, though, because I could probably write up a quick JSON Schema snippet to do it.

csuwildcat commented 1 year ago

Here's a JSON Schema that tests all items in an array for adherence to a simple restriction that all members of the array must be strings that are either "foo" or "bar":

{
  "type": "array",
  "items": {
    "type": "string",
    "enum": [ "foo", "bar" ]
  }
}

Considering how expressive JSON Schema is, I can't imagine there's a check on a value that can't be accomplished with it.
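Worth noting for the type-filtering case: "items" requires every member of the array to match, whereas "contains" requires at least one. The hypothetical matches helper below hand-rolls just type, enum, items (single-schema form) and contains to show the difference, including on a VC-style type array where "items" would reject the mandatory "VerifiableCredential" entry.

```python
def matches(value, schema):
    """Check value against a tiny JSON Schema subset: type, enum, items, contains."""
    expected = schema.get("type")
    if expected == "string" and not isinstance(value, str):
        return False
    if expected == "array" and not isinstance(value, list):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    if isinstance(value, list):
        # "items" (single-schema form): every element must match.
        if "items" in schema and not all(matches(v, schema["items"]) for v in value):
            return False
        # "contains": at least one element must match.
        if "contains" in schema and not any(matches(v, schema["contains"]) for v in value):
            return False
    return True


foo_or_bar = {"type": "string", "enum": ["foo", "bar"]}
every_item = {"type": "array", "items": foo_or_bar}
at_least_one = {"type": "array", "contains": foo_or_bar}

print(matches(["foo", "bar"], every_item))    # True
print(matches(["foo", "baz"], every_item))    # False
print(matches(["foo", "baz"], at_least_one))  # True

alumni_only = {"type": "string", "enum": ["AlumniCredential"]}
vc_types = ["VerifiableCredential", "AlumniCredential"]
print(matches(vc_types, {"type": "array", "items": alumni_only}))     # False
print(matches(vc_types, {"type": "array", "contains": alumni_only}))  # True
```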

jmandel commented 1 year ago

Thanks for the quick response!

"curious what your exact test is"

Let's say I have a VC like Example 1 from the VC Data Model specification:

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://www.w3.org/2018/credentials/examples/v1"
  ],
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential", "AlumniCredential"],
  ...
}

Developers looking at the DIF "filter by type" example would try to write an expression like:

"path": [
  "$.type"
],
"filter": {
  "type": "string",
  "pattern": "AlumniCredential"
}

But they will be surprised and disappointed when the expression fails to match real VCs: the filter assumes "type" is a scalar, whereas in every case I am aware of, it is an array. So the fix here is to make sure the DIF "filter by type" example expects arrays. (Even then it isn't quite right, because it implicitly depends on a certain @context being in place. That feels like a harder issue to fix, and it points to an impedance mismatch between this filtering approach and the use of JSON-LD, but it deserves a separate issue.)

The slightly broader question is: can we provide examples in the specification that are robust to the very common scenario where you don't know ahead of time whether you're going to get a scalar or an array? That complexity seems to come with the territory of the VC data model. In particular, the "Two Filters" example, which currently looks for terms of use based on two properties, could use this kind of treatment: terms of use can be either a scalar or an array, and the current example will only find values that happen to be scalars.
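For the terms-of-use case, one possible treatment (a sketch, not the spec's actual "Two Filters" example) is a single field on $.termsOfUse whose filter accepts either one policy object or an array containing one, with both conditions inside the same object schema. The policy type and id values below are hypothetical, and the matches helper hand-rolls only the JSON Schema keywords used here (type, pattern, required, properties, contains, anyOf).

```python
import re


def matches(value, schema):
    """Check value against a tiny JSON Schema subset:
    type (string/array/object), pattern, required, properties, contains, anyOf."""
    python_types = {"string": str, "array": list, "object": dict}
    expected = schema.get("type")
    if expected in python_types and not isinstance(value, python_types[expected]):
        return False
    if "anyOf" in schema and not any(matches(value, sub) for sub in schema["anyOf"]):
        return False
    if isinstance(value, str) and "pattern" in schema:
        if re.search(schema["pattern"], value) is None:
            return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in value and not matches(value[key], sub):
                return False
    if isinstance(value, list) and "contains" in schema:
        if not any(matches(item, schema["contains"]) for item in value):
            return False
    return True


# Hypothetical policy values, for illustration only.
policy = {
    "type": "object",
    "required": ["type", "id"],
    "properties": {
        "type": {"type": "string", "pattern": "^IssuerPolicy$"},
        "id": {"type": "string", "pattern": r"^https://example\.com/policies/credential/4$"},
    },
}

# One field on $.termsOfUse: a single policy object, or an array containing one.
terms_of_use_filter = {"anyOf": [policy, {"type": "array", "contains": policy}]}

POLICY_ID = "https://example.com/policies/credential/4"
as_object = {"type": "IssuerPolicy", "id": POLICY_ID}
as_array = [
    {"type": "OtherPolicy", "id": "https://example.com/policies/other"},
    {"type": "IssuerPolicy", "id": POLICY_ID},
]
# Each condition holds somewhere, but never in the same entry.
split_across_entries = [
    {"type": "IssuerPolicy", "id": "https://example.com/policies/other"},
    {"type": "OtherPolicy", "id": POLICY_ID},
]

print(matches(as_object, terms_of_use_filter))             # True
print(matches(as_array, terms_of_use_filter))              # True
print(matches(split_across_entries, terms_of_use_filter))  # False
```

Keeping both checks in one object schema is deliberate. If the two conditions were separate fields on $.termsOfUse[*].type and $.termsOfUse[*].id, each could be satisfied by a different entry; and a plain $.termsOfUse.type path selects nothing in JSONPath when termsOfUse is an array.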

(I understand the benefit of creating very simple examples to help developers figure out what's going on, but misleading examples can cause enduring confusion.)

csuwildcat commented 1 year ago

Yeah, I guess we put that simple example in to not overload people, but you're right that a more precise example of the most common case is probably needed. Would you be interested in doing a PR to modify one to be more representative of the case you outlined?

kimdhamilton commented 1 year ago

This issue was addressed in two ways: