Input descriptors: interop with VC types

jmandel commented 4 years ago

If I want to request presentation of a VC with a known type (e.g., "https://healthwallet.cards#immunization") or one matching all members of a known set of types (e.g., "https://healthwallet.cards#covid19" && "https://healthwallet.cards#immunization"), it's not clear how to accomplish this. Right now I can specify requests based on JSON schema, but it's not always easy (or precise, or useful) to describe a credential type using JSON schema.

Some initial notes...

It's unclear why name, purpose, and metadata are properties of a schema rather than properties of an input descriptor. For example, when I'm asking for a driver's license, the reason ("purpose") isn't specific to the schema, but about the descriptor overall.
It'd be nice if there was a way to map the VC "type" concept into presentation exchange. This could be by simply defining a schema that constrained a VC's type property, if we could inline the schema here. But the fact that schemas can only be defined by URI would mean someone would need to pre-compute all possible combinations of types, assign a URI to each combination, and publish schema content at those URIs. This feels unmanageable.

Proposal:

Move name, purpose, and (I think?) metadata up a level, from input_descriptors[].schema to input_descriptors[].
Make input_descriptors[].schema optional (when not supplied, just consider constraints)
Allow constraints based on JSON Schema's array contains.const logic, so for example a constraint could be written for VC types:

"input_descriptors": [{
  "id": "imm_rec_1",
  "name": "Immunization Record",
  "purpose": "We need your immunization records.",
  "constraints": [{
    "fields": [{
      "path": ["$.vc.type"],
      "filter": {
        "type": "array",
        "contains": {
          "const": "https://healthwallet.cards#immunization" }}}]}]}]

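A minimal sketch of how a wallet might evaluate the filter above against a credential's type array. It hand-rolls only the two JSON Schema keywords used in the example (type and contains.const) instead of calling a full validator, and it resolves "$.vc.type" by hand. The function name filter_matches and the sample credential are hypothetical, not part of the spec.

```python
def filter_matches(value, flt):
    """Evaluate a tiny subset of JSON Schema: type == "array" and contains.const."""
    if flt.get("type") == "array" and not isinstance(value, list):
        return False
    contains = flt.get("contains")
    # "contains" only applies to arrays and passes when at least one item
    # satisfies the subschema; only "const" is supported in this sketch.
    if isinstance(value, list) and contains is not None and "const" in contains:
        return any(item == contains["const"] for item in value)
    return True


imm_filter = {
    "type": "array",
    "contains": {"const": "https://healthwallet.cards#immunization"},
}

# Hypothetical credential used only for illustration.
credential = {
    "vc": {
        "type": [
            "VerifiableCredential",
            "https://healthwallet.cards#covid19",
            "https://healthwallet.cards#immunization",
        ]
    }
}

matched = filter_matches(credential["vc"]["type"], imm_filter)
```

Requiring two types at once would then be a matter of listing two such fields, one per type, assuming all fields in a constraint must pass.
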
decentralgabe commented 4 years ago

I agree with your point about moving the purpose and name up a level, to sit directly under input_descriptors. I also think there's value in having a name property under a schema.
The schema doesn't have to be a JSON schema, it can also be an LD type. I agree this is confusing in the current specification. Types do have to be URIs in VC-land, so I'm not clear on the issue there.

Are you able to join the call this Thursday to discuss?

jmandel commented 4 years ago

Re: "schema vs types", sure -- if it was legal to write:

    "input_descriptors": [{
      "id": "imm_rec_1",
      "schema": {
        "uri": ["https://healthwallet.cards#covid19", "https://healthwallet.cards#immunization"]
      }
    }]

... and have this mean:

imm_rec_1 refers to any claim that "is-a" (in the .vc.type sense) #covid19 credential and also "is-a" #immunization credential,

... then that'd satisfy my use case.

The current docs for the URI property don't seem to allow this (specifically, the values in the array don't behave like AND'd requirements, and the spec refers to "credential schemas", which are not (or at least, not obviously!) the same as types):

The object MUST contain a uri property, and its value MUST be an array consisting of one or more valid URI strings for the acceptable credential schemas. A common use of multiple entries in the uri array is when multiple versions of a credential schema exist and there is a desire to express support for more than one version. This field allowing multiple URIs is not intended to be used as a mechanism for including references to fundamentally different schemas, and SHOULD NOT be used by the implementer this way.

I'd join the call if I'm available -- is there a listing of call times / meeting links somewhere?

decentralgabe commented 4 years ago

My understanding is that your example is valid. We can improve the language. I encourage you to make a PR 😄

Are you a member of DIF? If so, the meeting information is available in slack.

csuwildcat commented 4 years ago

Pending close based on adding helpful text about what qualifies as a URI.

jmandel commented 4 years ago

Qualifying URIs alone won't solve this -- as long as the semantics for a URI array don't change. In other words, if you provide an array of multiple URIs, is it "or" or "and"? I need a way to express "and" (intersection) logic, to ask for credentials that match two types/shapes/schema.
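
To make the "or" versus "and" distinction concrete, here is a small sketch of the two readings of a URI array. The function names and the two sample type sets are hypothetical; only the two healthwallet.cards URIs come from this thread.

```python
def matches_any(credential_types, requested_uris):
    """'or' reading: sharing a single URI is enough."""
    return any(uri in credential_types for uri in requested_uris)


def matches_all(credential_types, requested_uris):
    """'and' reading: every requested URI must be present."""
    return all(uri in credential_types for uri in requested_uris)


requested = [
    "https://healthwallet.cards#covid19",
    "https://healthwallet.cards#immunization",
]

# Hypothetical credentials, reduced to their sets of type URIs.
other_immunization = {"https://healthwallet.cards#immunization"}
covid_immunization = {
    "https://healthwallet.cards#covid19",
    "https://healthwallet.cards#immunization",
}
```

Under the "or" reading the non-COVID immunization credential is accepted, which is exactly the outcome the intersection use case needs to rule out.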

csuwildcat commented 4 years ago

@jmandel yeah, after our talk, I understand your use case better. I think we can address this, and will bring it up with the WG folks next meeting.

decentralgabe commented 4 years ago

@jmandel would this kind of language fit your needs?


{
  "id": "imm_rec_1",
  "schema": {
    "identifiers": [
      {
        "uri": "https://healthwallet.cards#covid19",
        "required": true
      },
      {
        "uri": "https://healthwallet.cards#immunization",
        "required": true
      }
    ],
    "name": "Bank Account Information",
    "purpose": "We need your bank and account information."
  }
}

jmandel commented 4 years ago

Yes, if this translates to "imm_rec_1 refers to any VCs that are simultaneously typed as #covid19 and #immunization" ?

decentralgabe commented 4 years ago

correct, that input descriptor would specify both identifiers are required. required would be a new optional property as a way to handle AND cases

NickDarvey commented 4 years ago

I think I'm trying to get my head around the same thing.

The schema doesn't have to be a JSON schema, it can also be an LD type.

Does this mean to request a credential looking like this one in the test vectors, I should define my Input Descriptors like this?

{
  "id": "degree",
  "schema": {
    "identifiers": [
      {
        "uri": "https://www.w3.org/2018/credentials/v1#VerifiableCredential",
        "required": true
      },
      {
        "uri": "https://example.org/examples#UniversityDegreeCredential",
        "required": true
      }
    ]
  }
}

And if the schema can be many things, how should the User Agent determine what kind of schema validation it should be doing (JSON Schema vs LD type)? Fetching and checking what the document looks like?

I might be introducing an XY problem here... so to be specific about what I'm trying to achieve. I want to request an address file in an Australian format (G-NAF) to be presented. (I see there's some different views which ought to be useful.) I was imagining I would declare a Verifiable Credential type that refers to the G-NAF schema but adds some meaning to the address (e.g. 'primary residence', pretty much as the implementation guide describes), but I wasn't quite sure how to then define a Presentation Definition that requests it.

Edit: And if I didn't particularly want this to be a credential, I just wanted something of a G-NAF shape presented, could I just reference the G-NAF schema directly in my presentation definition? i.e. http://linked.data.gov.au/def/gnaf/1.1 (though I don't think I can use that IRI directly for schema validation...)

decentralgabe commented 4 years ago

@NickDarvey

And if the schema can be many things, how should the User Agent determine what kind of schema validation it should be doing (JSON Schema vs LD type)? Fetching and checking what the document looks like?

Yes. We can consider adding a type property if it becomes an issue.
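
A rough sketch of what "checking what the document looks like" could mean once the document has been fetched. This is only a heuristic and an assumption on my part, not anything the spec defines: it treats a top-level "@context" as a sign of JSON-LD and a top-level "$schema" as a sign of JSON Schema. The function name guess_schema_kind is hypothetical, and fetching is left out.

```python
import json


def guess_schema_kind(document_text):
    """Guess whether an already-fetched document is JSON-LD or JSON Schema."""
    try:
        doc = json.loads(document_text)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(doc, dict):
        return "unknown"
    if "@context" in doc:
        return "json-ld"
    if "$schema" in doc:
        return "json-schema"
    # Many documents declare neither keyword, so the guess can easily fail.
    return "unknown"
```

The "unknown" branch is the argument for an explicit type property: a JSON Schema is not required to declare "$schema", so sniffing will miss real schemas.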

just wanted something of a G-NAF shape presented, could I just reference the G-NAF schema directly in my presentation definition

Yes, the schema can be for anything.

decentralgabe commented 4 years ago

Please review https://github.com/decentralized-identity/presentation-exchange/pull/149

csuwildcat commented 4 years ago

@jmandel would this also work for you, Josh?:

{
  "id": "immunity",
  "schema": {
    "uri": ["https://healthwallet.cards"],
    "type": ["immunization", "covid19"]
   }
}

jmandel commented 4 years ago

What's the intended meaning of this example? I can't tell from the JSON alone...

Are you trying to decompose a URI into two parts (prefix + suffix) here? Or is the URI array interpreted totally independently from the types array?

I'd be fine if the following is legal (and has the same meaning I outlined at https://github.com/decentralized-identity/presentation-exchange/issues/134#issuecomment-718977914)

{
  "id": "immunity",
  "schema": {
    "type": ["https://healthwallet.cards#immunization", "https://healthwallet.cards#covid19"]
   }
}

... but if the concatenation of URIs ++ types is supposed to be implicit, this causes a problem, because there's an array of URIs and an array of types, and you wouldn't know which ones to append to which.
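
A small sketch of the ambiguity, assuming implicit concatenation with a "#" separator (the separator, the function name, and the second base URI are assumptions for illustration). With more than one base URI, the only neutral reading is the full cross product, which may include combinations the verifier never intended.

```python
from itertools import product


def implied_type_uris(uris, types, separator="#"):
    """Every base URI joined with every type name."""
    return [f"{base}{separator}{name}" for base, name in product(uris, types)]


one_base = implied_type_uris(
    ["https://healthwallet.cards"],
    ["immunization", "covid19"],
)

# A second, hypothetical base URI doubles the candidates.
two_bases = implied_type_uris(
    ["https://healthwallet.cards", "https://example.org/other"],
    ["immunization", "covid19"],
)
```

With one base there are two candidate type URIs; with two bases there are four, and nothing in the JSON says which pairings were meant.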

decentralgabe commented 4 years ago

Opt 1

    "input_descriptors": [{
      "id": "name_input",
      "name": "Full Legal Name",
      "purpose": "We need your full legal name.",
      "schema": [
        {
          "uri": "https://name-standards.com/name.json",
          "type": "Name",
          "required": true
        },
        {
          "uri": "https://name-standards.com/australianName.json",
          "type": "AusName"
        }
      ]
    }]

Opt 2

    "input_descriptors": [{
      "id": "name_input",
      "name": "Full Legal Name",
      "purpose": "We need your full legal name.",
      "schema": [
        {
          "uri": ["https://name-standards.com/name.json#Name", "https://name-standards.com/australianName.json#AusName"],
          "required": true
        }
      ]
    }]

decentralgabe commented 4 years ago

tl;dr we don't understand LD enough to pick an option. @csuwildcat @OR13 plz provide guidance

jmandel commented 4 years ago

I'm assuming the only thing we'd be standardizing is the "shape" of these objects, and not any semantics in terms of how the URI strings themselves are constructed or what they represent. As such:

it's a bit confusing that the two options use different URI strings
it's also confusing that the two options seem to say different things (first: we need your name, which could be an Australian name; second: we need an instance of your name and it must also be an Australian name)

Overall, I see no reason why this use case requires an array of arrays for a single input descriptor; as such, option 1 seems preferable.

Where/how all this fits into a Linked Data world also needs to be written down somewhere. I'd suggest semantics like:

A mobile wallet associates a set of types with each VC. Types are expressed as URIs, and generally the list of types for a VC would be populated by canonicalizing all values from the type field of the VC, according to the VC Data Model (but possibly additional types could be inferred in other ways, too -- that's out of scope)
An input descriptor can request data that matches one or more types via its .schema.uri properties; to successfully count as a "match" for an input descriptor, a single VC must match all of the "required": true types specified in the input descriptor, by canonical URI comparison.
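
A minimal sketch of these semantics, assuming schema is an array of objects with uri and required (the option 1 shape). The TYPE_CONTEXT lookup table is a hypothetical stand-in for real JSON-LD processing, and the function names are mine.

```python
# Hypothetical short-name -> canonical URI map; a real wallet would derive
# this from the credential's JSON-LD context.
TYPE_CONTEXT = {
    "Immunization": "https://healthwallet.cards#immunization",
    "Covid19": "https://healthwallet.cards#covid19",
}


def canonical_types(vc):
    """Return the set of canonical type URIs for a VC."""
    raw = vc.get("type", [])
    if isinstance(raw, str):
        raw = [raw]
    # Values that are not in the map are assumed to be URIs already.
    return {TYPE_CONTEXT.get(name, name) for name in raw}


def satisfies(vc, input_descriptor):
    """True when one VC carries every schema URI marked "required": true."""
    types = canonical_types(vc)
    required = [
        entry["uri"]
        for entry in input_descriptor["schema"]
        if entry.get("required") is True
    ]
    # With no required entries this is vacuously true.
    return all(uri in types for uri in required)


descriptor = {
    "id": "imm_rec_1",
    "schema": [
        {"uri": "https://healthwallet.cards#covid19", "required": True},
        {"uri": "https://healthwallet.cards#immunization", "required": True},
    ],
}

covid_vc = {"type": ["VerifiableCredential", "Covid19", "Immunization"]}
other_vc = {"type": ["VerifiableCredential", "Immunization"]}
```

Here covid_vc satisfies the descriptor and other_vc does not, since the comparison is made on canonical URIs and both required entries must be found on the same credential.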

csuwildcat commented 4 years ago

@jmandel it's this part I am trying to tease out to make the best decision: "to successfully count as a "match" for an input descriptor, a single VC must match all of the "required": true types specified in the input descriptor, by canonical URI comparison."

The issue I believe exists, and which argues in favor of Option 1 above, is this: depending on the format of the credential held in a wallet, the URI and type strings may not be present inside the credential object as a single concatenated string. Keeping them combined in the input descriptor raises the question of how one decomposes the combined string to compare against values that are separate in the credential object itself. I don't believe there is a format-universal way to represent a base URI and a type string concatenated together (your example uses a hash symbol), so do you think it would be safer and more explicit to separate them, as in Option 1?

jmandel commented 4 years ago

Type strings may not be present inside the credential object as a single concatenated string

Canonical URIs are the best way I know to represent linked data concepts in an unambiguous way. Sometimes it's convenient to split them into prefixes (aka namespaces) and postfixes, but that's a convenience for human readers. After all, you can also break them into three parts (say, an issuing organization, a topic area, and a specific value). Or, heck, the Linnaean taxonomy uses seven parts (in its most basic form) to identify a species. When we use Linked Data to refer to a Short-beaked common dolphin, we don't expect systems to communicate, process, and compare nine distinct properties to define a full classification:

Instead, we communicate via URI in a pre-coordinated code system, (e.g., we use a value like http://taxref.mnhn.fr/lod/taxon/60878/13.0 from taxref-ld -- a project I found with a web search just now, so I'm not endorsing any of the particulars ;-))

Sorry this got kind of preachy/random; bottom line: I think URIs are a good way to represent types. I'd rewrite "option 1" to create "option 3" as:

"input_descriptors": [{
  "id": "name_input",
  "name": "Full Legal Name",
  "purpose": "We need your full legal name.",
  "schema": [
    {
      "uri": "https://vocab.example.org/credential-type-a",
      "required": true
    },
    {
      "uri": "https://vocab.example.org/credential-type-b",
      "required": true
    }
  ]
}]

decentralgabe commented 4 years ago

My latest commit in #149 represents option 3. Please verify its accuracy

jmandel commented 4 years ago

Thanks! This is looking great. One remaining question has to do with required:

In the PR you have:

The object MAY contain a boolean required property, and if present it signifies that the given schema object is required to fulfill the given [[ref:Submission Requirement]].

Nitpick: "if present it signifies that the given schema object is required" should say "if present and true" :/
More substantially: how should schema items be interpreted when required is absent? If it's like required: false, then what does it mean that I'm even listing this schema? In other words, for most of the examples I think required should be present and true. There may be some edge cases where a schema entry is just... a hint, but I wouldn't expect that to be the case for, e.g.,

decentralgabe commented 3 years ago

Thanks for the nit, I cleaned up the language.

For the second -- I view it this way: if one schema entry has required set to true, the credential must match at least that entry. If multiple entries are true, it must match all of them. If one is true and another is not, it must match the true one and may also match the other.

decentralgabe commented 3 years ago

resolved in #149
