edi3 / edi3-json-ld-ndr

GNU General Public License v3.0

Leverage Verifiable Credentials data model for Certificate of origin #6

Open Fak3 opened 3 years ago

Fak3 commented 3 years ago

This issue was created to discuss adopting the Verifiable Credentials data model to represent a Certificate of origin as JSON(-LD) in requests/responses.

The VC data model defines a JSON(-LD) structure and a vocabulary of properties used to express credentials as a set of claims with proofs, making them cryptographically secure and machine-verifiable. A Certificate of origin is a good candidate for the VC data model.

Pros:

Cons:

1. Top level structure

There are several requirements for a top-level json structure to conform to the VC data model:

1.1 The set of claims being verified should be a JSON object (or a list of objects) placed as the value of the credentialSubject property at the top level. This set of claims is usually digitally signed (see proof below), so it must be static. { "credentialSubject": {...} }

1.2 A digital signature or blockchain-linked proof can be embedded as the value of the proof property at the top level. The signature can include a link to the verification method used to verify that the set of claims given in the credentialSubject property was signed by an authority (the issuer). { "proof": {...} }

1.3 Issuer info must be placed in the value of the issuer property at the top level. { "issuer": {...} }
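A minimal sketch of the three top-level rules above, written as a Python check over an already parsed document. The function name `check_top_level`, its messages and the sample document are hypothetical; it encodes only rules 1.1-1.3 as worded here (proof optional but an object when present, issuer an object or id string) and is not a validator for the full W3C VC data model.

```python
from typing import Any


def check_top_level(doc: dict[str, Any]) -> list[str]:
    """Return the problems found; an empty list means rules 1.1-1.3 hold."""
    problems: list[str] = []

    # 1.1: claims live under credentialSubject as an object or a list of objects
    subject = doc.get("credentialSubject")
    if subject is None:
        problems.append("missing top-level 'credentialSubject'")
    else:
        subjects = subject if isinstance(subject, list) else [subject]
        if not subjects or not all(isinstance(s, dict) for s in subjects):
            problems.append("credentialSubject must be an object or a list of objects")

    # 1.2: an embedded proof is optional, but must be an object when present
    if "proof" in doc and not isinstance(doc["proof"], dict):
        problems.append("proof must be an object")

    # 1.3: issuer info must sit at the top level (an object or an id string)
    if not isinstance(doc.get("issuer"), (dict, str)):
        problems.append("missing or malformed top-level 'issuer'")

    return problems


# Hypothetical CoO skeleton following the top-level layout in this thread
coo = {
    "type": ["VerifiableCredential", "CertificateOfOrigin"],
    "credentialSubject": {"type": "Consignment"},
    "issuer": {"id": "did:example:issuer"},
    "proof": {},
}
print(check_top_level(coo))  # []
```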

Dynamic data about the consignment or the certificate itself, which does not have to be signed as part of the credential, could be embedded at the top level of the document:

{
  "type": ["VerifiableCredential", "CertificateOfOrigin"],
  "credentialSubject": {...},
  "issuer": {...},
  "proof": {...},
  "mainCarriageTransportMovement": {
    "id": "http://maersk.com/transportmovements/b351-6h47677-g61a",
    "@type": "TransportMovement",
    "departureEvent": {
      "departureDateTime": "2020-07-06T22:53:01.608Z"
    },
    "usedTransportMeans": {
      "id": "id:string",
      "name": "string"
    }
  }
}

Placing such data outside of credentialSubject means that its integrity cannot be verified with the proof (provided at the top level). If there is a need for a provable association between the static credential data (the consignment) and some dynamic data (for example, a transport movement), the dynamic data can be linked from within the credentialSubject JSON object by its URL:

{
  "credentialSubject": {
    "type": "Consignment",
    "mainCarriageTransportMovement": "http://maersk.com/transportmovements/b351-6h47677-g61a",
    ...
  },
  "mainCarriageTransportMovement": {
    "id": "http://maersk.com/transportmovements/b351-6h47677-g61a",
    "@type": "TransportMovement",
    ...
  }
}
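A simplified model of the linking approach above, assuming (as the comment does) that the proof covers only credentialSubject. A SHA-256 over sorted-key JSON stands in for a real canonicalization and signature, and `subject_digest` is a hypothetical helper. Editing the top-level transport movement leaves the digest unchanged, while changing the URL inside the subject does not.

```python
import copy
import hashlib
import json


def subject_digest(credential: dict) -> str:
    """SHA-256 over sorted-key JSON of credentialSubject only (stand-in for a proof)."""
    canonical = json.dumps(credential["credentialSubject"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


MOVEMENT_URL = "http://maersk.com/transportmovements/b351-6h47677-g61a"

credential = {
    "credentialSubject": {
        "type": "Consignment",
        # only the URL string is covered by the digest, not what it points to
        "mainCarriageTransportMovement": MOVEMENT_URL,
    },
    "mainCarriageTransportMovement": {
        "id": MOVEMENT_URL,
        "@type": "TransportMovement",
        "departureEvent": {"departureDateTime": "2020-07-06T22:53:01.608Z"},
    },
}
signed_digest = subject_digest(credential)

# Updating the dynamic top-level data leaves the digest unchanged...
updated = copy.deepcopy(credential)
updated["mainCarriageTransportMovement"]["departureEvent"]["departureDateTime"] = "2020-07-07T08:00:00Z"
print(subject_digest(updated) == signed_digest)  # True

# ...while tampering with the linked URL inside the subject is detected.
tampered = copy.deepcopy(credential)
tampered["credentialSubject"]["mainCarriageTransportMovement"] = "http://example.com/other"
print(subject_digest(tampered) == signed_digest)  # False
```

Only the URL string is protected here: whatever the URL resolves to can still change without affecting the digest.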

An example certificate of origin that conforms to the VC data model, with the embedded JSON-LD context, in the JSON-LD playground: https://tinyurl.com/y4vkjf4j

nissimsan commented 3 years ago

@Fak3 , @onthebreeze , loving this!

Clearly, though, this extends generally to any claims, not just Certificate of Origin. I'm currently working on applying this to BOLs.

Imposes several requirements on the json structure (see below)

Right, although the schema/OpenAPI spec doesn't actually have to change; this can be added at the instance level only. I recently found this, which seems relevant: https://w3c-ccg.github.io/vc-json-schemas/ It seems to be a way to specify VCs not just at the instance level but also at the schema level.

onthebreeze commented 3 years ago

There are a few interesting issues that this work surfaces:

onthebreeze commented 3 years ago

and there's another issue.

The W3C VC data model has a few reserved property names like "issuer". It's not impossible that a credential data model such as the CoO, which itself must comply with another controlled vocabulary, also has the same property name "issuer". In this case, there are some options:

Fak3 commented 3 years ago

which itself must comply with another controlled vocabulary

What do you mean by the word "comply"?

If both fields carry exactly the same semantics, then I think they can be made equivalent at the conceptual level using owl:equivalentProperty, but I don't know much about the practical nuances of that, or how well semantic reasoners support it.

"comply" means whatever the conformance criteria (MAY/SHOULD/MUST etc.) in the relevant standards specification say.

We have one governance group (UN/CEFACT) that defines a controlled vocabulary for something like the CoO data model (the "spec" is the BRS and JSON schema), irrespective of whether it is used in a VC or another protocol. Another group (W3C) defines the vocabulary for the wrapper (the VC as per the W3C spec). As users/implementers, we don't control either but need to comply with both, so we can't make arbitrary changes when there is a collision. We therefore have to design the binding model (i.e. how you embed a document in a VC) so that there can't be collisions.
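To make the owl:equivalentProperty suggestion above concrete, here is a toy forward-chaining sketch over (subject, predicate, object) triples: once two properties are declared equivalent, every statement made with one is also inferred with the other. The edi3 namespace IRI and the example resource IDs are placeholders, and this is not how a real OWL reasoner is built; it says nothing about the practical reasoner support questioned above.

```python
# Toy forward chaining over (subject, predicate, object) triples.
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Assumed IRIs: the W3C VC vocabulary term and a placeholder edi3 term
VC_ISSUER = "https://www.w3.org/2018/credentials#issuer"
EDI3_ISSUER = "https://edi3.example/vocab#issuer"

Triple = tuple[str, str, str]


def infer_equivalent_properties(triples: set[Triple]) -> set[Triple]:
    """Copy statements across properties declared owl:equivalentProperty."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = {(s, o) for s, p, o in inferred if p == OWL_EQUIVALENT_PROPERTY}
        pairs |= {(b, a) for a, b in pairs}  # the relation is symmetric
        for source, target in pairs:
            for s, p, o in list(inferred):
                if p == source and (s, target, o) not in inferred:
                    inferred.add((s, target, o))
                    changed = True
    return inferred


data = {
    (EDI3_ISSUER, OWL_EQUIVALENT_PROPERTY, VC_ISSUER),
    ("urn:example:coo-1", EDI3_ISSUER, "did:example:chamber"),
}
print(("urn:example:coo-1", VC_ISSUER, "did:example:chamber") in infer_equivalent_properties(data))  # True
```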

Fak3 commented 3 years ago

is there a use case for selective obfuscation? E.g. the VC represents a BoL and the whole BoL is verifiable, but the VC hides some commercially sensitive data - which would need property-level hashes - also a bit like Singapore TradeTrust.

There is https://w3c-ccg.github.io/ldp-bbs2020/, which is supposed to enable a VC holder to selectively disclose individual fields of the credential, but the spec is fairly incomplete at the moment.
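The property-level hash idea can be sketched as follows, in the spirit of the TradeTrust-style obfuscation mentioned above rather than the BBS+ scheme from the linked spec. Assumptions: the document is a flat object, the issuer signs `root_hash` over salted per-field hashes (the signature step itself is omitted), and all function and field names are hypothetical.

```python
import hashlib
import json
import secrets


def field_hash(key: str, value, salt: str) -> str:
    """Hash one salted field; the salt stops verifiers guessing small values."""
    entry = json.dumps([salt, key, value], separators=(",", ":"))
    return hashlib.sha256(entry.encode("utf-8")).hexdigest()


def root_hash(hashes) -> str:
    """The value the issuer is assumed to sign: a hash of the sorted field hashes."""
    return hashlib.sha256("".join(sorted(hashes)).encode("utf-8")).hexdigest()


def salt_document(doc: dict) -> dict:
    """Issuer side: attach a random salt to every top-level field."""
    return {k: {"salt": secrets.token_hex(16), "value": v} for k, v in doc.items()}


def obfuscate(salted: dict, hidden: set) -> dict:
    """Holder side: replace hidden fields by their hashes before presenting."""
    return {
        k: {"hash": field_hash(k, e["value"], e["salt"])} if k in hidden else e
        for k, e in salted.items()
    }


def verify(shared: dict, signed_root: str) -> bool:
    """Verifier side: recompute disclosed hashes, reuse the obfuscated ones."""
    hashes = [
        e["hash"] if "hash" in e else field_hash(k, e["value"], e["salt"])
        for k, e in shared.items()
    ]
    return root_hash(hashes) == signed_root


# Hypothetical BoL fragment
bol = {"consignee": "did:example:consignee", "freightCharges": "USD 1250.00"}
salted = salt_document(bol)
signed_root = root_hash(field_hash(k, e["value"], e["salt"]) for k, e in salted.items())

shared = obfuscate(salted, {"freightCharges"})
print("value" in shared["freightCharges"])  # False
print(verify(shared, signed_root))  # True
```

The verifier can still check the whole document against the signed root even though the freight charges were never disclosed; nested BoL structures would first need flattening into such fields.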

nissimsan commented 3 years ago

and there's another issue.

The W3C VC data model has a few reserved property names like "issuer". It's not impossible that a credential data model such as the CoO, which itself must comply with another controlled vocabulary, also has the same property name "issuer". In this case, there are some options:

* as part of representing the CoO as a VC, the implementer has to comb through the data model of the document to look for duplicates and remove them, so take "issuer" out of the CoO model and use only the VC version. This feels like a bad idea to me. It means that the payload no longer conforms to the document schema (which itself is independent of VC) and poses a challenge for the implementer.

* the two vocabularies are in different namespaces, so there's no overlap, even if they have the same name (right?). When implementing a CoO or BoL or Invoice or whatever as a VC, both fields need to be populated, sometimes with the same value. This is not so different from traditional B2B messaging, where a SOAP header around an invoice might have a "to" party with the same value as the "buyer" party in the invoice. No big deal.

@onthebreeze, I believe your second option here is clearly the answer. JSON-LD resolves terms using the contexts in the order they are listed, so a later definition of the same term overrides an earlier one. But overriding that is as easy as using the prefixed forms vc:issuer and edi3:issuer. Tools should do this automatically.

edit: Here's the spec for what I said above: https://w3c.github.io/json-ld-syntax/#advanced-context-usage

Duplicate context terms are overridden using a most-recently-defined-wins mechanism.
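A toy illustration of "most-recently-defined-wins" and of prefixed names such as vc:issuer and edi3:issuer. It is not a JSON-LD processor: it ignores features such as @vocab, scoped contexts and @protected terms, the helper names are hypothetical, and the edi3 IRI is a placeholder.

```python
def build_active_context(contexts: list[dict]) -> dict:
    """Merge contexts in order; a later definition of the same term wins."""
    active: dict = {}
    for ctx in contexts:
        active.update(ctx)
    return active


def expand_term(term: str, active: dict) -> str:
    """Expand a plain term or a prefix:suffix compact IRI to a full IRI."""
    if term in active:
        return active[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in active:
        return active[prefix] + suffix
    raise KeyError(f"cannot expand {term!r}")


# Hypothetical contexts: W3C VC namespace and a placeholder edi3 namespace
VC_CONTEXT = {
    "vc": "https://www.w3.org/2018/credentials#",
    "issuer": "https://www.w3.org/2018/credentials#issuer",
}
EDI3_CONTEXT = {
    "edi3": "https://edi3.example/vocab#",
    "issuer": "https://edi3.example/vocab#issuer",
}

active = build_active_context([VC_CONTEXT, EDI3_CONTEXT])
print(expand_term("issuer", active))       # https://edi3.example/vocab#issuer (last context wins)
print(expand_term("vc:issuer", active))    # https://www.w3.org/2018/credentials#issuer
print(expand_term("edi3:issuer", active))  # https://edi3.example/vocab#issuer
```

The bare term "issuer" follows whichever context came last, while the prefixed forms stay unambiguous regardless of context order.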

nissimsan commented 3 years ago

There are a few interesting issues that this work surfaces:

* which parts of the potentially huge collection of data elements around a consignment are static/verifiable and irrevocably linked to the proof, and which parts are dynamic, related and unprovable data? Potentially most supply chain "documents" such as certificates, permits, declarations, invoices and bills of lading are all verifiable credentials. As Roman points out, the subject can contain links to the more dynamic data, but what is behind the linked URL can change, so it is not verifiable.

I've had this discussion with Roman too. As a starting point, I would generally agree with you that the whole BOL would be included within the credentialSubject, and thus be signed content. I suppose this is similar to the old-school discussion about data added to the paper BOL above and below the signature. That does make a legal difference, and perhaps this feature of the VC data model may come in handy. But again, as a starting point, I shove it all into the credentialSubject.

* but if the above is true (or even if a much smaller subset is verifiable), then how do we manage data confidentiality? The data in an invoice is commercially sensitive. One solution, of course, is to make sure the holder of the VC only presents it to parties that should see it. So it doesn't live in a public register; it lives in a private repository/wallet under the control of the holder (who is often not the same as the issuer).

I have heard the term "micro credential" used. It would absolutely make sense, especially in the BOL case - it's such a messy bunch of data all bundled together. For example, freight charges are not data that can be broadly shared, so it would certainly make sense to break it up. But this is a separate problem to tackle. People have gotten used to a certain set of data over the last 100+ years; changing that is a really big deal. And it shouldn't stand in the way of what we are aiming for here - it can always be sliced up later.

* or should the subject just contain simple metadata, like who the credential is issued to (e.g. the exporter ID in the case of a certificate of origin), the credential type (e.g. CoO), and a hash (or maybe a DID) of the actual document? This would be a bit like the Singapore TradeTrust idea, where the ID of the thing is also a hash of the thing and also contains a secret key to decode the encrypted thing.

The credential, again, is everything within the credentialSubject. Thus, if that includes:

    "consignee": {
      "id": "did:v1:exportingCo:4bdc45e2-dbce-11ea-87d0-0242ac854126"
    }

then that DID entity can prove to anyone that it is the consignee.
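The metadata-plus-hash option from the quoted bullet above (the subject carries only who it is issued to, the credential type, and a hash of the document) can be sketched as follows. `documentHash` and the helper names are hypothetical, sorted-key JSON stands in for a real canonicalization, and the sketch covers neither the encryption aspect of the TradeTrust idea nor the DID authentication the consignee example relies on.

```python
import hashlib
import json


def document_hash(document: dict) -> str:
    """SHA-256 over sorted-key JSON, standing in for a real canonicalization."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def metadata_subject(holder_id: str, credential_type: str, document: dict) -> dict:
    """Build a small credentialSubject that refers to the full document by hash."""
    return {
        "id": holder_id,  # e.g. the exporter's DID
        "type": credential_type,  # e.g. "CertificateOfOrigin"
        "documentHash": document_hash(document),  # hypothetical property name
    }


def matches(subject: dict, presented_document: dict) -> bool:
    """Verifier side: compare a separately shared document with the signed hash."""
    return subject["documentHash"] == document_hash(presented_document)


coo = {"consignee": {"id": "did:v1:exportingCo:4bdc45e2-dbce-11ea-87d0-0242ac854126"}}
subject = metadata_subject("did:example:exporter", "CertificateOfOrigin", coo)
print(matches(subject, coo))  # True
print(matches(subject, {**coo, "remarks": "edited"}))  # False
```

The trade-off is that the credential stays small and shareable, while the full document travels separately and is only verifiable by whoever actually receives it.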

* is there a use case for selective obfuscation? E.g. the VC represents a BoL and the whole BoL is verifiable, but the VC hides some commercially sensitive data - which would need property-level hashes - also a bit like Singapore TradeTrust.