digitalbazaar / jsonld.js

A JSON-LD Processor and API implementation in JavaScript
https://json-ld.org/
Other
1.64k stars 195 forks source link

how to handle a legacy context #542

Closed lemoustachiste closed 7 months ago

lemoustachiste commented 7 months ago

Hi,

a while back I had posted about an issue with more recent version of this library (https://github.com/digitalbazaar/jsonld.js/issues/426#issuecomment-879179580) but didn't get an answer.

At the time, I basically froze the support the Blockcerts v1 to a specific library with 0 maintenance planned. Now, this strategy has come to bite me as there are vulnerabilities in the code and I'm trying to address them.

One way to address them is to update jsonld, so I am back at the initial issue.

I'm working with a published context and published certs (I dare not call them VCs, but they are a proto version of that), so I need to ensure those still verify.

One term is defined as follows (this is just an excerpt of the whole context):

      "bc": "https://w3id.org/blockcerts#",
      "image:signature": {
        "@id": "bc:image:signature"
      }

And with the latest version of jsonld.js (but also v5), normalizing the document yields the following error:

Invalid JSON-LD syntax; term in form of IRI must expand to definition.

With version jsonld.js@1.8.1, this how it gets normalized:

<https://w3id.org/blockcerts#image:signature> "" .

But here I don't know how to make a subtle change that would yield the same normalization without impairing the whole verification process.

The question is not in itself related to jsonld.js but more generally to jsonld, and I figure there are a few experts here that could help me.

I have tried to change the @id to https://w3id.org/blockcerts#image:signature, the term definition too (but that breaks because there is no mapping anymore), but I don't know if there would be other options.

Thanks for taking a look

davidlehn commented 7 months ago

Can you provide a complete minimal example with input(s) and desired output? (maybe shorten that image url blob to a simple string) It's easier to think this over when we can use the playground or other tooling with the real problematic data.

I imagine the issue is that the code is being more strict and correct now. But there might be some tricks to get around this specific problem.

I have wondered what we'd do when real data is out there that's invalid but still needs to be processed. There might be some places to add special hooks to handle specific types of legacy issues. Avoiding adding that complexity would be nice though.

lemoustachiste commented 7 months ago

Hi @davidlehn,

The full jsonld context can be found here: https://www.blockcerts.org/schema/1.2/context.json

and this is the document I am trying to normalize: https://github.com/blockchain-certificates/cert-verifier-js/blob/master/test/fixtures/v1/testnet-valid-1.2.json

And this is the desired normalization output as gotten from v1.8.1 and valid in terms of verification (sorry it's a bit long but I'm giving you the whole thing so you can weed out what you want from it - with MerkleProof201x we sha256 that normalization result to verify the content has not been tempered with, so it's really important the output is iso with what was normalized and hashed at issuance time):

<http://certificates.gamoeofthronesxyz.org/b5dee02e-50cd-4e48-ad33-de7d2eafa359> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/blockcerts#Assertion> .
      <http://certificates.gamoeofthronesxyz.org/b5dee02e-50cd-4e48-ad33-de7d2eafa359> <https://w3id.org/blockcerts#image:signature> "" .
      <http://certificates.gamoeofthronesxyz.org/b5dee02e-50cd-4e48-ad33-de7d2eafa359> <https://w3id.org/openbadges#issueDate> "2016-10-03"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
      <http://certificates.gamoeofthronesxyz.org/b5dee02e-50cd-4e48-ad33-de7d2eafa359> <https://w3id.org/openbadges#uid> "b5dee02e-50cd-4e48-ad33-de7d2eafa359" .
      <http://certificates.gamoeofthronesxyz.org/criteria/2016/08/got.json> <http://schema.org/description> "This certifies that the named character is an official Game of Thrones character." .
      <http://certificates.gamoeofthronesxyz.org/criteria/2016/08/got.json> <http://schema.org/image> <> .
      <http://certificates.gamoeofthronesxyz.org/criteria/2016/08/got.json> <http://schema.org/name> "Game of Thrones Character" .
      <http://certificates.gamoeofthronesxyz.org/criteria/2016/08/got.json> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/blockcerts#Certificate> .
      <http://certificates.gamoeofthronesxyz.org/criteria/2016/08/got.json> <https://w3id.org/blockcerts#issuer> <http://www.blockcerts.org/mockissuer/issuer/got-issuer_live.json> .
      <http://www.blockcerts.org/mockissuer/issuer/got-issuer_live.json> <http://schema.org/email> "fakeEmail@gamoeofthronesxyz.org" .
      <http://www.blockcerts.org/mockissuer/issuer/got-issuer_live.json> <http://schema.org/image> <> .
      <http://www.blockcerts.org/mockissuer/issuer/got-issuer_live.json> <http://schema.org/name> "Game of thrones issuer" .
      <http://www.blockcerts.org/mockissuer/issuer/got-issuer_live.json> <http://schema.org/url> <http://www.blockcerts.org/mockissuer/> .
      <http://www.blockcerts.org/mockissuer/issuer/got-issuer_live.json> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/blockcerts#Issuer> .
      _:c14n0 <http://schema.org/familyName> "Targaryen" .
      _:c14n0 <http://schema.org/givenName> "Daenerys" .
      _:c14n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/email> .
      _:c14n0 <https://w3id.org/blockcerts#publicKey> "18AaFyeWmsasbSh2GsjGTtrNHqiJgsN6nB" .
      _:c14n0 <https://w3id.org/blockcerts#revocationKey> "16wyA4kLFiaQSEE9xZEFTEMXTzWsGf4Zki" .
      _:c14n0 <https://w3id.org/openbadges#hashed> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
      _:c14n0 <https://w3id.org/openbadges#identityHash> "daenerysxyz@targaryenxyz.com" .
      _:c14n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/blockcerts#CertificateDocument> .
      _:c14n1 <https://w3id.org/blockcerts#assertion> <http://certificates.gamoeofthronesxyz.org/b5dee02e-50cd-4e48-ad33-de7d2eafa359> .
      _:c14n1 <https://w3id.org/blockcerts#certificate> <http://certificates.gamoeofthronesxyz.org/criteria/2016/08/got.json> .
      _:c14n1 <https://w3id.org/blockcerts#recipient> _:c14n0 .
      _:c14n1 <https://w3id.org/blockcerts#signature> "H9GWFpixqY5JBS9Y5r45DXHMB4nWNzavS7h7C0whDhdxLshppwW18ZT8zzHq5qEEYGp+DICgU51wrdbSHHlaK5E=" .
      _:c14n1 <https://w3id.org/blockcerts#verify> _:c14n2 .
      _:c14n2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/blockcerts#SignedBadge> .
      _:c14n2 <https://w3id.org/blockcerts#attribute-signed> "uid" .
      _:c14n2 <https://w3id.org/blockcerts#signer> <http://www.blockcerts.org/mockissuer/keys/got_public_key_live.asc> .
davidlehn commented 7 months ago

I'll simplify a bit. Might help onlookers suggest a solution.

The data in question is like this:

{
  "@context": [
    {
      "a": "ex:a#",
      "b:c": {
        "@id": "a:b:c"
      }
    }                                                                            
  ],
  "ex:d": {
    "b:c": "data"
  }
}

Canonizing (or really any context processing) in newer jsonld.js results in:

jsonld.SyntaxError: Invalid JSON-LD syntax; term in form of IRI must expand to definition.

The jsonld.js errors here lack details, but it's due to a check with expected data:

    id: 'ex:a#b_c',
    term: 'b:c',
    termIri: 'b:c'

The RDF Distiller says:

Failed to parse input document: invalid IRI mapping: term b:c expands to ex:a#b:c, not b:c

Adjusted data like this:

{
  "@context": [
    {
      "a": "ex:a#",
      "b_c": {
        "@id": "a:b_c"
      }
    }
  ],
  "ex:d": {
    "b_c": "data"
  }
}

Would result in:

_:c14n0 <ex:d> _:c14n1 .
_:c14n1 <ex:a#b_c> "data" .

And on the 1.0 playground (older jsonld.js), the original input does result in:

_:c14n0 <ex:d> _:c14n1 .
_:c14n1 <ex:a#b:c> "data" .

At first glance, I'm not sure what a great solution is here. I feel we thought about this and had some trick, but I'm not sure what it is?

A first pass might be to "fix" the term with a placeholder in both the context data used and the data processed. Then expanded data would be correct and get canonized correctly (unless I'm missing something).

For jsonld.js, that would look like:

The above is a bit awkward and slow but requires no special library support and would be a solution that could be adapted to other implementations. Maybe some magic hook could work too but I don't know where or what that would look like.

[1] Note that this data has another problem. If you do the above change to test it, then the modern jsonld.js will fail "safe" mode because the "language" term, used in .document.certificate.language, is not defined and dropped (ie, not in the output n-quads). You'd have to explicitly turn safe mode off, and be aware that data is not signed. This is already a problem as can be seen in the example canonized data.

lemoustachiste commented 7 months ago

Thanks @davidlehn for the thorough exploration.

I think I understand what could be the hack in my case. Indeed it's not too great but, at least in my case it is one property and we have a minimal expectation of what it can and should be, I'll see if I can try something out.

If another solution comes up I'm all ears but for now this seems to be the approach. And yes I'll be running in unsafe mode, I think there are too many unknown unknowns in this case and I just want a basic legacy solution for blockcerts v1.

lemoustachiste commented 7 months ago

@davidlehn I was able to implement the fix in the end the data mutation is small and the normalization result is as expected: https://github.com/blockchain-certificates/cert-verifier-js-v1-legacy/blob/master/src/inspectors/computeLocalHash.ts#L27-L48

A bit hacky but at least I was able to update to the latest version of jsonld.js so that's good.

Thanks for your help