digitalbazaar / cborld

A Javascript CBOR-LD processor for web browsers and Node.js apps.
https://json-ld.github.io/cbor-ld-spec/
BSD 3-Clause "New" or "Revised" License
21 stars 17 forks source link

Interoperability issue with vanilla cbor for base58didurl codec #32

Closed OR13 closed 3 years ago

OR13 commented 3 years ago

First, let me say that I am beyond excited for CBOR-LD, and this bug that I am hunting is almost certainly related to my relative inexperience combined with my excitement.

{
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://w3id.org/vaccination/v1"
  ],
  "type": [
    "VerifiableCredential",
    "VaccinationCertificate"
  ],
  "id": "urn:uvci:af5vshde843jf831j128fj",
  "name": "COVID-19 Vaccination Certificate",
  "description": "COVID-19 Vaccination Certificate",
  "issuanceDate": "2019-12-03T12:19:52Z",
  "expirationDate": "2029-12-03T12:19:52Z",
  "issuer": "did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3",
  "credentialSubject": {
    "type": "VaccinationEvent",
    "batchNumber": "1183738569",
    "administeringCentre": "MoH",
    "healthProfessional": "MoH",
    "countryOfVaccination": "NZ",
    "recipient": {
      "type": "VaccineRecipient",
      "givenName": "JOHN",
      "familyName": "SMITH",
      "gender": "Male",
      "birthDate": "1958-07-17"
    },
    "vaccine": {
      "type": "Vaccine",
      "disease": "COVID-19",
      "atcCode": "J07BX03",
      "medicinalProductName": "COVID-19 Vaccine Moderna",
      "marketingAuthorizationHolder": "Moderna Biotech"
    }
  },
  "proof": {
    "type": "Ed25519Signature2018",
    "created": "2021-02-18T23:00:15Z",
    "jws": "eyJhbGciOiJFZERTQSIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19..vD_vXJCWdeGpN-qKHDIlzgGC0auRPcwp3O1sOI-gN8z3UD4pI0HO_77ob5KHhhU1ugLrrwrMsKv71mqHBn-dBg",
    "proofPurpose": "assertionMethod",
    "verificationMethod": "did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3#z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3"
  }
}

in hex, all instances of the encoded “Base58DidURL” are prefixed with d840 in my library but not in yours….

as far as I can tell this happens during cbor encoding, not during encoding mapping stage, since the encoded values, are the same in both libraries…

for example:

issuer: did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3
proof.verificationMethod did:key:z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3#z6MkiY62766b1LJkExWMsM3QG4WtX7QpY823dxoYzr9qZvJ3

I see the same encoded representation before the cbor encoding runs:

[
  1025,
  Uint8Array(34) [
    237,   1,  60, 171,  80, 213,  89, 231,
    138,  27, 171, 113, 132,  31, 217, 203,
    217, 232,   2, 153, 144,  56, 104,  94,
    248,  75, 126,  73,  13,  39,  23, 194,
    177,  16
  ]
]
[
  1025,
  Uint8Array(34) [
    237,   1,  60, 171,  80, 213,  89, 231,
    138,  27, 171, 113, 132,  31, 217, 203,
    217, 232,   2, 153, 144,  56, 104,  94,
    248,  75, 126,  73,  13,  39,  23, 194,
    177,  16
  ],
  Uint8Array(34) [
    237,   1,  60, 171,  80, 213,  89, 231,
    138,  27, 171, 113, 132,  31, 217, 203,
    217, 232,   2, 153, 144,  56, 104,  94,
    248,  75, 126,  73,  13,  39,  23, 194,
    177,  16
  ]
]

I worry that these changes may be the reason for this inconsistency:

https://github.com/digitalbazaar/cbor/blob/main/CHANGELOG.md

dlongley commented 3 years ago

@OR13, it sounds like your encoder may be injecting a byte order mark (BOM) before your UTF-8 bytes. You should see if you can eliminate it, it's unnecessary and wreaks havoc on the universe.

dlongley commented 3 years ago

The other thing it could be is expressing byte strings using the unnecessarily verbose encoding for Uint8Arrays. JavaScript can only represent byte strings as Uint8Arrays, so there's little reason to include additional bytes every time a byte string is encoded in CBOR just because it came from a Uint8Array in JS in abstract form.

dlongley commented 3 years ago

If you need a good JS cbor library, we're transitioning to using this one: https://github.com/rvagg/cborg

dlongley commented 3 years ago

d840 == [216, 64]

216 in binary is 11011000 which, in CBOR, the first 3 bits are the major type, which would be 6 or a tagged type here, i.e., not a byte string which has major type 2. The remaining 5 bits are for the count (length of payload)/indicating an extended length ... or in the case of a tag (which your encoding is using) the size of the tag. The value is 24 here which indicates an extended count/tag size of 8 bits. The following 8 bits are the number 64, which is the tag number for a Uint8Array per https://tools.ietf.org/html/rfc8746.

So, it looks like your encoder is not using simple a byte string encoding for binary data, but including extra bytes to represent the fact that the data in your application happened to be stored using a Uint8Array (again, no other choice for binary data in JS land). Ideally, your encoder would be simplified to just report that the data is a string of bytes and save the precious space.

OR13 commented 3 years ago

@dlongley thank your for these comments and advice. I will look at switching to cborg, I have tried a bunch of different cbor implementations and they all feel to be lacking something.

I also chatted with @dmitrizagidulin about a set of test vectors for cbor-ld, structured loosely like this:

case-0.json case-0-context.json xsd-date-time case-1.cborld
case-1.json case-1-context.json base58-did-url case-2.cborld

This way we would always have a minimal example to test a given codec, of course we would still need interop tests for the large objects as well.

I will see if I can instruct https://github.com/hildjj/node-cbor/tree/main/packages/cbor-web to not encode the Uint8Array prefix... thanks again for your help.

dlongley commented 3 years ago

@OR13 -- we'll be releasing cborld 4.x very shortly. It has a complete rewrite such that it handles many more (most?) JSON-LD documents now. You'll want to take a look at what's going on ... and we'll need to get the spec updated and some good test vectors soon to enable interop testing. The new version is based on cborg, which, IMO, is a simpler cleaner implementation, brought to us by some of the same people that work on things in the IPFS ecosystem -- and it appears as though it plays nice in both node.js and web environments.

dmitrizagidulin commented 3 years ago

Looks to be addressed in the rewrite, closing.