Closed OR13 closed 3 years ago
@OR13, it sounds like your encoder may be injecting a byte order mark (BOM) before your UTF-8 bytes. You should see if you can eliminate it, it's unnecessary and wreaks havoc on the universe.
The other thing it could be is expressing byte strings using the unnecessarily verbose encoding for Uint8Arrays. JavaScript can only represent byte strings as Uint8Arrays, so there's little reason to include additional bytes every time a byte string is encoded in CBOR just because it came from a Uint8Array in JS in abstract form.
If you need a good JS cbor library, we're transitioning to using this one: https://github.com/rvagg/cborg
d840 == [216, 64]
216 in binary is 11011000
which, in CBOR, the first 3 bits are the major type, which would be 6
or a tagged type here, i.e., not a byte string which has major type 2. The remaining 5 bits are for the count (length of payload)/indicating an extended length ... or in the case of a tag (which your encoding is using) the size of the tag. The value is 24
here which indicates an extended count/tag size of 8 bits. The following 8 bits are the number 64, which is the tag number for a Uint8Array per https://tools.ietf.org/html/rfc8746.
So, it looks like your encoder is not using simple a byte string encoding for binary data, but including extra bytes to represent the fact that the data in your application happened to be stored using a Uint8Array (again, no other choice for binary data in JS land). Ideally, your encoder would be simplified to just report that the data is a string of bytes and save the precious space.
@dlongley thank your for these comments and advice. I will look at switching to cborg, I have tried a bunch of different cbor implementations and they all feel to be lacking something.
I also chatted with @dmitrizagidulin about a set of test vectors for cbor-ld, structured loosely like this:
case-0.json case-0-context.json xsd-date-time case-1.cborld
case-1.json case-1-context.json base58-did-url case-2.cborld
This way we would always have a minimal example to test a given codec, of course we would still need interop tests for the large objects as well.
I will see if I can instruct https://github.com/hildjj/node-cbor/tree/main/packages/cbor-web to not encode the Uint8Array prefix... thanks again for your help.
@OR13 -- we'll be releasing cborld 4.x very shortly. It has a complete rewrite such that it handles many more (most?) JSON-LD documents now. You'll want to take a look at what's going on ... and we'll need to get the spec updated and some good test vectors soon to enable interop testing. The new version is based on cborg, which, IMO, is a simpler cleaner implementation, brought to us by some of the same people that work on things in the IPFS ecosystem -- and it appears as though it plays nice in both node.js and web environments.
Looks to be addressed in the rewrite, closing.
First, let me say that I am beyond excited for CBOR-LD, and this bug that I am hunting is almost certainly related to my relative inexperience combined with my excitement.
in hex, all instances of the encoded “Base58DidURL” are prefixed with
d840
in my library but not in yours….as far as I can tell this happens during cbor encoding, not during encoding mapping stage, since the encoded values, are the same in both libraries…
for example:
I see the same encoded representation before the cbor encoding runs:
I worry that these changes may be the reason for this inconsistency:
https://github.com/digitalbazaar/cbor/blob/main/CHANGELOG.md