Open rcurrie opened 7 years ago
Actually, this should be in the http://ipld.io/ spec:
The IPLD Canonical format is canonicalized CBOR with tags. The canonical CBOR format must follow rules defines in RFC 7049 section 3.9 in addition to the rules defined here. ...
@rcurrie I believe the line in question is https://github.com/ga4gh/cgtd/blob/f90a50672a2d3abf3132e8069f791e1a599432ae/cgtd/cgtd.py#L259 (the question being whether json.dumps sorts the keys, which it doesn't).
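As a small illustration of the key-ordering point (a sketch using only the standard library, not the code at the linked line; the sample dictionaries are made up): json.dumps emits keys in dictionary insertion order unless sort_keys=True is passed, so two equal dictionaries can serialize differently.

```python
import json

# Two dictionaries with the same content but different insertion order
# (hypothetical sample data).
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

# Default serialization follows insertion order, so the strings differ.
default_first = json.dumps(first)
default_second = json.dumps(second)

# With sort_keys=True the key order is deterministic.
sorted_first = json.dumps(first, sort_keys=True)
sorted_second = json.dumps(second, sort_keys=True)

print(default_first == default_second)  # the unsorted strings differ
print(sorted_first == sorted_second)    # the sorted strings match
```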
This can be solved in one of three ways: 1. (fastest for now?) use jsonld.normalize from https://github.com/digitalbazaar/pyld (to avoid confusion: 'normalization' in JSON-LD terminology actually means 'canonicalization', see https://github.com/json-ld/normalization/issues/2); 2. build an IPLD object directly, then serialize it; or 3. check the solution devised by mediachain (see https://github.com/mediachain/aleph).
After GA4GH Vancouver, Ravi suggested making doubly sure that the JSON we are hashing is canonical. We currently sort the keys so that the same data always produces the same hash, but there may still be several implicit dependencies on how the underlying Python generates JSON from a dictionary:
http://stackoverflow.com/questions/4670494/how-to-cryptographically-hash-a-json-object
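A minimal sketch of pinning down those implicit dependencies with the standard library alone. The function names canonical_json_bytes and canonical_hash are hypothetical, and SHA-256 is an assumption rather than the hash cgtd actually uses. It fixes key order, whitespace, and string encoding, and rejects NaN/Infinity; it does not address float formatting or Unicode normalization, so it is not a substitute for a real canonicalization scheme such as JSON-LD normalization or canonical CBOR.

```python
import hashlib
import json


def canonical_json_bytes(obj):
    """Serialize obj to JSON bytes with the serializer options pinned down."""
    text = json.dumps(
        obj,
        sort_keys=True,         # deterministic key order at every nesting level
        separators=(",", ":"),  # no optional whitespace between tokens
        ensure_ascii=False,     # emit non-ASCII characters as-is, not as \u escapes
        allow_nan=False,        # raise ValueError on NaN/Infinity (not valid JSON)
    )
    return text.encode("utf-8")  # fix the byte encoding explicitly


def canonical_hash(obj):
    """Hex SHA-256 digest of the pinned-down serialization (assumed hash choice)."""
    return hashlib.sha256(canonical_json_bytes(obj)).hexdigest()


# Same data, different insertion order: the digests match.
print(canonical_hash({"b": 1, "a": [1, 2]}) == canonical_hash({"a": [1, 2], "b": 1}))
```

Note that sort_keys=True raises TypeError if a dictionary mixes key types that cannot be compared (for example str and int), which is arguably the right failure mode for data that is about to be hashed.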