LibertyDSNP / spec

The DSNP Spec and Website
https://spec.dsnp.org

DIP-210: Content Hashing Algorithm Changes #210

Closed: wilwade closed this issue 1 year ago

wilwade commented 1 year ago

The spec has two places that contain content hashes:

  1. Activity Content: Hash
  2. Announcement contentHash, e.g. Broadcast Announcement

Abstract

Adding support to Announcement Content Hash for:

And moving to a Multihash structure

Motivation

Supporting additional hashes makes integration easier at creation time and broadens support across platforms and languages. The trade-off is a heavier burden on consumers of the data, who must implement multiple hashing algorithms to be able to verify content hashed with any of them.

Specification Pull Request

Current change pull request: #245

Rationale

We have always intended to support multiple hashes. The rationale differs for each:

Chosen

Rejected for Now

Chosen Path: Multihash

Support multihash instead of the prior v1.0 format. This brings everything into alignment with #210.

Backwards Compatibility

This change will break validation for all implementations. Where possible, implementers are encouraged to support multiple algorithms during a transition period before relying on only one or more of the hashing algorithms in the new version.
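During such a transition, a consumer might receive the same content hashed under several algorithms and accept it if any hash it can compute matches. The Python sketch below illustrates that idea for multihash-encoded values; the `verify_any` name, the accept-on-any-match policy, and the set of supported algorithms are assumptions for illustration, not spec requirements. Keccak-256 is left out because the Python standard library does not provide it (`hashlib.sha3_256` is FIPS 202 SHA3-256, which pads differently and yields different digests).

```python
import hashlib

def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint starting at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7

# Hypothetical table of algorithms this consumer supports, keyed by multicodec code.
# keccak-256 (0x1b) is omitted because it is not in the Python standard library.
HASHERS = {
    0x12: lambda data: hashlib.sha256(data).digest(),                      # sha2-256
    0xB220: lambda data: hashlib.blake2b(data, digest_size=32).digest(),  # blake2b-256
}

def verify_any(content: bytes, multihashes: list[bytes]) -> bool:
    """Return True if any multihash uses a supported algorithm and matches content."""
    for mh in multihashes:
        code, pos = read_uvarint(mh, 0)
        length, pos = read_uvarint(mh, pos)
        digest = mh[pos:]
        hasher = HASHERS.get(code)
        if hasher is None or len(digest) != length:
            continue  # unsupported algorithm or length mismatch: try the next hash
        if hasher(content) == digest:
            return True
    return False
```

A stricter policy could instead require every supported hash to match; which policy to adopt is a trade-off the spec itself would need to settle.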

Reference Implementation and/or Tests

None

Security Considerations

This change increases security because it decreases the likelihood that all hashing algorithms in use will be broken at the same time.

Dependencies

Additional Options

Options for Announcement contentHash

  1. Switch to Blake2b-256 instead of keccak
  2. Find some way to support both, likely with a prefix
  3. Stick with just keccak
  4. Switch to something else

References

Copyright

Copyright and related rights waived via CC0.

wilwade commented 1 year ago

Frequency // DSNP Spec 2022-10-13 Action items:

wesbiggs commented 1 year ago

You may want to consider PHC format, which seems to have evolved from crypt(3) notation.

The advantage is future-proofing for future algorithms and future use cases, including salted/parameterized algorithms. The disadvantage is verbosity (and from what I can tell, a lack of existing standardization for algorithm names).

In its simplest form: $keccak-256$$47173285a8d7341e5e972fc677286384f802f8ef42a5ec5f03bbfa254cb01fad

This forms a variable-length prefix of at least 4, and in this case, 13 characters.
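To make the prefix arithmetic concrete, here is a small Python sketch that builds and splits strings in the shape of the example above. It mirrors only that example (an algorithm id, an empty middle field, then a hex digest); the full PHC string format also allows version, parameter, and salt fields and encodes the salt and hash in a Base64 variant rather than hex. `format_hash` and `parse_hash` are hypothetical names.

```python
def format_hash(alg_id: str, digest: bytes) -> str:
    """Build "$<alg_id>$$<hex digest>", leaving the middle field empty."""
    return f"${alg_id}$${digest.hex()}"

def parse_hash(value: str) -> tuple[str, bytes, int]:
    """Split a value shaped like format_hash output into (alg_id, digest, prefix length)."""
    # "$id$$hex".split("$") -> ["", "id", "", "hex"]; the middle field is ignored here
    parts = value.split("$")
    if len(parts) != 4 or parts[0] != "" or parts[1] == "":
        raise ValueError(f"unexpected hash string: {value!r}")
    alg_id, hex_digest = parts[1], parts[3]
    prefix_len = len(value) - len(hex_digest)  # "$" + id + "$$"
    return alg_id, bytes.fromhex(hex_digest), prefix_len

example = "$keccak-256$$47173285a8d7341e5e972fc677286384f802f8ef42a5ec5f03bbfa254cb01fad"
alg, digest, prefix_len = parse_hash(example)
print(alg, len(digest), prefix_len)  # keccak-256 32 13
```

With a one-character algorithm id the prefix shrinks to its 4-character minimum (`$x$$`).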

wilwade commented 1 year ago


Interesting. We've tried really hard to keep the data in announcements as small as possible, although repeated data is fairly compressible.

wilwade commented 1 year ago

Frequency // DSNP Spec 2022-10-27 Notes:

wesbiggs commented 1 year ago

At the risk of unnecessarily prolonging this discussion, I'd like to consider multihash as well. This seems to have a certain amount of traction in the wider blockchain community.

keccak-256 hashes would be represented as 0x1b20{hash} (34 bytes total).

blake2b-256 hashes would be represented as 0xa0e40220{hash} (36 bytes total).
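These byte layouts follow from the multihash structure: an unsigned varint for the hash function's multicodec code, an unsigned varint for the digest length, then the digest. keccak-256's code 0x1b fits in one varint byte, while blake2b-256's code 0xb220 needs three (a0 e4 02), which accounts for the two-byte difference. A minimal Python sketch, using `hashlib.blake2b` for the blake2b-256 case and a zero-filled placeholder digest for keccak-256, since Keccak-256 is not in the standard library:

```python
import hashlib

KECCAK_256 = 0x1B     # multicodec code for keccak-256
BLAKE2B_256 = 0xB220  # multicodec code for blake2b-256

def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_multihash(code: int, digest: bytes) -> bytes:
    """<varint code><varint digest length><digest>"""
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest

blake = encode_multihash(BLAKE2B_256, hashlib.blake2b(b"hello", digest_size=32).digest())
keccak = encode_multihash(KECCAK_256, bytes(32))  # placeholder digest, not a real hash
print(blake[:4].hex(), len(blake))    # a0e40220 36
print(keccak[:2].hex(), len(keccak))  # 1b20 34
```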

Pros:

An additional consideration for this (as well as options C and D) is that prefix bytes should ideally be removed before using the hash value in a Bloom filter. In the case of multihash, the encoded value would need to be parsed so the prefix length could be determined. Edit: upon reflection, I don't see a use case for putting individual hash values in a Bloom filter, so this isn't particularly relevant.
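Bloom filters aside, any consumer that needs the raw digest from a multihash has to parse the two leading varints, because the prefix length depends on the algorithm code. A Python sketch of that parsing step (the `split_multihash` helper is hypothetical):

```python
def decode_uvarint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Read an unsigned varint at pos; return (value, position just after it)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7

def split_multihash(mh: bytes) -> tuple[int, int, bytes]:
    """Return (multicodec code, prefix length in bytes, raw digest)."""
    code, pos = decode_uvarint(mh)
    length, pos = decode_uvarint(mh, pos)
    digest = mh[pos:]
    if len(digest) != length:
        raise ValueError(f"expected {length} digest bytes, got {len(digest)}")
    return code, pos, digest

print(split_multihash(bytes.fromhex("1b20") + bytes(32))[:2])      # (27, 2)
print(split_multihash(bytes.fromhex("a0e40220") + bytes(32))[:2])  # (45600, 4)
```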

wesbiggs commented 1 year ago

Multihash is also directly compatible with (is a component of) CIDs as used in IPFS.
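To illustrate that relationship: a CIDv1 is a version varint, a multicodec varint for the content type, and then the multihash unchanged, usually rendered with a multibase prefix. The Python sketch below builds a CIDv1 for raw bytes (codec 0x55) hashed with blake2b-256 and renders it in base32 (multibase prefix `b`); the codec and hash choices and the `cid_v1_raw` name are illustrative assumptions, not something this thread specifies.

```python
import base64
import hashlib

CID_V1 = 0x01
RAW = 0x55            # multicodec: raw binary
BLAKE2B_256 = 0xB220  # multicodec: blake2b-256

def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def cid_v1_raw(content: bytes) -> str:
    digest = hashlib.blake2b(content, digest_size=32).digest()
    multihash = encode_uvarint(BLAKE2B_256) + encode_uvarint(len(digest)) + digest
    cid = encode_uvarint(CID_V1) + encode_uvarint(RAW) + multihash
    # Multibase "b": lowercase RFC 4648 base32 with padding removed
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")

print(cid_v1_raw(b"hello"))  # starts with "bafk2bzace"
```

Because the multihash is carried verbatim inside the CID, a multihash value could in principle be wrapped into a CID without rehashing the content.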